A Massive 754B Parameter Model Has Emerged. But What Small and Medium Enterprises Really Need is ‘Small and Specialized’

By Kai


The Race for Massive Models Does Not Concern Small and Medium Enterprises

Z.AI has announced its 754B parameter model, “GLM-5.1,” capable of autonomous operation for eight hours. Meta has introduced Muse Spark, while Anthropic is developing the private Claude Mythos.

The AI industry has entered a competition where “the one who builds the biggest model wins.” The number of parameters continues to grow, along with the required number of GPUs, power consumption, and costs.

However, we should pause and consider: Do small and medium enterprises need a 754B parameter model?

The answer is almost certainly “No.”

Facing the Costs of Massive Models

Z.AI’s GLM-5.1 alone has a model size of 1.51TB. Running it requires a dedicated server with multiple NVIDIA H100 GPUs; the hardware alone costs several million yen. Even in the cloud, inference costs range from a few yen to several tens of yen per request, so processing tens of thousands of requests a month can quickly push the bill into the hundreds of thousands of yen.

Meta’s Muse Spark is designed to be integrated into Instagram and WhatsApp, intended for use within Meta’s own ecosystem. It is not something that small and medium enterprises can use independently.

As for Anthropic’s Claude Mythos, it has been explicitly stated that it is “too dangerous for public release,” available only to select companies like Amazon, Apple, and Microsoft.

In other words, the benefits of the massive model race are directly accessible only to large corporations and platform providers. Small and medium enterprises do not need to get caught up in this competition.

The Astonishing Numbers from ‘Small Models’

So, what is the real answer for small and medium enterprises? The answer is “small, specialized models.”

Here are some specific numbers.

Case 1: Equipment Anomaly Prediction

An open-source method for equipment anomaly prediction using a 1,116-dimensional triplet feature fusion pipeline has been published. It extracts statistical features from 90 days of sensor history and feeds them into a small model adapted with LoRA.

The results are as follows:

  • 30-day anomaly prediction: F1 Score 0.958
  • ROC-AUC: 0.998

This is an accuracy level that is “almost infallible.” Moreover, it is achieved with a lightweight model, not a massive one. The required infrastructure is just a standard server, and the cost is around several tens of thousands of yen per month.
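The exact 1,116-dimensional feature set of the published pipeline is not spelled out here, but the "statistical features from sensor history" step can be sketched in a few lines. Everything below — the channel count, the choice of six statistics — is illustrative, not the actual method:

```python
import numpy as np

def extract_stat_features(window: np.ndarray) -> np.ndarray:
    """Compute simple per-sensor statistics over a history window.

    window: shape (days, n_sensors), e.g. 90 days of daily readings.
    Returns a flat feature vector of length n_sensors * 6.
    """
    feats = [
        window.mean(axis=0),
        window.std(axis=0),
        window.min(axis=0),
        window.max(axis=0),
        np.median(window, axis=0),
        np.ptp(window, axis=0),  # peak-to-peak range per sensor
    ]
    return np.concatenate(feats)

# 90 days of synthetic readings from 12 hypothetical sensors
rng = np.random.default_rng(0)
history = rng.normal(size=(90, 12))
vec = extract_stat_features(history)
print(vec.shape)  # (72,)
```

A vector like this is cheap to compute nightly on a standard server, which is exactly why this class of pipeline fits an SME budget.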

Case 2: Lightweight RAG Model

A RAG (Retrieval-Augmented Generation) model using a TF-IDF-based routing method has achieved a 28.1% reduction in token costs while maintaining 93.2% accuracy.

In plain terms, answering the same question costs roughly 30% less, with accuracy almost unchanged. For a small or medium enterprise building an internal knowledge base, that difference can amount to tens of thousands of yen per month, or hundreds of thousands of yen per year.
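The cited work's routing method is not specified in detail, but the core idea of TF-IDF-based routing can be sketched: score the query against the knowledge base with TF-IDF, and only escalate to an expensive model when retrieval is not confident. The documents, threshold, and tokenization below are all placeholders:

```python
import math
from collections import Counter

# Hypothetical knowledge-base snippets (stand-ins for internal documents)
DOCS = [
    "how to reset the office printer password",
    "expense report submission deadline and approval flow",
    "vpn setup guide for remote work laptops",
]

def _idf(corpus):
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc.split()))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}

IDF = _idf(DOCS)

def _vec(text):
    tf = Counter(text.lower().split())
    return {w: c * IDF.get(w, 0.0) for w, c in tf.items()}

def _cos(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

DOC_VECS = [_vec(d) for d in DOCS]

def route(query, threshold=0.2):
    """Return ('small', best_doc) when TF-IDF retrieval is confident,
    else ('large', None) to escalate to a bigger, pricier model."""
    sims = [_cos(_vec(query), dv) for dv in DOC_VECS]
    best = max(range(len(sims)), key=lambda i: sims[i])
    if sims[best] >= threshold:
        return "small", DOCS[best]
    return "large", None

print(route("how do I reset the printer password"))
```

Because most internal questions match an existing document, the large model is only invoked for the residual hard cases — that asymmetry is where the token savings come from.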

Why ‘Small and Specialized’ is the Right Answer for Small and Medium Enterprises

There are three reasons.

1. Costs are drastically lower

Using a massive model via API can cost several hundred thousand yen per month. Running a small, specialized model on your own server or a small cloud instance can cost only several thousand to tens of thousands of yen per month. Even though both are forms of “AI implementation,” the cost structures are entirely different.

2. Accuracy can be optimized for ‘your business’

Massive models are general-purpose models that can do “a bit of everything.” In contrast, small models can be fine-tuned using your own data. In manufacturing, a model trained on your own equipment data, or in retail, one trained on your own sales data, often achieves higher accuracy than a general-purpose model.

If you only need to read your own invoice format, you don’t need 754B parameters. A 7B parameter model fine-tuned on your own data would be sufficient, costing less than one-hundredth of the price.

3. You can control it yourself

With massive models, the API provider can change prices, update models, and in the worst case, discontinue services. By operating a small model in-house, you free yourself from these risks. You also don’t have to send your data externally.

Make Use of the ‘Leftovers’ from the Massive Model Race

That said, it would be a mistake to completely ignore the massive model race.

As competition among massive models intensifies, API usage fees are likely to decrease. In fact, OpenAI’s GPT-4o has significantly reduced token costs compared to GPT-4. Anthropic may also reconsider pricing under competitive pressure.

Additionally, the technique of “distillation,” which uses training data generated from massive models to train small models, is becoming more common. This means transferring knowledge from massive models to smaller ones, further improving the accuracy of small models.
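The mechanics of distillation can be shown in one function: the large "teacher" model's temperature-softened output distribution becomes the training target for the small "student." This is a generic textbook sketch, not any particular vendor's recipe; the logits are made up:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax; higher T spreads probability mass."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence from the teacher's soft targets to the student's
    prediction -- the core loss of logit-based knowledge distillation."""
    p = softmax(teacher_logits, T)  # soft targets from the large model
    q = softmax(student_logits, T)  # small model's current prediction
    return float(np.sum(p * (np.log(p) - np.log(q))))

teacher = [4.0, 1.0, 0.5]   # hypothetical large-model logits
student = [3.5, 1.2, 0.4]   # hypothetical small-model logits
print(distill_loss(student, teacher, T=2.0))
```

Minimizing this loss pulls the student toward the teacher's full output distribution, which carries more signal than hard labels alone — that is the "knowledge transfer" the text refers to.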

In other words, the correct approach for small and medium enterprises is not to “directly use” massive models, but to “indirectly benefit” from them.

So, What Should You Do?

First, choose one of your company’s challenges. It could be predicting equipment failures, automating customer inquiries, or automatically classifying documents—anything is fine.

Next, look for a small model specialized for that challenge. HuggingFace has tens of thousands of publicly available models. There’s a high chance you’ll find a model that closely matches your industry or application.

Then, fine-tune it with your own data. Using LoRA, fine-tuning can be completed in just a few hours with a standard GPU, costing only a few thousand yen.
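Why is LoRA so cheap? Instead of updating a full weight matrix, it trains only a low-rank correction. The arithmetic below is a minimal NumPy illustration of that idea (the 4096×4096 layer size and rank are arbitrary examples, not tied to any specific model):

```python
import numpy as np

class LoRALinear:
    """Minimal LoRA illustration: freeze W, train only low-rank A and B.

    The effective weight is W + (alpha/r) * B @ A, where
    B is (d_out, r) and A is (r, d_in) with r << min(d_out, d_in).
    """
    def __init__(self, W, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W                                      # frozen pretrained weight
        self.A = rng.normal(scale=0.01, size=(r, W.shape[1]))
        self.B = np.zeros((W.shape[0], r))              # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

    def trainable_params(self):
        return self.A.size + self.B.size

W = np.random.default_rng(1).normal(size=(4096, 4096))
layer = LoRALinear(W, r=8)
print(layer.trainable_params(), W.size)  # 65536 vs 16777216, ~0.4%
```

Training well under 1% of the parameters per layer is what makes a few-hour run on a single standard GPU plausible.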

There’s no need to be swayed by the news of 754B parameters. The weapon for small and medium enterprises is to be “small, fast, and specialized for their own needs.” As massive models grow larger, the relative value of small, specialized models increases.

Companies that recognize this inversion early will be the ones that start winning.
