Open Source AI Surpasses GPT-5.5 — Will the Arrival of GLM-5.2 Make ‘Zero Monthly API Fees’ a Reality? What Small and Medium Enterprises Should Consider Now
Related Articles
Conclusion First: “Free AI” Has Surpassed GPT-5.5
The open-source LLM “GLM-5.2” announced by the Chinese AI lab “Z.ai” has outperformed OpenAI’s latest model, GPT-5.5, in benchmarks.
If that were all, it would just be another story of “the Chinese are doing well again.” But this time, the structure is different. GLM-5.2 is open-source. This means that anyone can download and use it. The API cost is zero. There are no monthly fees. Commercial use is also possible depending on the licensing terms.
For small and medium enterprises that are currently paying 30,000 yen a month to OpenAI, this is not just a “savings story.” It fundamentally disrupts the entire expenditure structure.
What Makes GLM-5.2 Impressive — A Look at the Numbers
Let’s summarize the basic specs of GLM-5.2.
- Parameter Count: 753B (753 billion)
- Architecture: Mixture of Experts (MoE), with approximately 40B active parameters
- Use Case: Specialized in text generation
- License: Open Source (Apache 2.0)
MoE is a system that activates only the necessary “experts” based on the input, rather than using all parameters at all times. Even with a massive model of 753B, only about 40B are actively used, which helps keep inference costs down.
Notably, in coding task benchmarks, GLM-5.2 reportedly achieves scores equal to or greater than GPT-5.5 while operating at one-sixth the cost in API terms. If we assume the input token price for GPT-5.5 is $15 per 1M tokens, processing equivalent to GLM-5.2 would cost around $2.5.
Of course, benchmarks are not infallible. The “usability” and “accuracy in Japanese” in practical applications are separate issues. However, the fact that open-source has matched top-tier commercial models is significant in itself.
Is “Zero API Fees” Real? — The Hidden Costs Small and Medium Enterprises Might Overlook
Now we get to the main point. The statement “it’s free because it’s open-source” is half true and half false.
To run GLM-5.2 in-house, the following costs are incurred.
1. GPU (Hardware) Costs
To run a 753B MoE model, at least several GPUs with 80GB VRAM are required. For NVIDIA A100 (80GB), you would need about 2 to 4 units, and for H100, around 2 units.
- Purchasing 4 A100s: Approximately 6 to 8 million yen
- Renting cloud GPUs (e.g., AWS p4d.24xlarge): Monthly cost of 500,000 to 800,000 yen
Paying 500,000 yen a month for GPU costs to eliminate a 30,000 yen API fee is counterproductive. This is the biggest pitfall.
2. Exploring Practical Solutions with Quantization
However, using a technique called quantization can change the situation. This method sacrifices some model accuracy to significantly reduce the required VRAM. With 4-bit quantization, it may be possible to run on about 2 RTX 4090s (VRAM 24GB each).
- Purchasing 2 RTX 4090s: Approximately 600,000 to 800,000 yen
- Electricity costs: Monthly about 5,000 to 8,000 yen (if running 24/7)
With this setup, the initial investment would be 700,000 yen plus a monthly electricity cost of 8,000 yen. If the API fee is 30,000 yen per month, it would take about 2.5 years to break even. However, the extent of accuracy degradation due to quantization needs to be verified in your own use case.
3. Labor Costs and Verification Efforts
Another often-overlooked factor is the labor costs for setup and operation.
Deploying an open-source model involves tasks such as environment setup, inference server configuration, prompt tuning, and output quality verification. Outsourcing to an AI engineer could cost 300,000 to 500,000 yen per project, and even if done in-house, there are learning costs for the responsible personnel.
This is the biggest bottleneck for small and medium enterprises. Technically feasible, but “there are no people who can do it.”
Organizing Realistic Options
| Option | Estimated Monthly Cost | Initial Investment | Required Skills |
|---|---|---|---|
| GPT-5.5 API | 30,000 yen or more | Zero | Low |
| GLM-5.2 (Cloud GPU) | 500,000 to 800,000 yen | Zero | Medium to High |
| GLM-5.2 (In-house GPU with Quantization) | 8,000 yen | 700,000 yen | High |
| GLM-5.2 (Via API Service) | 5,000 yen or more | Zero | Low |
The last row is particularly noteworthy. Open-source models like GLM-5.2 are increasingly being offered at low costs through third-party API services. Services like Together AI, Fireworks AI, and Groq allow you to use open-source models at one-third to one-sixth the cost of GPT-5.5. Even without owning GPUs, you can still benefit from cost reductions.
ChatGPT Market Share Falls Below 50% — What’s Happening?
Another crucial piece of data is that ChatGPT’s market share has fallen below 50%.
A year ago, ChatGPT held an overwhelming share of the generative AI chat market. Now, it has lost the majority to the rise of Claude, Gemini, and open-source competitors.
This is not a story of “OpenAI becoming weaker.” It signifies that the performance gap in AI is narrowing, and we are entering an era where “it doesn’t make much difference which AI you use.”
When performance differences diminish, what will separate the contenders? Cost and optimization for one’s own business.
This presents an opportunity for small and medium enterprises.
The “Reversal Structure” for Small and Medium Enterprises
Large companies enter into annual contracts with OpenAI or Google, build dedicated environments, and invest hundreds of millions of yen to implement AI. Small and medium enterprises do not need to mimic this.
Now that the performance of open-source AI has caught up with commercial models, small and medium enterprises have a “way of fighting that only they can do because they are small.”
1. Narrowing Use Cases Dramatically Reduces Costs
Large companies seek “AI that can do everything.” Small and medium enterprises are different. “Automatic generation of estimates,” “drafting inquiry emails,” “summarizing meeting minutes” — when the use cases are clear, smaller models suffice. The 40B active parameters of GLM-5.2 may even be excessive. A specialized model with 7B to 14B parameters could run on a single RTX 4060 (around 50,000 yen).
2. No Need to Expose Data Externally
Using OpenAI’s API means that your company data passes through OpenAI’s servers. Customer information, partner data, internal know-how. Not every company can say, “We don’t mind.” Running an open-source model in-house means that data never leaves your premises. This is a significant reassurance for small and medium enterprise owners, even more than cost savings.
3. Question the Conventional Wisdom of “Paying Monthly for AI”
The model of continuously paying monthly for SaaS is a fantastic business for providers. But what about for users? 30,000 yen per month for 12 months equals 360,000 yen annually. Over five years, that’s 1.8 million yen. During that time, open-source models will undergo generational changes and continue to improve in performance.
The turning point from “continuing to pay” to “owning it yourself” is arriving right now.
So, What Should You Do?
Here are three things you can start doing today.
1. First, take stock of your company’s AI usage fees. ChatGPT Plus (monthly $20 per person), API fees, and other AI tools. Do you have an accurate grasp of how much you are paying each month?
2. Try GLM-5.2 via API. Before purchasing GPUs for your company, try throwing your current prompts at services like Together AI or Fireworks AI. If it performs comparably to GPT-5.5, switching could reduce costs to one-third to one-sixth.
3. There’s no rush to “run it in-house.” Quantization and local deployment can be confirmed for effectiveness after step 2. The technical hurdles are certainly lowering. In six months, it will become even easier.
Open-source AI has surpassed top commercial models. This is not a temporary phenomenon but a structural change. The rationality of continuously paying monthly for APIs is diminishing at this very moment.
I want to ask: Will your company continue to pay OpenAI 30,000 yen next month?
JA
EN