The Cost of AI Inference Has Crashed by 90% in a Year. In an Era Where ‘Enterprise-Level AI’ Can Be Acquired for Just 50,000 Yen a Month, What Should SMEs Do?

By Kai

Conclusion First: AI Is No Longer Just a Tool for the Privileged

DeepSeek V4, GPT-5.5, and 8B-class small models reaching practical utility: if what is happening in the AI industry as of spring 2025 had to be summarized in one phrase, it would be this:

“The bottom has fallen out of inference costs.”

This is not just a technical matter; it is a seismic shift in business. AI processing that used to cost 300,000 yen per month last year now runs at just 30,000 yen. What used to take large companies tens of millions of yen to build in analytical infrastructure can now be done with a single API contract.

I want to ask: Does your company still think that “AI is for large enterprises only”?

What Happened: The Simultaneous Release of DeepSeek V4 and GPT-5.5

In April 2025, two major announcements coincided.

China’s DeepSeek released V4, its new flagship model, as open source. Inference performance has improved significantly over the previous-generation V3, with especially strong benchmark results in coding and structured data processing. Because it is open source, anyone can download it and run it on their own servers. Used via the API, it costs about $0.40 per million input tokens and about $1.60 per million output tokens.

In the same week, OpenAI released GPT-5.5. A successor to o3 and GPT-4o, it delivers dramatically improved quality in complex reasoning, data analysis, and long-form text generation. Notably, despite the higher performance, GPT-5.5 is priced at or below the level of GPT-4o, whose input cost was $2.50 per million tokens.

In other words, performance has increased while prices have decreased. Both at the same time.

This is the essence of the “model war.” As DeepSeek and OpenAI compete on price and performance, it is the users—especially small and medium-sized enterprises (SMEs) that previously could not afford such technology—who benefit.

A Numerical Look at the Collapse of Inference Costs

How much have costs changed specifically? Let’s summarize the trends over the past year and a half.

Period          | Representative Model | Input per Million Tokens | Output per Million Tokens
End of 2023     | GPT-4 Turbo          | about $10.00             | about $30.00
Throughout 2024 | GPT-4o               | about $2.50              | about $10.00
April 2025      | DeepSeek V4 (API)    | about $0.40              | about $1.60
April 2025      | GPT-5.5              | about $2.00              | about $8.00

The transition from GPT-4 Turbo to DeepSeek V4 API has resulted in a 96% reduction in input costs and a 95% reduction in output costs.

Let’s translate this into practical terms.

For example, consider a local manufacturing company that automatically classifies and summarizes 200 daily order emails using AI to generate daily reports. Assuming an average of 1,000 tokens for input and 500 tokens for output per email, and calculating over 22 working days in a month:

  • Monthly Input Tokens: 200 emails × 1,000 tokens × 22 days = 4.4 million tokens
  • Monthly Output Tokens: 200 emails × 500 tokens × 22 days = 2.2 million tokens

Cost in the GPT-4 Turbo era:
4.4M input × $10.00 + 2.2M output × $30.00 ≈ $110 (around 17,000 yen)

Cost with the DeepSeek V4 API:
4.4M input × $0.40 + 2.2M output × $1.60 ≈ $5.30 (around 800 yen)

The cost has dropped from 17,000 yen to 800 yen per month. This is a 95% cost reduction.
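The arithmetic above can be sanity-checked in a few lines of Python; the prices are the per-million-token figures from the table, and the token volumes are the example’s own assumptions:

```python
# Monthly token volumes for the example manufacturer:
# 200 emails/day, 1,000 input + 500 output tokens each, 22 working days.
EMAILS_PER_DAY = 200
WORKING_DAYS = 22
IN_TOKENS_PER_EMAIL = 1_000
OUT_TOKENS_PER_EMAIL = 500

in_millions = EMAILS_PER_DAY * IN_TOKENS_PER_EMAIL * WORKING_DAYS / 1_000_000   # 4.4
out_millions = EMAILS_PER_DAY * OUT_TOKENS_PER_EMAIL * WORKING_DAYS / 1_000_000  # 2.2

def monthly_cost(in_price_per_m: float, out_price_per_m: float) -> float:
    """USD per month at the given per-million-token prices."""
    return in_millions * in_price_per_m + out_millions * out_price_per_m

gpt4_turbo = monthly_cost(10.00, 30.00)   # about $110
deepseek_v4 = monthly_cost(0.40, 1.60)    # about $5.28
print(f"GPT-4 Turbo: ${gpt4_turbo:.2f}, DeepSeek V4: ${deepseek_v4:.2f}")
```

Swapping in any other row of the table gives the same workload at that model’s price point.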

For this level of processing, it is no longer a question of “whether to implement it.” It’s just the cost of a few cans of coffee.

Another Revolution: Small Models

Alongside the price collapse of large models, another change is underway: small models in the 8B class have reached practical utility.

Models with around 8 billion parameters, such as Llama 3.1 8B and Gemma 2 9B, along with even smaller models like the 3.8B-parameter Phi-3 Mini, are being released one after another. What is remarkable about these models?

They can run on a single PC.

If you have a PC equipped with an NVIDIA RTX 4060 (priced around 40,000 yen), you can run inference locally. There are zero API costs. Since there is no need to send data externally, they can be used for tasks involving personal information.

For example, a local labor consulting office has started using an 8B model locally to draft employment regulations. What used to take a veteran consultant three hours to create a first draft can now be completed in 30 minutes with AI generating the initial draft. The API costs are zero. The only investment required was an additional GPU for 50,000 yen to the existing PC.

Consulting work billed at 3 million yen is being displaced by a 50,000 yen GPU. This is not a hypothetical; it is already happening.

What Happens After “It Became Cheaper”

Now we get to the main point. The reduction in costs is merely a means. What is crucial is the structural discussion about what happens after costs go down.

1. The Barrier to “Testing” Disappears

At 800 yen per month, no approval is needed. Business owners can start with a simple, “Let’s give it a try.” There is no need to spend six months running a PoC (proof of concept) like large enterprises do. The speed of decision-making in SMEs directly translates to the speed of AI implementation. This is a structural advantage that large companies cannot replicate.

2. “Person-Dependence” Breaks Down

Think of the know-how that existed only in veteran employees’ heads: the key points of an estimate, the patterns for handling complaints, the considerations specific to each client. Feed these into AI prompts, and even a newcomer can reach roughly 70% of a veteran’s accuracy. Operations no longer stop when someone leaves. For local SMEs, where resignation risk is an even more pressing problem than recruitment difficulty, this is the greatest insurance available.
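As a sketch of what “feeding know-how to AI” can look like in practice, here is a minimal prompt template that embeds a veteran’s house rules into every request; the checklist items are invented placeholders, not real rules:

```python
# Illustrative house rules, as a veteran might dictate them.
VETERAN_CHECKLIST = [
    "For client A, always quote delivery in business days, not calendar days.",
    "Complaints about late shipments get a phone call before any email.",
    "Estimates over 1M yen need a line-item breakdown of material costs.",
]

def knowhow_prompt(task: str) -> str:
    """Build a system prompt so any employee's draft follows the house rules."""
    rules = "\n".join(f"- {rule}" for rule in VETERAN_CHECKLIST)
    return (f"You are assisting with: {task}\n"
            f"Always follow these house rules:\n{rules}")

print(knowhow_prompt("drafting an estimate for client A"))
```

The point of the pattern: the checklist lives in a file anyone can edit and review, not in one person’s head.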

3. The Meaning of “Outsourcing” Changes

Website updates, social media posts, simple data aggregation, meeting minutes: many tasks that used to be outsourced for 100,000 to 300,000 yen per month can now be brought in-house with AI. The point is not only the saving on outsourcing fees. The time spent placing orders, confirming deliverables, and requesting revisions disappears as well, and that is actually the biggest benefit.

So, What Should We Do?

We don’t need abstract discussions. Here are three concrete actions that SMEs should take starting tomorrow.

1. First, Try Replacing One Task with AI

There’s no need to think about company-wide implementation. Choose one repetitive task you do every day and throw it at ChatGPT or DeepSeek’s API. Email classification, daily report summarization, automatic FAQ responses—anything works. You can experiment for under 1,000 yen a month.
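As a sketch of what “throwing a task at the API” looks like, the snippet below classifies one order email through an OpenAI-compatible chat endpoint. The URL, model name, and environment variable are assumptions for illustration; DeepSeek and OpenAI both accept this request shape:

```python
import json
import os
import urllib.request

# Assumed endpoint and model name; any OpenAI-compatible
# chat-completions API accepts this payload format.
API_URL = "https://api.deepseek.com/chat/completions"
MODEL = "deepseek-chat"

def build_request(email_body: str) -> dict:
    """Build a chat-completions payload that classifies one order email."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system",
             "content": "Classify this order email as one of: new_order, "
                        "change, cancellation, inquiry. Then add a one-line summary."},
            {"role": "user", "content": email_body},
        ],
        "temperature": 0,  # deterministic output for a classification task
    }

def classify(email_body: str) -> str:
    """POST the request and return the model's reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(email_body)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['DEEPSEEK_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Loop this over a day’s inbox and you have the daily-report pipeline from the cost example above.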

2. Try Local Operation of Small Models

If you have tasks that handle sensitive data in-house, use tools like Ollama to run an 8B model locally. If you lack GPUs, add one for 50,000 yen. An environment where AI can be used without sending data externally directly reduces security costs for SMEs.
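A minimal local sketch, assuming an Ollama daemon is running on its default port (11434) with an 8B model already pulled (for example `llama3.1:8b`; the model name and prompt here are illustrative). Nothing in this flow leaves the machine:

```python
import json
import urllib.request

# Ollama's local REST endpoint; requires `ollama pull llama3.1:8b` beforehand.
OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "llama3.1:8b"

def build_payload(document: str) -> dict:
    """Request payload for Ollama's /api/chat endpoint."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system",
             "content": "Draft employment regulations from the notes below."},
            {"role": "user", "content": document},
        ],
        "stream": False,  # return one complete JSON response, not a stream
    }

def draft_locally(document: str) -> str:
    """Send the notes to the local model and return its draft."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(document)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]
```

Because the endpoint is localhost, the sensitive notes never touch an external API, which is the whole point of running a small model in-house.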

3. Think About “Which Person-Dependence to Break” Rather Than “What to Make AI Do”

It’s meaningless to just look at a list of AI tool functions. List out tasks in your company that “cannot be done without that person.” Apply AI to those. The goal is not “to implement AI” but to “increase the reproducibility of operations.”

The Winner of the Model War Is Not the AI Creators

The battle between DeepSeek and OpenAI will intensify further. Google’s Gemini, Anthropic’s Claude, and Meta’s Llama will join the fray, and the price competition will not stop.

However, the winner of this war will not be the companies that create AI. It will be the companies that “fully utilize” AI.

And what is needed to fully utilize AI is not massive investments, AI specialists, or the latest GPU clusters. It is the insight from the field that recognizes, “This could be useful for our operations.”

This insight is overwhelmingly possessed by local SMEs that sweat it out in the field every day, rather than large corporations in Tokyo.

The collapse of inference costs is not merely the democratization of technology. It is a shift of power toward the people on the ground.

The weapons are ready. The only question left is whether to use them.
