DeepSeek V4 Free, GPT-5.5 at $10 a Month, Inference Costs Halved with Graviton—The Position Small and Medium Enterprises Should Take in the AI Infrastructure “Three Kingdoms”


By Kai



Conclusion

Let’s get straight to the point: the “usage fees” for AI are approaching nearly zero.

In the past week, three significant news items have emerged:

  • DeepSeek V4: The latest series, including the 1.6 trillion parameter Pro model, has been released for free.
  • Meta × AWS: A partnership has been announced to optimize Llama-based agent AI on Graviton chips.
  • GPT-5.5: General availability has begun for $10 per month with GitHub Copilot.

Do you understand what’s happening here?

The price of the “AI models themselves” is collapsing.

Just two years ago, using a GPT-4 class model via API for business purposes would easily cost hundreds of thousands of yen per month. Many small and medium enterprises (SMEs) gave up after receiving estimates of 3 million yen per year just to run a single internal chatbot.

Now, performance equal to or better than that is available for free, or for $10 a month. In cost terms, that is a drop of more than 97%. This is not just a matter of “becoming cheaper”; the rules of the game have changed.

The question is, “So, what should we do?” Let’s break it down step by step.

Free Release of DeepSeek V4—The True Meaning of the “Zero Model Cost” Era

Let’s take a look at the contents of the DeepSeek V4 series.

  • DeepSeek-V4-Pro: 1.6T (1.6 trillion) parameters, 1 million token context, free API
  • DeepSeek-V4-Flash: 284B (284 billion) parameters, 1 million token context, free API

The pre-training data consists of 33 trillion tokens. Models that achieve benchmark scores equal to or exceeding those of GPT-4o and Claude 3.5 are available for free.

You might be thinking, “Isn’t free suspicious?” But this is DeepSeek’s strategy. By distributing the models for free, they aim to secure an ecosystem of users and data. This is similar to how OpenAI captured the market by releasing ChatGPT for free.

What’s crucial for SMEs is the fact that “model performance is no longer a differentiating factor”.

Two years ago, simply saying “we use GPT-4” would set you apart from competitors. Now, models with equivalent performance are available for free. This means that the value lies not in “owning” a model, but in “how you use it” and “how you integrate it into your business”.

To make this concrete: a local manufacturing company could use DeepSeek V4 to analyze defect patterns by feeding five years’ worth of defective-product data (tens of thousands of records) into the 1-million-token context. Previously, this would have meant hiring a specialized data scientist at 800,000 yen a month for three months, 2.4 million yen in total. Now, with zero API costs and two to three days of prompt design, the total could be just a few tens of thousands of yen in labor costs.
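
Whether five years of records even fit in a single 1-million-token context can be sanity-checked up front. The sketch below uses a coarse ~4-characters-per-token heuristic, and the record format is invented for illustration:

```python
# Coarse check: does a batch of defect records fit in a 1M-token context?
# Assumes ~4 characters per token; real tokenizers vary by language and model.

CONTEXT_WINDOW = 1_000_000  # tokens, per the DeepSeek V4 spec above

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count."""
    return int(len(text) / chars_per_token)

def fits_in_context(records: list[str], reserve_for_prompt: int = 2_000) -> bool:
    """True if all records plus a prompt budget fit in one context window."""
    total = sum(estimate_tokens(r) for r in records) + reserve_for_prompt
    return total <= CONTEXT_WINDOW

# 30,000 records of ~110 characters each comes to roughly 840k tokens: it fits.
records = ["lot=A-102 line=3 defect=scratch depth=0.2mm shift=night " * 2] * 30_000
print(fits_in_context(records))  # True
```

If the data does not fit, the usual fallback is to summarize per year or per production line first and feed the summaries into one final pass.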

From 2.4 million yen to a few tens of thousands. This is the destructive power of “zero model cost”.

Meta × AWS Graviton—An Alternative Solution to the GPU Cost Problem

The partnership between Meta and AWS is somewhat different in nature.

The plan is to optimize Meta’s Llama model family on AWS’s Graviton chips (Arm-based proprietary processors) to run agent AI—essentially, “autonomous AI that performs tasks”—at a lower cost.

Why is this important?

The majority of AI operating costs come from “inference costs,” which are the GPU expenses for running the models. Renting a single NVIDIA A100 can cost between $2,000 and $3,000 per month. Since agent AI calls the model dozens of times for a single task, GPU costs can skyrocket.

Graviton chips are not GPUs; they are Arm-based CPUs. With inference-specific optimization, however, the aim is to deliver comparable inference performance at less than half the cost of NVIDIA GPUs. According to AWS’s public benchmarks, Graviton4 is reported to improve cost performance by up to 40-60% compared to x86 instances.

What does this mean for SMEs?

Consider an automated order-processing agent: it reads emails, checks inventory, creates estimates, and replies. Running this entire process 24/7 previously cost an estimated 150,000 to 200,000 yen per month in GPU expenses alone, but with Graviton optimization this could drop to 60,000 to 80,000 yen per month.
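
That comparison reduces to a one-line break-even check. The yen figures below are the article’s estimates, with midpoints picked for illustration:

```python
# Break-even check: monthly agent cost vs. the low end of part-time labor cost.
# All figures are the article's estimates (yen per month).

def cheaper_than_hiring(agent_cost: int, part_timer_cost: int = 150_000) -> bool:
    """True if running the agent undercuts the cheapest part-time hire."""
    return agent_cost < part_timer_cost

gpu_agent = 175_000       # midpoint of the 150k-200k yen GPU estimate
graviton_agent = 70_000   # midpoint of the 60k-80k yen Graviton estimate

print(cheaper_than_hiring(gpu_agent))       # False: no clear win on GPU pricing
print(cheaper_than_hiring(graviton_agent))  # True: clearly cheaper on Graviton
```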

If this is lower than the cost of hiring a part-time office worker (150,000 to 200,000 yen), then the decision between “hiring a person or running an AI agent” becomes a serious consideration.

However, there are caveats. The Graviton optimization is still in the announcement stage, and it’s uncertain which models will be supported to what extent. This is not something to jump on immediately; it’s important to recognize the trend that “inference costs will continue to decrease”.

GPT-5.5 × Copilot—The Reality of “Doubling Developer Productivity for $10 a Month”

OpenAI has integrated GPT-5.5 into GitHub Copilot, priced at $10 per month.

According to GitHub’s own survey data, Copilot users complete coding tasks an average of 55% faster, with a significant increase in task completion rates. Given that GPT-5.5 has greatly improved reasoning and code-generation accuracy over GPT-4o, this number could rise even further.

Let’s translate this into the context of SMEs.

Consider a local system development company with five engineers. At a monthly salary of 400,000 yen per person, total labor cost is 2 million yen per month. Rolling Copilot out to everyone costs just $50 (about 7,500 yen) per month. If productivity rises by 55%, the work currently done by five people could theoretically be handled by 3.2 people. Put the other way around, five people could take on the workload of 7.7.

With an investment of 7,500 yen per month, the company could effectively gain productivity equivalent to 2.7 people (worth 1.08 million yen). The ROI is approximately 14,400%. There’s no other investment like this.
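
The arithmetic above can be reproduced in a few lines. All inputs are the article’s assumed salary and survey figures, not measured data:

```python
# Reproducing the productivity arithmetic above. All inputs are the
# article's assumptions (salary, 55% speedup, $50/month for five seats).

ENGINEERS = 5
SALARY = 400_000        # yen per engineer per month
SPEEDUP = 0.55          # GitHub's reported average coding-speed gain
COPILOT_COST = 7_500    # yen per month for five seats (about $50)

effective_headcount = ENGINEERS * (1 + SPEEDUP)    # 7.75 "people"
headcount_needed = ENGINEERS / (1 + SPEEDUP)       # ~3.2 people for today's workload
gained_value = (effective_headcount - ENGINEERS) * SALARY  # ~1.1 million yen/month
roi_percent = gained_value / COPILOT_COST * 100

# The article truncates the gain to 2.7 people, giving ~14,400%;
# the untruncated figure is ~14,700%.
print(f"ROI: {roi_percent:.0f}%")
```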

Of course, not all coding tasks will be 55% faster. Design, review, and client negotiations remain human tasks. However, just the speed of writing can enable small development companies to take on projects of the same scale as large SIers. The handicap of numbers disappears.

The Structure of the Three Kingdoms—Where Should SMEs Stand?

When we organize these three movements, we see the following structure:

  • DeepSeek (China): opens its models for free to build an ecosystem. For SMEs, model costs drop to zero, and combining the models with proprietary data becomes the key to success.
  • Meta × AWS (US infrastructure alliance): cuts inference costs through hardware and pushes agent AI. For SMEs, operational costs are halved and the cost-effectiveness of automation improves dramatically.
  • OpenAI × GitHub (US platform): embeds the latest models into developer tools at low cost. For SMEs, development productivity doubles and small teams can handle large jobs.

While the three parties are in competition, from the perspective of SMEs, all of this is “news about decreasing costs”. Model costs, operational costs, development costs. All three major elements of AI-related costs are collapsing simultaneously.

So, what should SMEs do?

The answer is: “Use everything”.

This is difficult for large corporations. They face barriers such as security policies, vendor lock-in, and internal approval processes. “We can’t use DeepSeek because it’s made in China,” “We can only choose Graviton because we’re standardized on AWS,” “We can’t use GitHub due to contractual issues”—these are the kinds of constraints they face.

SMEs, on the other hand, have no such constraints. They can make decisions quickly and combine tools freely. This is their greatest weapon.

Here are three specific action plans:

1. What to do today: Obtain an API key for DeepSeek V4 and test it with your own business data

It’s free. The risk is zero. First, input your meeting minutes, manuals, and customer interaction logs into a 1 million token context. Just ask, “Extract the patterns from this data.” You’ll be amazed at the accuracy.
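
DeepSeek’s API follows the OpenAI-compatible chat-completions format, so a first test can be sketched as below. The model name `deepseek-v4-pro` is a placeholder assumption here; check the current model list in DeepSeek’s documentation before sending:

```python
import json

# Build a pattern-extraction request in the OpenAI-compatible chat format.
# The model name below is a placeholder; substitute the real one from
# DeepSeek's docs.

def build_pattern_request(documents: list[str],
                          model: str = "deepseek-v4-pro") -> dict:
    """Bundle internal documents into one chat-completions payload."""
    corpus = "\n\n---\n\n".join(documents)
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a business-data analyst."},
            {"role": "user",
             "content": f"Extract the patterns from this data:\n\n{corpus}"},
        ],
    }

payload = build_pattern_request([
    "2025-06 minutes: recurring delivery delays on line 3",
    "Support log: repeated inquiries about invoice format",
])
print(json.dumps(payload, ensure_ascii=False)[:60])
# To send: POST this payload to the chat-completions endpoint with an
# Authorization: Bearer <API key> header (e.g. via the requests library).
```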

2. What to do this month: Implement Copilot for all development team members

$10 a month. This is not a sum to hesitate over. After a week of use, everyone will likely say, “I can’t go back now.” Even if you’re not a development company, anyone writing Excel macros or GAS internally should be included.

3. What to do within three months: Start a small experiment with agent AI

There’s no need to wait for the full rollout of Graviton. Using existing tools (n8n, Dify, LangChain, and the like), try running a small automation such as “email → classification → draft reply creation”. Once inference costs fall, you can move straight to full-scale operation.
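
The shape of that pipeline can be prototyped in plain Python before committing to any tool. In this sketch a keyword rule stands in for the model call; the categories, keywords, and reply templates are invented for illustration:

```python
# "Email -> classification -> draft reply" pipeline sketch.
# classify() is a toy keyword rule standing in for an LLM call; swap it
# for a model request once cost and accuracy are validated.

REPLY_TEMPLATES = {
    "order":   "Thank you for your order. We are checking stock and will confirm shortly.",
    "invoice": "Thank you for your inquiry. We will resend the invoice today.",
    "other":   "Thank you for contacting us. A staff member will reply within one business day.",
}

def classify(email_body: str) -> str:
    """Toy rule-based classifier; replace with a model call in production."""
    text = email_body.lower()
    if "order" in text or "purchase" in text:
        return "order"
    if "invoice" in text or "billing" in text:
        return "invoice"
    return "other"

def draft_reply(email_body: str) -> tuple[str, str]:
    """Return (category, draft) for human review; never auto-send."""
    category = classify(email_body)
    return category, REPLY_TEMPLATES[category]

category, draft = draft_reply("Hello, we would like to place an order for 200 units.")
print(category)  # order
```

Keeping a human in the loop on the drafts is what makes this safe to start today, regardless of which model eventually replaces `classify()`.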

The Winner of the “AI Infrastructure Three Kingdoms” Will Be the One Who Moves Fastest

DeepSeek, Meta × AWS, OpenAI. The struggle for supremacy among these three will continue. And as the competition intensifies, model performance will improve, and costs will decrease. This is a tailwind for SMEs in every direction.

However, the tailwind is blowing for everyone. The only difference will be “when they moved”.

What used to cost 3 million yen is now 50,000 yen. Will you simply acknowledge this fact with a “wow” or will you obtain the API key today? The landscape six months from now will be entirely different.

In an era where AI infrastructure costs are approaching zero, “not using it” becomes the biggest cost.
