A World Where GPU Costs Are Halved Is Coming: What Amazon’s External AI Chip Sales and Baseten’s $1.5 Billion Funding Mean for Small and Medium Enterprises
Let’s get straight to the point: the era of “AI is out of reach because GPU costs are too high” is coming to an end.
Amazon is set to sell its in-house AI chip, Trainium, to outside customers. CEO Andy Jassy has stated that this venture could represent a market opportunity of up to $50 billion (about 7 trillion yen). At the same time, AI inference infrastructure startup Baseten has raised $1.5 billion at a valuation of $13 billion.
When read separately, these two pieces of news might seem like just stories about large corporations. However, when we look at the structure, we can see that significant changes are occurring that cannot be ignored by local small and medium enterprises.
A future in which GPU computing costs fall to less than half of today’s level has become quite realistic.
Why can we say this? What is changing? And what should small and medium enterprises do now? Let’s break it down step by step.
—
Why Is “Half Price” Realistic? — The Collapse of Nvidia’s Monopoly
First, let’s establish a premise. The primary reason for the high AI computing costs today is Nvidia’s monopoly.
Nvidia holds an estimated market share of over 80% in the GPU market used for AI inference and training. The H100 chip costs about $30,000 to $40,000 (approximately 4.5 million to 6 million yen) per unit. Demand significantly exceeds supply, keeping prices high. In a monopoly market, prices do not decrease. This is basic economics.
Amazon is now entering this arena. Its in-house Trainium chip has so far been offered through the AWS cloud, but it will now also be sold directly to server manufacturers and data center operators. This means that Amazon’s chips will be usable in environments other than AWS.
Amazon’s aim is clear: to recoup the chip costs it currently pays to Nvidia and to generate profits through external sales. When Jassy refers to a “$50 billion opportunity,” it reflects just how much money is currently flowing to Nvidia.
As competition increases, prices will drop. Google (TPU), Microsoft, and Meta are also developing their own chips. However, Amazon is ahead of the curve by venturing into external sales. By distributing its chips in the market, it aims to shift the very “market price” of GPUs.
According to AWS’s public benchmarks, Trainium’s inference cost is reported to be 40-50% lower than that of Nvidia GPUs. Once external sales ramp up, similar cost reductions can be expected in third-party cloud and on-premise environments.
—
What Baseten’s $1.5 Billion Funding Means — Inference Becomes the Main Battleground
Baseten’s funding amount of $1.5 billion and valuation of $13 billion may sound impressive, but what’s crucial is what this investment is directed towards.
Baseten is a company focused on “AI inference infrastructure.” It provides a platform for companies to run AI models in production environments (i.e., to perform inference). The key point is that it specializes in inference rather than training.
Why is inference important? It becomes clear when we break down the cost structure of AI.
- Training Costs: Incurred when a model is created. This is largely a one-time expense (though training may need to be repeated occasionally).
- Inference Costs: Incurred when the model is used. Every API call from a user costs money, so the more the model is used, the higher the total.
As AI becomes more widespread, the proportion of inference costs in the overall cost structure increases. Some estimates suggest that inference already accounts for over 60% of AI-related computing costs, and this could exceed 80% in the future.
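The shift toward inference can be illustrated with a minimal sketch. It assumes a hypothetical one-time training cost and a fixed cost per call; the units and numbers are invented for illustration and are not figures from any provider.

```python
def inference_share(training_cost, cost_per_call, calls):
    """Share of total compute cost that goes to inference."""
    inference_cost = cost_per_call * calls
    return inference_cost / (training_cost + inference_cost)

# Hypothetical units: training costs 1,000,000 once, each call costs 1.
TRAINING_COST = 1_000_000
COST_PER_CALL = 1

for calls in (500_000, 1_500_000, 4_000_000):
    share = inference_share(TRAINING_COST, COST_PER_CALL, calls)
    print(f"{calls:>9,} calls -> inference is {share:.0%} of total cost")
```

Under these assumed numbers the inference share passes 60% at 1.5 million calls and reaches 80% at 4 million, simply because training is paid once while inference scales with usage.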
In other words, the $1.5 billion raised by Baseten reflects investors’ confidence that “those who reduce inference costs will dominate the market.” When considered alongside Amazon’s external chip sales, it becomes clear that a competition to lower inference costs is intensifying.
—
How Will AI Budgets for Small and Medium Enterprises Change? — A Concrete Calculation
Now, let’s get to the crux of the matter. What does it mean for small and medium enterprises if “GPU costs are halved”? Let’s calculate it concretely.
Case 1: Using AI via API
Currently, many small and medium enterprises are using APIs from OpenAI or Anthropic to improve operational efficiency. Monthly API usage fees typically range from 50,000 to 200,000 yen.
GPU computing costs account for a large portion of these API fees. If GPU costs were halved, API fees could potentially decrease by 30-50% (the saving would not be passed through in full, because API prices also include the provider’s margin).
- Monthly API usage of 100,000 yen → Reduced to 50,000 to 70,000 yen
- Annual savings of 360,000 to 600,000 yen
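The arithmetic behind these figures can be sketched as follows. The function name and the rounding to whole yen are assumptions made for illustration; the 30% and 50% reductions are the range stated above.

```python
def api_savings(monthly_fee_yen, fee_reduction):
    """Return (new monthly fee, annual savings) in yen for a given fee reduction."""
    new_fee = round(monthly_fee_yen * (1 - fee_reduction))
    annual_savings = (monthly_fee_yen - new_fee) * 12
    return new_fee, annual_savings

# A 100,000 yen monthly bill under the 30% and 50% reduction scenarios.
for reduction in (0.30, 0.50):
    new_fee, saved = api_savings(100_000, reduction)
    print(f"{reduction:.0%} cheaper: {new_fee:,} yen/month, {saved:,} yen saved per year")
```

Swapping in your own monthly bill for the 100,000 yen figure gives the corresponding range for your company.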
For small and medium enterprises, an annual difference of several hundred thousand yen is significant. However, the real impact goes beyond this.
Case 2: Increased Capabilities Due to Lower Costs
This is the more important scenario.
What if AI applications that were previously put on hold because they cost 200,000 yen a month could now be done for 100,000 yen a month? For example:
- Automating Customer Support: Assigning initial responses to inquiries to AI. The cost of processing 300 inquiries a month, which was 150,000 yen, could drop to 80,000 yen. This is cheaper than hiring one part-time employee.
- AI for Internal Knowledge Search: Enabling AI to search internal manuals and past project information. The setup cost of 1 million yen and monthly operation of 50,000 yen could be reduced to a setup of 500,000 yen and monthly operation of 30,000 yen.
- Automated Image and Video Generation: Changing backgrounds of product photos or generating short videos for social media. What cost 5,000 yen for outsourcing could be generated by AI for less than 500 yen.
Halving costs means that with the same budget, you can do twice as much. Alternatively, initiatives that previously didn’t have a good ROI may now become “worth pursuing.”
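To compare the examples above on a common footing, here is a minimal sketch that converts the customer support figures into a cost per inquiry and the knowledge search figures into a first-year total. The helper names are hypothetical, and the 12-month horizon is an assumption chosen for illustration.

```python
def cost_per_inquiry(monthly_cost_yen, inquiries_per_month):
    """Average cost in yen of handling one inquiry."""
    return monthly_cost_yen / inquiries_per_month

def first_year_cost(setup_yen, monthly_yen):
    """Setup cost plus 12 months of operation, in yen."""
    return setup_yen + 12 * monthly_yen

# Customer support: 300 inquiries a month, before and after the price drop.
print(f"Support: {cost_per_inquiry(150_000, 300):,.0f} -> "
      f"{cost_per_inquiry(80_000, 300):,.0f} yen per inquiry")

# Internal knowledge search: setup plus one year of operation.
before = first_year_cost(1_000_000, 50_000)
after = first_year_cost(500_000, 30_000)
print(f"Knowledge search: {before:,} -> {after:,} yen in year one "
      f"({before - after:,} yen less)")
```

Seen this way, the support example falls from 500 yen to roughly 267 yen per inquiry, and the knowledge search example falls from 1,600,000 yen to 860,000 yen in its first year.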
Case 3: Hosting Models In-House
This is a slightly longer-term consideration, but it represents the area where the most dramatic changes will occur.
When running open-source LLMs (like Llama or Mistral) on in-house servers or the cloud, GPU costs directly affect expenses. Currently, renting GPU instances for inference on AWS costs about 200,000 to 500,000 yen per month. If this were halved, it would drop to 100,000 to 250,000 yen per month.
For companies that do not want to expose their data externally, especially small and medium enterprises in manufacturing or healthcare, the barrier to having a “dedicated AI” will drop significantly.
—
Should You Wait or Act Now? — Actions Small and Medium Enterprises Should Take
If GPU costs are going down, should you wait a little longer?
The answer is No. You should start immediately.
There are three reasons for this.
1. The decrease in costs applies to “computational expenses” but not to “learning costs.”
Incorporating AI into business involves selecting tools, designing prompts, restructuring workflows, and training employees — these “human learning costs” are unrelated to GPU expenses, and companies that start early will recoup their investments faster.
2. Starting after costs decrease means you’ll be two years behind competitors.
The substantial drop in GPU costs is expected around late 2025 to 2026. It will take an additional six months to a year to see results after starting implementation. This means that companies starting now will be two years ahead of those who wait. For small and medium enterprises, a two-year gap can be fatal.
3. There are already initiatives that yield sufficient ROI even at current costs.
It’s common to find cases where using APIs costing several tens of thousands of yen can reduce labor hours for customer support or document creation by 20-30 hours a month. This translates to an effect worth 50,000 to 100,000 yen per month. This can pay off even at current costs, and further profits will come when costs decrease.
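The third reason can be checked with a rough calculation. This is a sketch under stated assumptions: the 30,000 yen API bill is one hypothetical reading of “several tens of thousands of yen,” and pairing 20 hours with 50,000 yen and 30 hours with 100,000 yen is an assumption about how the two ranges line up.

```python
def implied_hourly_value(monthly_value_yen, hours_saved):
    """Value of one saved hour implied by a monthly benefit estimate."""
    return monthly_value_yen / hours_saved

def net_monthly_benefit(monthly_value_yen, api_cost_yen):
    """Monthly benefit left over after paying the API bill."""
    return monthly_value_yen - api_cost_yen

API_COST_YEN = 30_000  # assumption: "several tens of thousands of yen"

for hours, value in ((20, 50_000), (30, 100_000)):
    hourly = implied_hourly_value(value, hours)
    net = net_monthly_benefit(value, API_COST_YEN)
    print(f"{hours} h saved: ~{hourly:,.0f} yen/hour, net {net:,} yen/month")
```

Even at the low end, the assumed bill leaves 20,000 yen a month after costs, and any later drop in API fees adds directly to that margin.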
—
The Real Point to Focus On — The World Beyond Lower Costs
Both Amazon’s external chip sales and Baseten’s funding ultimately point in one direction.
“AI inference will become a utility.”
Just like electricity or the internet, AI computing power will become a cheap utility that can be used anywhere. When that happens, the source of competitive advantage will no longer be “the ability to use AI” because everyone will have access to it.
The differentiating factor will be whether an organization can think about and execute “what to make AI do.”
Large corporations, due to their size, often have slower decision-making processes. In terms of speed in solving on-site issues with AI, small and medium enterprises led by CEOs who are directly involved in the field can act much faster. This represents a structural advantage for small and medium enterprises.
GPU costs will be halved. The ones who will benefit the most are the companies that start experimenting with incorporating AI into their operations now.
The question is whether your company is prepared to start running immediately on the cheaper infrastructure.
—
Summary: Three Facts and One Question
- Amazon’s external AI chip sales could reduce GPU computing costs by 40-50%.
- Baseten’s $1.5 billion funding indicates the intensification of the competition to lower inference costs.
- Small and medium enterprises are moving towards an era where they can implement double the initiatives with the same budget.
And the question is:
Is your company in a position to immediately benefit when GPU costs are halved?
If the answer is No, then start small today. A monthly API usage of 10,000 yen is sufficient. Test it with one business process first. When costs drop, that experience will become your greatest asset.