An 8B Model Completes Tool Operations, a 290MB Model Runs in the Browser — What Small and Medium Enterprises Should Consider in a Week That Marked the End of ‘Big AI = Strong AI’
A Monthly API Bill of 50,000 Yen: Are You Still Paying It?
Calling a GPT-4-class API costs 50,000 yen a month, or 600,000 yen a year. Until recently, this was the standard cost of “using AI.”
However, this week, papers and products have emerged that turn this premise on its head. A free 8B (8 billion parameters) model has successfully completed tool operations, and a 290MB model runs in the browser. No GPU servers or API keys are needed. The implications are clear.
The era of “Big AI = Strong AI” is quietly coming to an end.
For small and medium enterprises, this is not just a technical news story. A change in cost structure means a change in the rules of competition.
—
Tool Operations Completed with the 8B Model — What Happened?
The title of the paper is also its conclusion: “Tool Learning Needs Nothing More Than a Free 8B Language Model.”
The key points of the method proposed by the research team, “TRUSTEE,” are as follows:
- No commercial APIs or annotation data required; an open-source 8B model alone is enough.
- Simulates a dynamic environment, allowing the model to autonomously learn tool invocation.
- Completely skips the traditional process of “manually creating a large amount of training data.”
What are the results? It outperformed traditional methods that rely on external resources across multiple domains. In other words, it was more effective to let the 8B model learn autonomously than to feed data into a large-scale model at a cost.
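The paper’s training pipeline is not reproduced here, but the basic shape of “tool invocation” is easy to picture: the model emits a structured tool call, and a runtime dispatches it to a real function. A minimal sketch follows; the tool name `get_invoice_total` and its return value are invented for illustration and are not taken from the TRUSTEE paper.

```python
import json

# Hypothetical tool registry. The tool name and return value are
# illustrative only; they do not come from the paper.
TOOLS = {
    "get_invoice_total": lambda invoice_id: {"invoice_id": invoice_id, "total": 42_000},
}

def dispatch(model_output: str) -> dict:
    """Parse a JSON tool call emitted by the model and execute it."""
    call = json.loads(model_output)
    return TOOLS[call["tool"]](**call["arguments"])

# In a real setup, this string would be generated by a local 8B model.
model_output = '{"tool": "get_invoice_total", "arguments": {"invoice_id": "INV-001"}}'
print(dispatch(model_output))  # {'invoice_id': 'INV-001', 'total': 42000}
```

The point of the research is that an 8B model can learn to produce such calls reliably on its own; the dispatch side, as the sketch shows, is ordinary plumbing.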
The conventional wisdom held that “large models are necessary for complex tasks like tool operations.” This wisdom has been disproven by data.
For small and medium enterprises, this means an “AI agent that automates internal systems” could plausibly run on a server costing a few thousand yen a month, or even on an office PC. Which raises the question: what was the 50,000-yen monthly API fee paying for?
—
290MB Runs in the Browser — A World Where Even Servers Are Unnecessary
It’s not just the 8B model. Progress is also happening at the smaller end of the scale.
A 290MB model operates in the browser. No server or installation is needed. Open a URL, and the AI is ready to use. This is challenging the very concept of “AI implementation costs.”
When small and medium enterprises adopt AI, the biggest hurdle is not technical capability or literacy. It’s the operational costs associated with questions like “Who manages the server?” “How is the API key managed?” and “What about security?” A browser-complete model eliminates these operational costs.
Of course, a 290MB model does not have the versatility of GPT-4. However, consider this: Most AI processing needed in small and medium enterprises does not require general intelligence.
- Drafting standard emails
- Extracting keywords from daily reports
- Simple inquiry classification
- Assisting with form inputs
For such tasks, a model with hundreds of billions of parameters is unnecessary. Many small and medium enterprises are currently in a situation where they are paying 50,000 yen for over-specification.
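For tasks like these, the integration layer can be a single prompt template per task in front of one small local model. A sketch of that pattern follows; `call_small_model` is a placeholder for whatever local runtime you choose (llama.cpp, Ollama, a browser model), not a real library call, and the prompt wordings are illustrative.

```python
# One prompt template per routine task, all served by one small model.
# `call_small_model` is a stand-in for your local inference runtime.
PROMPTS = {
    "email_draft":   "Draft a polite reply to: {text}",
    "keywords":      "List the key topics in this daily report: {text}",
    "inquiry_class": "Classify this inquiry as billing/support/sales: {text}",
    "form_assist":   "Suggest values for the missing form fields: {text}",
}

def handle(task: str, text: str, call_small_model) -> str:
    prompt = PROMPTS[task].format(text=text)
    return call_small_model(prompt)

# With a stub in place of a real model, the plumbing is testable for free.
echo = lambda prompt: prompt
print(handle("keywords", "met client, fixed login bug", echo))
```

Passing the model in as a plain callable also makes it trivial to swap the stub for a real local model later without touching the task code.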
—
Dynamic Selection of LoRA — Breaking Away from “One Model Does It All”
Another technical point worth noting is the framework called “LoRA on the Go.”
LoRA (Low-Rank Adaptation) is a method to specialize large models for specific tasks at a lower cost. It is already widely used, but there were challenges. Specializing for Task A often degraded performance on Task B. In practice, it was necessary to prepare models for each use case, such as “for invoice processing,” “for email replies,” and “for meeting minutes summaries.”
“LoRA on the Go” solves this problem. It dynamically selects and synthesizes the optimal adapter the moment a task is input. No additional training is required. Experiments have shown performance improvements of up to 3.6% compared to traditional methods.
While 3.6% may seem small, the essence lies elsewhere. The key point is that “the operational cost of managing multiple models” becomes zero. For small and medium enterprises, managing and switching models was practically impossible. Now, that process can be automated.
With one small model and multiple LoRA adapters, various internal operations can be handled. This architecture can be operated even by companies without an AI specialist.
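The paper’s exact selection mechanism is not reproduced here, but the routing idea can be sketched: embed the incoming task, compare it to a stored embedding per adapter, and pick the closest. The adapter names and vectors below are invented for illustration, and the real “LoRA on the Go” framework also synthesizes (mixes) adapters rather than only picking one.

```python
import math

# Illustrative adapter embeddings (3-dimensional for readability; real
# task embeddings would come from an encoder model).
ADAPTERS = {
    "invoice_processing": [0.9, 0.1, 0.0],
    "email_replies":      [0.1, 0.9, 0.1],
    "minutes_summary":    [0.0, 0.2, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def select_adapter(task_embedding):
    """Pick the adapter whose stored embedding is closest to the task."""
    return max(ADAPTERS, key=lambda name: cosine(ADAPTERS[name], task_embedding))

print(select_adapter([0.05, 0.95, 0.1]))  # email_replies
```

Because selection happens per request, no adapter switching or retraining is ever scheduled by a human, which is the operational-cost point made above.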
—
The Pitfalls of Small Models — The “Correct Answer, but Wrong Reasoning” Problem
However, we cannot be overly optimistic.
A study on a new evaluation benchmark called “ReTraceQA” pinpoints the weakness of small models: they often arrive at the correct final answer through a flawed reasoning process.
For instance, if a customer inquiry classification achieves a 90% accuracy rate, but the reasoning behind that judgment is off, it can lead to significant errors in irregular cases. Traditional methods that evaluate only based on accuracy can overlook this issue.
When small and medium enterprises adopt AI, this represents the greatest practical risk. It’s not enough to say, “It’s okay because the accuracy rate is high”; a system to verify “why that answer was reached” is necessary.
Specifically, the following operational designs are required:
1. Always attach a basis (source or rule) to the AI’s output.
2. Conduct weekly sampling to have humans check the reasoning process.
3. Implement a system that immediately alerts when anomalies occur.
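The three designs above can be wrapped around any model call. Below is a minimal sketch; the 10% sampling rate and the plain-text “Basis:” convention are illustrative choices, not requirements from the study.

```python
import random

def verified_call(model, prompt, sample_rate=0.1, alert=print):
    """Wrap a model call with basis checking, sampling, and alerting."""
    output = model(prompt)
    # 1. Require a basis: the output must cite a source or rule.
    if "basis:" not in output.lower():
        alert(f"ALERT: output without basis for prompt {prompt!r}")
    # 2. Sampling: flag a fraction of outputs for human review of the
    #    reasoning process (the ReTraceQA concern), not just the answer.
    needs_review = random.random() < sample_rate
    return {"output": output, "needs_human_review": needs_review}

demo_model = lambda p: "Category: billing. Basis: rule 3 (payment keywords)."
print(verified_call(demo_model, "Classify: my invoice amount is wrong"))
```

In practice, flagged outputs would go into the weekly review queue, and `alert` would post to chat or email rather than print.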
Small models are inexpensive. That is exactly why the right approach is to put part of the savings toward verification mechanisms.
—
Calculating the Break-Even Point — “When Should We Switch?”
Let’s think in concrete numbers.
Current Situation: Using Cloud API
- Monthly API Cost: 50,000 yen
- Annual Cost: 600,000 yen
- Operational Management: Almost zero (SaaS managed)
After Transition: Operating the 8B Model Locally or on a Small Server
- Initial Investment: 100,000 to 200,000 yen (PC with inference-capable GPU or small cloud instance)
- Monthly Operating Cost: 3,000 to 8,000 yen (electricity or cloud pay-as-you-go)
- Annual Cost: Approximately 50,000 to 150,000 yen
- Setup Time: Half a day to 2 days (with technical support)
Difference: 450,000 to 550,000 yen annually.
If we consider this over three years, the difference becomes 1.35 to 1.65 million yen. For a company with ten employees, this amount is not negligible.
Furthermore, if using a browser-complete small model, the initial investment can even be zero. If limited to tasks where performance is sufficient, the switch can be made starting today.
Of course, this does not mean replacing all tasks with small models. There are certainly scenarios where a GPT-4-class model is necessary. However, if you are currently throwing all processing at the highest-spec API, you should take stock of your tasks immediately.
If 80% of tasks can be handled by small models, API costs will be reduced by 80%. The monthly fee of 50,000 yen can become 10,000 yen.
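The arithmetic above can be written down as a small calculator, using the article’s own estimates (rough figures, not quotes). Note that this version also subtracts the one-off initial investment, which is why its three-year totals come out slightly below the 1.35 to 1.65 million yen range quoted above.

```python
def three_year_savings(api_monthly=50_000, local_monthly=8_000,
                       initial_investment=200_000, years=3):
    """Cloud-API spend minus local-model spend over the given horizon."""
    api_total = api_monthly * 12 * years
    local_total = initial_investment + local_monthly * 12 * years
    return api_total - local_total

# Conservative case: 200,000 yen upfront, 8,000 yen/month to run.
print(three_year_savings())  # 1312000

# Best case: 100,000 yen upfront, 3,000 yen/month.
print(three_year_savings(local_monthly=3_000, initial_investment=100_000))  # 1592000

# Partial migration: moving 80% of calls off the API cuts the monthly bill.
print(50_000 * (100 - 80) // 100)  # 10000
```

Plugging in your own invoice numbers takes a minute and settles the “when should we switch?” question with your data rather than the article’s.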
—
So, What Should We Do?
Based on this week’s technological trends, here are three specific actions for small and medium enterprises.
1. Inventory of Tasks (To Be Done This Week)
List every process currently handled by AI, and label each one with whether a small model would suffice. The criteria are simple: is the task routine or non-routine, and is the damage from a mistake significant or minor?
2. Experiment with Small Models (To Be Done Next Week)
Try one open-source model of the 8B class from Hugging Face. Run it on your company’s “routine and low-risk” tasks. If the accuracy exceeds 70%, move into serious consideration. If there’s a browser-complete model available, just experimenting with it is also fine. We live in an era where experiments cost nothing.
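The 70% threshold in step 2 can be checked with a few dozen labeled examples and a simple harness. In the sketch below, a keyword stub stands in for the real model call; the labeled examples are invented, and you would swap the stub for your local 8B runtime.

```python
def accuracy(model, labeled_examples):
    """Fraction of examples where the model's answer matches the label."""
    correct = sum(1 for text, label in labeled_examples if model(text) == label)
    return correct / len(labeled_examples)

# Toy labeled set; in practice, use 30+ real examples from your own inbox.
labeled = [
    ("Invoice amount is wrong",   "billing"),
    ("App crashes on login",      "support"),
    ("Requesting a product demo", "sales"),
    ("Refund not received",       "billing"),
]

# Keyword stub in place of a real local-model call.
stub = lambda text: ("billing" if "invoice" in text.lower() or "refund" in text.lower()
                     else "support")

acc = accuracy(stub, labeled)
print(f"accuracy: {acc:.0%}")  # accuracy: 75%
```

If the real model clears 70% on your own examples, it moves to serious consideration; if not, you have lost nothing but an afternoon.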
3. Design Verification Mechanisms in Advance (To Be Done Before Transition)
Before switching to small models, decide on a flow to verify outputs. As the ReTraceQA study indicates, it is dangerous to think, “It’s okay because the answer is correct.” At a minimum, prepare a three-point set of weekly sample checks, anomaly alerts, and basis displays.
—
The True Meaning of the End of “Big AI = Strong AI”
Finally, I want to discuss a structural issue.
In the era when “Big AI = Strong AI” was valid, the benefits of AI were skewed towards large enterprises. Only those with vast computational resources could utilize cutting-edge AI. Small and medium enterprises had to choose between “settling for a cheap plan” or “giving up on implementation.”
With the performance of small models reaching practical levels, this structure is beginning to collapse. Lower AI costs mean that “being able to use AI” will no longer be a competitive advantage. The differentiating factor will be “how to use AI” — that is, understanding business operations and operational design.
This is where small and medium enterprises have the potential to turn the tables. Large companies, due to their size, often have slower decision-making regarding AI utilization. The ones who best understand the operations on the ground are the small and medium enterprises doing that work.
When the cost of technology approaches near zero, the decisive factor will be “the resolution of the field.” That is something small and medium enterprises inherently possess.
Let’s start by reviewing the 50,000 yen API fee. With the saved money, more interesting experiments can be conducted.