Preventing the Shock of Unexpected Costs: AI Cost Prediction Tool ‘Flowcost’ and the Reality of 5x Differences in Output Token Costs
AI’s Problem of Inability to Estimate Costs
For small and medium-sized enterprises (SMEs) adopting AI, the biggest fear is that “it turned out to be more expensive than expected.”
What begins as an assumption that costs will run about 10,000 yen a month can balloon to 100,000 yen as API usage charges soar, and such stories are not rare. AI costs swing widely with the choice of model, token counts, retry counts, and infrastructure configuration, and these factors interact in ways complex enough to make advance estimation extremely difficult.
In construction, an estimate is provided before a project starts; in manufacturing, costs are calculated in advance. For AI implementation, however, no such estimation mechanism has existed.
This is where “Flowcost” comes into play.
What Does Flowcost Do?
Flowcost is a tool that calculates the costs of AI workflows before implementation. This is crucial. Instead of being shocked by an invoice after usage, users can know how much it will cost before they start using it.
Specifically, by inputting the following elements, it calculates the expected costs:
- The AI model to be used (e.g., GPT-4o, Claude 3.5 Sonnet, Gemini, etc.)
- Estimates of input and output token counts
- Whether and how often retrieval augmentation (RAG) is used
- The frequency of retries
- Options for infrastructure (cloud, on-premises, etc.)
This allows for comparisons such as, “If we process 10,000 inquiries per month with AI, it will cost ○ yen per month with GPT-4o and ○ yen with Claude 3.5 Sonnet,” all before implementation.
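The kind of comparison described above can be sketched as a simple pricing function. The per-1,000-token prices, token counts, and retry rate below are placeholder assumptions for illustration, not published rates or Flowcost's actual formula:

```python
# Illustrative sketch of a Flowcost-style estimate.
# Prices are placeholder values (yen per 1,000 tokens), not real rates.
PRICE_PER_1K = {  # model: (input price, output price)
    "gpt-4o": (0.4, 1.6),
    "claude-3.5-sonnet": (0.45, 2.25),
}

def monthly_cost(model, requests, in_tokens, out_tokens,
                 retry_rate=0.0, retrieval_tokens=0):
    """Estimated monthly cost in yen for one workflow."""
    p_in, p_out = PRICE_PER_1K[model]
    calls = requests * (1 + retry_rate)        # retries re-bill the request
    total_in = calls * (in_tokens + retrieval_tokens)  # RAG adds input tokens
    total_out = calls * out_tokens
    return (total_in / 1000) * p_in + (total_out / 1000) * p_out

for model in PRICE_PER_1K:
    yen = monthly_cost(model, requests=10_000, in_tokens=500,
                       out_tokens=300, retry_rate=0.05, retrieval_tokens=1_000)
    print(f"{model}: {yen:,.0f} yen/month")
```

Even this toy version makes the comparison concrete: the same 10,000-request workload produces a different bill per model, and the retry rate and RAG token overhead are explicit inputs rather than surprises on the invoice.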
This is not just a convenient tool; it is the systematization of AI costs. It moves away from subjective intuition and vague estimates from vendors, enabling decision-making based on reproducible numbers.
The Reality of a 5x Difference in Output Token Costs
Let’s look at another angle to understand why tools like Flowcost are necessary.
Recent research on a C++-based LLM inference engine revealed shocking data. Even for the same task, the cost of output tokens can differ by up to 5 times depending on the implementation of the inference engine.
Five times. A process that should cost 20,000 yen per month can balloon to 100,000 yen if the wrong choice is made. That’s a difference of 960,000 yen annually. For SMEs, this difference is significant.
Several factors contribute to this discrepancy:
- Model quantization levels: By slightly reducing accuracy for lightweight operation, costs can drop significantly. In many business applications, a slight decrease in accuracy is not practically problematic.
- Batch processing optimization: Whether requests are processed in batches or one by one affects GPU utilization efficiency.
- KV cache management: The extent to which past interactions are cached changes the cost of recalculation.
These are technical discussions, but they directly relate to business decisions. The choice of “which engine to use” and “how to configure it” can lead to annual cost variations in the hundreds of thousands of yen.
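The business impact of that engine-level spread is plain arithmetic. The 20,000-yen baseline and the 5x multiplier come from the figures above; the intermediate 2x case is an illustrative midpoint:

```python
# The 5x output-token cost spread in concrete terms.
# base_monthly mirrors the article's example (cheapest engine configuration).
base_monthly = 20_000  # yen per month

for multiplier in (1, 2, 5):
    monthly = base_monthly * multiplier
    annual_extra = (monthly - base_monthly) * 12
    print(f"{multiplier}x engine: {monthly:,} yen/month, "
          f"+{annual_extra:,} yen/year over baseline")
```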
Training a 100B Model with a Single GPU: ‘MegaTrain’
Here’s another cost-saving discussion.
Traditionally, training models with over 100 billion parameters required dozens to hundreds of GPUs. Setting up 100 NVIDIA H100s alone could cost hundreds of millions of yen. Even renting them in the cloud could mean monthly costs in the tens of millions of yen.
However, the system called “MegaTrain” is disrupting this cost structure.
MegaTrain can stably train a 120B parameter model using one H200 GPU and 1.5TB of host memory. Compared to traditional DeepSpeed ZeRO-3, the training throughput for a 14B model is 1.84 times higher. This means that approximately twice the training can be completed in the same amount of time.
While this may not directly impact SMEs right now, if this technology becomes widespread, the training costs for AI models will dramatically decrease. Lower training costs will also reduce the development costs for specialized models. As a result, the option to “create a dedicated AI model for one’s own company” will become accessible to SMEs.
Three Steps to Systematize AI Cost Management
To ensure that SMEs do not fail due to AI costs, here are actionable steps they can take right now.
Step 1: Visualize Current Usage
First, understand the usage of the AI services currently being utilized. This includes the number of API calls, token consumption, and monthly costs. Surprisingly, many companies do not have a clear grasp of these figures. If a dashboard exists, check it. If not, reverse-engineer it from invoices.
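Reverse-engineering usage from exports or invoices can start very small. The CSV columns below are hypothetical (real provider dashboards export similar but not identical fields); the point is simply to get calls, tokens, and cost into one total:

```python
import csv
import io

# Hypothetical usage export; column names are assumptions for illustration.
usage_csv = """date,model,calls,input_tokens,output_tokens,cost_yen
2024-05-01,gpt-4o,320,160000,96000,520
2024-05-02,gpt-4o,410,205000,123000,667
"""

totals = {"calls": 0, "input_tokens": 0, "output_tokens": 0, "cost_yen": 0}
for row in csv.DictReader(io.StringIO(usage_csv)):
    for key in totals:
        totals[key] += int(row[key])

print(totals)  # one month's calls, tokens, and spend in a single dict
```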
Step 2: Simulate ‘What If’ Scenarios with Flowcost
“What if we switch from GPT-4o to Claude 3.5 Sonnet?” “What if we shorten the input prompts?” “What if we reduce the retry rate?” — Simulate these “what if” scenarios in advance using Flowcost. Compare using numbers, not gut feelings.
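The three "what if" questions above map directly onto scenario parameters. This is a hedged sketch with placeholder prices and token counts, not Flowcost's actual simulation logic:

```python
# Compare alternative scenarios against a baseline.
# All prices (yen per 1,000 tokens) and token counts are illustrative.
def cost(requests, in_tok, out_tok, p_in, p_out, retry_rate):
    calls = requests * (1 + retry_rate)
    return calls * (in_tok / 1000 * p_in + out_tok / 1000 * p_out)

baseline = dict(requests=10_000, in_tok=800, out_tok=300,
                p_in=0.4, p_out=1.6, retry_rate=0.10)
scenarios = {
    "baseline":         baseline,
    "shorter prompts":  {**baseline, "in_tok": 400},
    "lower retry rate": {**baseline, "retry_rate": 0.02},
}

for name, params in scenarios.items():
    print(f"{name}: {cost(**params):,.0f} yen/month")
```

Each scenario changes exactly one knob, so the monthly deltas can be compared as numbers rather than gut feelings.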
Step 3: Conduct Monthly Cost Reviews
AI costs are not fixed expenses; they fluctuate based on usage. Therefore, a monthly review mechanism is necessary. “Costs have increased by 20% compared to last month. What is the reason?” — Simply asking this question every month can prevent runaway costs.
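The monthly check itself can be a few lines: flag any month-over-month increase above a threshold. The cost history here is made up for illustration:

```python
# Minimal monthly cost review: alert on month-over-month growth above 20%.
history = {"2024-03": 48_000, "2024-04": 50_000, "2024-05": 63_000}  # yen

THRESHOLD = 0.20
months = sorted(history)
for prev, cur in zip(months, months[1:]):
    growth = history[cur] / history[prev] - 1
    if growth > THRESHOLD:
        print(f"{cur}: cost up {growth:.0%} vs {prev} -- investigate the cause")
```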
Only Companies That Can Manage Costs Will Continue to Use AI
The biggest risk of adopting AI is “stopping because it became too expensive.”
Implementation begins with enthusiasm, and results do appear. Three months later, however, the invoice arrives and faces go pale. "If it costs this much, let's stop" — many AI adoption efforts stall in exactly this way.
Tools like Flowcost structurally address this issue. If costs are known in advance, budgets can be set. With a budget in place, continuity is possible. If continuity is maintained, the benefits accumulate.
Output token costs can differ by a factor of five. The choice of inference engine can swing annual costs by close to a million yen. MegaTrain-style training can cut costs to a fraction of today's levels. Whether or not you are aware of facts like these is what makes the difference.
AI is transitioning from being a “technology to use” to a “technology to manage.” Only companies that can manage costs through systematic approaches will continue to wield AI as a weapon.