Gemma 4 Runs on iPhone—Calculating the Breakeven Point for a Future Without the “Monthly 50,000 Yen Cloud AI” Cost


By Kai


Conclusion

To put it simply, the future where the cost of using AI drops from “50,000 yen per month” to “only the electricity bill” is already upon us.

Google’s lightweight language model, “Gemma 4,” runs natively on the iPhone. There’s no need to rely on the cloud. No API fees required. With just one smartphone, you can have a reasonably functional AI at your fingertips.

What does this mean for small and medium-sized enterprises (SMEs)?

A ChatGPT Team plan costing 50,000 yen per month, or API usage fees that can easily exceed 100,000 yen per month: the scenario where these costs drop to zero is starting to become technically feasible.

“Is it really true?” “Where’s the catch?”—Let’s verify with concrete numbers.

Understanding Your Monthly Cloud AI Costs

First, let’s clarify the current situation.

When SMEs use AI for their operations, the typical cost structure looks like this:

  • ChatGPT Team Plan: Approximately 4,000 yen per user per month. For five users, that’s 20,000 yen per month, or 240,000 yen annually.
  • OpenAI API Pay-As-You-Go: For GPT-4o class, about $2.5 for 1 million input tokens and about $10 for output. If used regularly, monthly costs can easily range from 30,000 to 100,000 yen.
  • Paid Plans like Claude and Gemini: 2,000 to 3,000 yen per person per month. Similar costs apply for team usage.

For a company with 10 employees, combining AI chat and API usage can lead to annual costs of 600,000 to 1,200,000 yen. This is the current “market sense” of AI usage costs.
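The figures above are easy to sanity-check. The following is a minimal sketch of that arithmetic; the per-seat price, token rates, monthly token volumes, and the exchange rate are all illustrative assumptions taken from or consistent with the examples above, not measured costs.

```python
# Rough annual-cost sketch for the cloud AI figures quoted above.
# Yen amounts, token volumes, and the exchange rate are assumptions.

USD_JPY = 150  # assumed exchange rate for illustration

def chat_plan_annual(users: int, per_user_monthly_yen: int = 4_000) -> int:
    """Annual cost of a per-seat chat plan (ChatGPT Team class)."""
    return users * per_user_monthly_yen * 12

def api_annual(input_mtok: float, output_mtok: float,
               usd_per_mtok_in: float = 2.5,
               usd_per_mtok_out: float = 10.0) -> int:
    """Annual API cost given monthly token volumes in millions of tokens."""
    monthly_usd = input_mtok * usd_per_mtok_in + output_mtok * usd_per_mtok_out
    return round(monthly_usd * USD_JPY * 12)

print(chat_plan_annual(5))   # five seats -> 240000 yen per year
print(api_annual(10, 2))     # e.g. 10M input / 2M output tokens per month
```

Plugging in your own seat count and token volumes gives a first estimate of where your company actually sits in the 600,000 to 1,200,000 yen range.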

For SMEs, an annual cost of 1,000,000 yen is not insignificant. “AI is convenient, but do we really want to keep paying this cost?”—On-device AI is starting to provide answers to this question.

What Does It Mean for Gemma 4 to Run on iPhone?

Gemma 4 is the latest addition to Google’s open-weight language model family. The key point is the lineup of “lightweight models.” Models in the 1B (1 billion parameters) and 4B class can perform inference directly on the Apple Silicon chip in the iPhone.

What happens as a result?

  • No API calls needed. You can use AI without being connected to the internet.
  • Zero pay-as-you-go costs. No matter how many times you use it, the only cost is the electricity for the device.
  • Data remains private. You can process customer information and internal documents without sending them to the cloud.

“But isn’t AI running on a smartphone going to be underwhelming?”

It’s natural to think that way. In reality, it won’t match the performance of GPT-4o or Claude 3.5 Sonnet. However, consider this: 80% of the AI processing needed for daily operations in SMEs doesn’t actually require that level of performance.

  • Drafting emails
  • Summarizing meeting minutes
  • Generating standard documents
  • Simple data organization
  • Drafting responses to FAQs

For these tasks, a 4B class lightweight model is sufficiently practical. Use cloud AI only for situations that require a “perfect score” and handle daily tasks with on-device AI. This differentiation fundamentally changes the cost structure.
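That differentiation can be expressed as a simple routing policy. The sketch below is a hypothetical illustration of the idea; the task categories and the `route` function are assumptions for this example, not part of any Gemma or ChatGPT API.

```python
# A minimal sketch of the hybrid policy described above: routine tasks
# go to the on-device model, everything else to cloud AI.
# Task labels and the policy itself are illustrative assumptions.

ON_DEVICE_TASKS = {
    "email_draft",       # drafting emails
    "minutes_summary",   # summarizing meeting minutes
    "template_doc",      # generating standard documents
    "data_cleanup",      # simple data organization
    "faq_reply",         # drafting responses to FAQs
}

def route(task_type: str) -> str:
    """Return which backend a task should use under this simple policy."""
    return "on_device" if task_type in ON_DEVICE_TASKS else "cloud"

print(route("email_draft"))      # on_device
print(route("market_analysis"))  # cloud
```

The point is not the code but the habit: decide per task type, not per tool, which tier of model a job really needs.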

Calculating the Breakeven Point

Let’s run some specific calculations.

[Current Situation: Full Use of Cloud AI]

  • ChatGPT Team Plan for 5 users: 20,000 yen per month
  • API Pay-As-You-Go (for business automation): 50,000 yen per month
  • Total: 70,000 yen per month, 840,000 yen annually

[Hybrid Structure: Moving Daily Tasks to On-Device]

  • Processing 70% of daily tasks on-device (Gemma 4, etc.): Cost 0 yen
  • Using cloud AI only for advanced analysis and generation tasks: 15,000 yen per month
  • Total: 15,000 yen per month, 180,000 yen annually

Difference: Annual savings of 660,000 yen.

For a company with 10 employees, this difference can widen even further. An annual cost reduction of over 1,000,000 yen becomes a realistic figure.
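The comparison above reduces to a few lines of arithmetic. This sketch just reproduces the article's example numbers; the monthly amounts are illustrative, not benchmarks.

```python
# Reproducing the breakeven comparison above.
# Monthly yen amounts are the article's example figures.

def annual(monthly_yen: int) -> int:
    return monthly_yen * 12

cloud_only = annual(20_000 + 50_000)  # Team plan + API pay-as-you-go
hybrid = annual(15_000)               # cloud kept only for hard tasks
savings = cloud_only - hybrid

print(cloud_only, hybrid, savings)    # 840000 180000 660000
```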

Moreover, this isn’t about sacrificing performance to cut costs. The speed of processing daily tasks actually increases. The waiting time for requests to the cloud becomes zero. With on-device inference, response times are just a few seconds, unaffected by internet congestion.

The Option of AI Running in Browsers: WebLLM

It’s not just smartphone apps. WebLLM, an open-source inference engine, opens up another option.

WebLLM is a framework that runs LLMs within a browser. Using a technology called WebGPU, it directly taps into the device’s GPU from the browser. In other words, there’s no need to install an app. Just open Chrome, and AI runs.

What does this mean for SMEs?

  • Can be implemented even without an IT department. Just open a URL in a browser.
  • Device-agnostic. iPhone, Android, PC, tablet—any device with a WebGPU-capable browser will work.
  • Management costs are almost zero. No need for app updates or server operations.

“But isn’t AI running in a browser slow?”

At this point, it doesn’t run as fast as native apps. However, as WebGPU support progresses and browser optimizations accelerate, the model sizes that can operate at practical speeds are steadily increasing. In six months or a year, the situation will likely change even further.

EdgeCIM—Hardware is Starting to Become “AI-Specific”

Another significant trend is EdgeCIM, an approach that co-designs hardware and software for edge devices.

Traditional smartphone chips were designed for “general-purpose” use. They can handle AI tasks but aren’t optimized for them. The EdgeCIM approach optimizes chip design from the outset for LLM inference. As a result, it achieves up to 7.3 times the throughput compared to traditional GPUs, significantly reducing power consumption.

What does this mean?

On-device AI that currently runs “fairly well” on today’s iPhones will reach a level of “smooth operation” with next-generation chips. The next step will bring it closer to a level that is “on par with the cloud.” The evolution akin to Moore’s Law is accelerating in the realm of on-device AI.

In 2 to 3 years, it wouldn’t be surprising to see a world where a single smartphone can deliver GPT-4 class performance. At that point, what relevance will monthly subscription cloud AI have?

“So, what should we do?”—Three Things SMEs Should Do Now

1. Try it out first. You can do it today.

The lightweight model of Gemma 4 is already available. Apps that can run it on iPhones are starting to emerge. Start with one device and give it a try. You might think, “Is this all there is to it?” But just six months ago, it was unrealistic to run LLMs on smartphones. Experiencing the speed of evolution is the first step.

2. Sort your company’s AI usage.

Take stock of the AI tasks you are currently using. “This one needs a high-performance model,” “This one is fine with a lightweight model”—if you can make this distinction, you can plan a transition to a hybrid structure. Many companies use GPT-4o class performance for tasks like “drafting emails.” That’s like taking a taxi to the convenience store next door.

3. Question the fixed costs of “AI billing.”

Monthly subscriptions can quietly harden into fixed costs. Are you continuing contracts just because “everyone else is using it”? Given the pace of on-device AI, make it a habit to ask every six months whether each subscription is still really necessary.

The Essence of This Trend is the “Democratization of AI Costs”

Finally, let’s discuss the structural aspect.

The business model of cloud AI operates on the premise that “if you want to use high-performance AI, you must keep paying every month.” Large corporations can absorb that; for SMEs it is a real burden. As a result, disparities in AI utilization emerge.

On-device AI has the potential to overturn this structure.

As long as you have a device, you can use AI. No additional costs. Both large corporations and SMEs can compete on the same playing field. This is the “commoditization of AI,” and it is a tailwind for SMEs.

For most daily operations, the AI systems that large corporations built for tens of millions of yen will matter less than on-device AI running on a single smartphone. When that day comes, success will be differentiated not by “AI performance” but by the design capability of “how to integrate AI into operations.”

And I believe that design capability is more advantageous for SMEs, which are closer to the field.

While the IT departments of large corporations are going through bureaucratic processes, the president of an SME is testing AI on their smartphone and changing workflows the next day. That agility is the greatest weapon of SMEs.

Gemma 4 runs on the iPhone. This is not just a technology news story. It’s a business news story about changing cost structures.
