An Era Where AI Operates with Bundled Old Smartphones—Three Practical Solutions for Running LLMs Under 10,000 Yen a Month


By Kai


Conclusion

To put it simply, the operational cost of AI has entered a realm of “under 10,000 yen a month.”

How much are you paying for the ChatGPT API each month? For small and medium-sized enterprises, spending 50,000 to 100,000 yen is not uncommon; for large corporations, it can reach several million yen. Behind the scenes, data centers are using 29 million gallons (approximately 110 million liters) of water annually for cooling.

Isn’t this structure strange?

“Using AI requires massive infrastructure”—this premise is quietly beginning to crumble. By bundling old smartphones into LLM clusters, setting up LLM servers on local Macs, and minimizing API costs by orchestrating multiple agents from the CLI, individuals and small businesses can get started with what they already have.

Moreover, the monthly costs for all these methods are under 10,000 yen.

Let’s compare these three methods in terms of cost, difficulty, and practicality.

Method 1: Building an LLM Cluster by Bundling Old Smartphones

What’s Impressive

Old smartphones lying dormant in your drawer can become “AI computing resources.”

Specifically, by connecting old iPhones or Android devices to a Wi-Fi network and running open-source inference engines like llama.cpp, you can operate them as a distributed cluster. While a single device may be underpowered, reports suggest that bundling 5 to 10 devices can run models with 7 billion parameters at practical speeds.
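The source does not prescribe a specific cluster topology, but one minimal way to spread load across phones is a round-robin dispatcher over llama.cpp's `llama-server` HTTP endpoint running on each device. A sketch under those assumptions (the IP addresses are placeholders for your phones, and `/completion` is llama-server's standard completion route):

```python
import itertools
import json
import urllib.request

class PhoneCluster:
    """Round-robin dispatcher over llama.cpp `llama-server` instances,
    one per old smartphone on the local Wi-Fi network."""

    def __init__(self, endpoints):
        self.endpoints = endpoints
        self._cycle = itertools.cycle(endpoints)

    def pick(self):
        # Rotate through the phones so no single device bears all the load
        # (which also helps with the heat-management problem noted below).
        return next(self._cycle)

    def complete(self, prompt, n_predict=128):
        url = f"{self.pick()}/completion"
        body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode()
        req = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["content"]

# Placeholder addresses — substitute whatever IPs your phones get on Wi-Fi.
cluster = PhoneCluster([
    "http://192.168.1.21:8080",  # old iPhone
    "http://192.168.1.22:8080",  # old Android
])
```

This is request-level load balancing, not true distributed inference of a single model; for the latter, llama.cpp's RPC backend is the relevant (and still experimental) machinery.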

Cost Breakdown

  • Initial Investment: 0 to a few thousand yen (even if you buy used smartphones, each costs around 2,000 to 5,000 yen)
  • Monthly Electricity Cost: Approximately 500 to 1,500 yen (even with 10 smartphones running at full capacity, total power consumption is about 30 to 50W)
  • API Costs: 0 yen (completely local)

In other words, you can create an environment for running LLMs for under 1,500 yen a month.
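The electricity figure above is easy to verify yourself. A quick back-of-the-envelope calculation, assuming a typical Japanese residential rate of about 31 yen/kWh and always-on operation:

```python
def monthly_electricity_yen(watts, yen_per_kwh=31, hours=24 * 30):
    """Rough monthly electricity cost for an always-on device:
    watts -> kWh over a 30-day month, times the per-kWh rate."""
    return watts / 1000 * hours * yen_per_kwh

# 10 phones drawing ~40 W in total:
# 0.04 kW * 720 h * 31 yen/kWh ≈ 890 yen/month,
# squarely inside the 500-1,500 yen range quoted above.
cost = monthly_electricity_yen(40)
```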

Caution

However, at this point, it remains in the realm of “experimental.” There are many challenges, such as communication delays between smartphones, memory constraints (3 to 6GB per device), and heat management. It is not suitable for applications that require stable handling of a large number of requests.

That said, for internal uses like FAQ bots, daily report summaries, and simple text proofreading, it is sufficiently practical. Before purchasing a 3 million yen server, try using the smartphones in your drawer. This idea itself can be a weapon for small and medium-sized enterprises.

Method 2: Minimizing API Costs through Multiple Agent CLI Adjustments

What’s Impressive

If you continue to use cloud LLM APIs, the challenge becomes “how to reduce unnecessary calls.” Here, tools like Endy, which orchestrate multiple agents, are gaining attention.

Endy is an open-source project available on GitHub that allows for bulk control of multiple coding agents from the CLI. The key points are “task allocation” and “model differentiation.”

For example, you can operate as follows:

  • Simple Tasks (code formatting, template generation) → Assign to inexpensive small models (like GPT-4o-mini or Claude 3 Haiku)
  • Complex Tasks (design judgments, long text analysis) → Assign to high-performance models (like GPT-4o or Claude Sonnet)
  • Eliminate unnecessary re-calls with caching

Just automating this “model differentiation based on tasks” can reduce API costs by more than half.
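The routing-plus-caching logic described above fits in a few lines. A minimal sketch — the model names and task categories are illustrative, not Endy's actual configuration, and the real API call is stubbed out:

```python
import functools

CHEAP_MODEL = "gpt-4o-mini"   # example names; use whatever your provider offers
STRONG_MODEL = "gpt-4o"

SIMPLE_TASKS = {"format", "template", "short_summary"}

def choose_model(task_kind):
    """Route cheap, mechanical work to the small model,
    everything else to the high-performance one."""
    return CHEAP_MODEL if task_kind in SIMPLE_TASKS else STRONG_MODEL

@functools.lru_cache(maxsize=1024)
def ask(task_kind, prompt):
    """Cached dispatch: an identical (task, prompt) pair never
    triggers a second API call."""
    model = choose_model(task_kind)
    # The actual provider call would go here; we return the routing
    # decision so the logic stays testable offline.
    return model
```

Repeated `ask("format", "fix this indentation")` calls hit the cache instead of the API — exactly the "eliminate unnecessary re-calls" point above.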

Cost Breakdown

  • Tool Itself: Free (open-source)
  • Monthly API Costs: 3,000 to 5,000 yen after tuning (before tuning, 10,000 to 30,000 yen is common)
  • Initial Investment: 0 yen

For 5,000 yen a month, you can set up an environment that processes hundreds to thousands of requests.

Caution

Because it is CLI-based, the barrier to entry is high for teams without an engineer. However, once set up, operations can be automated. “If an engineer spends a day setting it up, they can save 20,000 yen in API costs from the following month”—that return on investment is more than sufficient for small and medium-sized enterprises.

Method 3: Setting Up a Local LLM Server on an Apple Silicon Mac

What’s Impressive

With the unified memory structure of Apple Silicon Macs (where CPU and GPU share the same memory), inference processing for LLMs operates with remarkable efficiency. Projects like OMLX (formerly MLX Examples) have taken notice of this.

Running on Apple’s MLX framework, an M2, M3, or M4 Mac can comfortably serve models with 13 billion parameters locally. With an M4 Pro (48GB of memory), even 70-billion-parameter models come into play.

Cost Breakdown

  • Initial Investment: 0 yen if you already own a Mac. If purchasing new, an M2 Mac mini costs around 100,000 yen.
  • Monthly Electricity Cost: Approximately 300 to 1,000 yen (the Mac mini consumes 30 to 60W even under load)
  • API Costs: 0 yen (completely local)

All of this for under 1,000 yen a month. In fact, with zero network latency, responses can even come back faster than via an API.

Caution

The biggest advantage is that “data does not leave the premises.” For small and medium-sized enterprises handling customer information or internal documents, this is a decisive benefit. The anxiety of throwing internal data into a cloud API—this is completely alleviated.

On the other hand, you will need to handle model updates and tuning yourself. However, by combining tools like Ollama, swapping models can be done with a single command.
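Swapping models with Ollama really is a one-liner (`ollama pull <model>` on the command line), and the running server exposes a local REST API on port 11434. A minimal Python client sketch — the model name is an example, and this assumes Ollama is already running locally:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model, prompt):
    """Swapping models is just a matter of changing the `model` field."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )

def generate(model, prompt):
    """Send the prompt to the local server; no data leaves the premises."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]

# generate("llama3", "Summarize today's report: ...")
# requires `ollama pull llama3` to have been run first
```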

Comparing the Three Methods

|  | Smartphone Cluster | Agent CLI Adjustment | Local Mac Server |
| --- | --- | --- | --- |
| Monthly Cost | 500 to 1,500 yen | 3,000 to 5,000 yen | 300 to 1,000 yen |
| Initial Investment | 0 to 20,000 yen | 0 yen | 0 to 150,000 yen |
| Implementation Difficulty | High | Medium | Low |
| Processing Performance | 7B class | API-dependent (can be top-tier) | 13 to 70B class |
| External Data Transmission | None | Yes (API usage) | None |
| Stability | Experimental | High | High |
| Suitable Uses | Lightweight tasks, experiments | Development, coding support | Internal document processing, FAQs, summaries |

So, What Should Small and Medium-Sized Enterprises Do?

I won’t say to do all three. The realistic order of implementation is as follows.

First, start with Method 3, the local Mac server.

The reason is simple. Many companies already have Macs. Install Ollama and download a model, and it can be up and running in 30 minutes. Data does not leave the premises, and the monthly cost is little more than the electricity bill. No way of “trying out AI” has a lower barrier to entry.

Next, in situations where API usage is necessary (cases requiring high-accuracy responses), implement Method 2, the agent CLI adjustment. Just automating model differentiation can cut API costs in half.

Method 1, the smartphone cluster, is more about understanding that “this kind of world is coming” rather than immediate practicality. As technology matures, the day will surely come when used smartphones become the computing resources for small and medium-sized enterprises.

What Really Changes is Not “Cost” but the “Structure of Decision-Making”

Finally, I want to pose a question.

When the monthly cost of AI drops below 10,000 yen, will only the expense figures change?

No. “Whether to use AI or not” will no longer be a management decision.

If it’s 1 million yen a month, it requires approval. If it’s 10,000 yen a month, the person in charge on the ground can start it based on their own judgment. This is not “cost reduction” but rather “democratization of decision-making.”

Large corporations have massive data centers. However, small and medium-sized enterprises have the weapon of being able to act immediately based on on-site judgment. An AI infrastructure under 10,000 yen a month maximizes that weapon.

With smartphones in your drawer, a Mac on your desk, and a terminal CLI, AI can already operate with what you have at hand.

Start by installing Ollama today.
