Can You Really Build an AI That Doesn’t Depend on Others’ GPUs for 18,000 Yen a Month?—A Cost Estimate of Llama.cpp, Engram, and CipherNode

OpenAIが止まったら、あなたの業務は何分で止まるか Recently, news broke that Anthropic changed its policy regarding the provision of certain m

By Kai

|

Related Articles

OpenAIが止まったら、あなたの業務は何分で止まるか

Recently, news broke that Anthropic changed its policy regarding the provision of certain models. Without going into details, there is one essential question:

“What will you do if the AI you are currently using stops working tomorrow?”

ChatGPT, Claude, Gemini—these are convenient. However, they all operate on someone else’s servers. We have no control over rising fees, discontinued models, or changes in API specifications. Is it acceptable for local small and medium-sized enterprises to bear the risk of suddenly being told, “This model will be discontinued” after paying several tens of thousands of yen each month?

Therefore, I calculated the total cost of setting up an AI environment that operates entirely on our own servers using three open-source tools. To cut to the chase, the conclusion is approximately 18,000 yen per month, or about 220,000 yen annually.

3つのツールで何ができるのか、30秒で説明する

ツール 役割 一言で言うと
Llama.cpp CPU推論エンジン GPUなしでLLMを動かす
Engram オフラインメモリサーバー AIに「記憶」を持たせる。外部通信なし
CipherNode 自己修正AIスワーム 複数エージェントが互いのエラーを検知・修正

By combining these three, you can run an AI that “thinks for itself, remembers, and corrects its own mistakes” offline. No cloud API is needed. There are no monthly usage fees. Data does not leave the server at all.

Llama.cpp:GPU不要のCPU推論、実際どこまで使えるのか

Llama.cpp is an open-source project that reimplements Meta’s LLaMA model in C/C++, allowing LLMs to run in a CPU environment without a GPU.

Typically, running an LLM requires NVIDIA GPUs (like A100 or H100), which can cost tens of thousands of yen per month if rented in the cloud. Llama.cpp has overturned that cost structure.

実際のスペックとコスト

For the 4-bit quantized Llama 3 8B model:

  • 必要スペック: CPU 4コア以上、RAM 16GB以上、ストレージ10GB
  • 推論速度: 約5〜15トークン/秒(CPU依存)
  • 対応サーバー例: 中古のDell PowerEdge T340クラスで十分
  • ハードウェア初期費用: 中古サーバーなら3〜8万円程度
  • 月額電気代+回線: 約5,000円
  • ソフトウェア費用: 0円(MITライセンス)

While it won’t match the performance of GPT-4, it can handle 80% of practical tasks for small and medium enterprises, such as internal FAQ responses, meeting minutes summarization, email drafting, and generating standard documents. Consider how many tasks truly require GPT-4; you might find that the number is surprisingly low.

Engram:「AIが昨日の会話を覚えている」をオフラインで実現する

A weakness of LLMs is their tendency to “forget.” Each conversation starts from scratch. ChatGPT’s memory feature is implemented on OpenAI’s servers, making it unusable for in-house operations.

Engram is an offline memory server compliant with the Model Context Protocol (MCP) that provides AI coding tools and LLMs with “long-term memory.”

Specifically:

  • Past conversation history, user preferences, and project information are stored locally.
  • Relevant memories are automatically retrieved and injected into the context during the next inference.
  • No external communication whatsoever. Data does not leave the server.

コストとスペック

  • 必要スペック: CPU 4コア以上、RAM 32GB以上(記憶データのインデックスにメモリを食う)、ストレージ20GB
  • 月額コスト: 約6,000円(Llama.cppと同一サーバーで兼用する場合は追加コストほぼゼロ。メモリ増設が必要な場合のみ数千円の初期投資)
  • ソフトウェア費用: 0円

This means that an AI that “understands our company” can operate offline. If it takes a new employee three months to learn the job, an AI that has accumulated memories with Engram will have that three months’ worth of context from the start. Knowledge that was previously tied to individuals can now be preserved in a reproducible form within the server.

CipherNode:AIが自分のミスを自分で直す

LLMs can lie. This issue, known as hallucination, is particularly serious in in-house operations. While cloud APIs offer the option to “wait for the model to improve,” local models must be managed independently.

CipherNode is a framework that structures multiple AI agents into a swarm to cross-check and correct each other’s outputs.

The mechanism is simple:

  1. Agent A executes a task and generates output.
  2. Agent B verifies that output and detects inconsistencies or errors.
  3. Detected errors are automatically corrected and re-executed.
  4. This loop is completed offline.

In human terms, it’s a “double-check system,” but it can run automatically 24/7.

コストとスペック

  • 必要スペック: CPU 8コア以上(複数エージェント並列実行のため)、RAM 16GB以上、ストレージ15GB
  • 月額コスト: 約7,000円(CPU負荷が高いため、電気代がやや増加)
  • ソフトウェア費用: 0円

To be honest, the difficulty of implementation is the highest among the three. Setting up communication between agents and defining correction rules requires engineering expertise. However, once set up, it becomes a system that “automatically ensures quality.” The significance of automating a previously subjective review process is substantial.

総コスト試算:結局いくらかかるのか

I calculated the costs for both a single integrated server setup and a distributed configuration.

パターンA:1台統合構成(最小コスト)

項目 コスト
サーバー(中古、CPU 8コア/RAM 64GB) 初期費用 8〜15万円
月額電気代 約5,000〜7,000円
回線費用 社内LAN利用なら0円
ソフトウェア 0円
月額ランニングコスト 約5,000〜7,000円
年間ランニングコスト 約6〜8.4万円

パターンB:役割分散構成(推奨)

If you separate the servers for Llama.cpp, Engram, and CipherNode to ensure stability:

項目 コスト
サーバー3台(中古) 初期費用 15〜30万円
月額電気代 約15,000〜18,000円
回線費用 社内LAN利用なら0円
ソフトウェア 0円
月額ランニングコスト 約15,000〜18,000円
年間ランニングコスト 約18〜22万円

For comparison, the ChatGPT Team plan costs 3,750 yen per user per month. For ten users, that amounts to 37,500 yen per month, or 450,000 yen annually. Moreover, data is sent to OpenAI’s servers, and you have zero control over model changes.

18,000 yen for in-house AI vs. 37,500 yen for someone else’s AI. You win on cost and data sovereignty. The only trade-off is performance, but as mentioned earlier, most practical tasks can be handled by local models.

導入難易度の現実——誰が構築するのか

To be honest, it is impossible for a “CEO of a small or medium-sized enterprise who is not IT-savvy” to set up this configuration alone.

ツール 導入難易度 必要なスキル
Llama.cpp ★★☆☆☆ Linuxの基本操作、コマンドライン
Engram ★★★☆☆ 上記+MCP設定の理解
CipherNode ★★★★☆ 上記+エージェント設計の知識

However, consider this: this construction task is a one-time effort. Outsourcing it would cost around 100,000 to 300,000 yen. On the other hand, the monthly fees for cloud APIs will continue indefinitely. It’s clear which is the heavier burden: “the effort to build” or “the monthly payments.”

There are engineers in rural areas who can work with Linux. Look for local technical college graduates, freelance engineers, or municipal IT support programs. If you search, you will find help.

で、結局どうすればいいのか

I propose three steps:

1. First, try Llama.cpp alone (you can do this this week)
Use your existing PC. It runs on WSL2 or Mac. Download the 4-bit quantized model of Llama 3 8B and start the llama-server. In 30 minutes, you can experience an “AI running on your own PC.”

2. Validate its usability for a week
Test it on FAQ responses, meeting minutes summarization, and email drafting. Identify tasks where you think “this is sufficient” when put into practice.

3. If you have three or more sufficient tasks, proceed to build Engram + CipherNode
Add memory and self-correction to create a robust in-house AI infrastructure. Only then should you make the decision to invest in servers.

The cost of the first step is zero. All you need is 30 minutes of your time.

「依存先」を選ぶ時代は終わりつつある

The era of choosing where to depend—OpenAI, Google, Anthropic—is coming to an end, as the option of not depending on anyone is becoming realistic.

For 18,000 yen a month, with a few used servers and three open-source tools, you can obtain an AI that “does not stop,” “does not send data outside,” and “learns to be smarter on its own.” It may not be perfect, and it cannot compete with GPT-4, but the value of not stopping can outweigh the performance differences.

For local small and medium-sized enterprises, AI is shifting from being something “to be used” to something “to be owned.” That turning point is now.

POPULAR ARTICLES

Related Articles

POPULAR ARTICLES

JP JA US EN