DeepSeek V4 Runs on Consumer GPUs. NVIDIA Releases 4-Bit Training Technology: The Era of ‘Not Buying GPUs’ Breaks Down Barriers for SMEs Entering AI
Let’s get straight to the point: The cost of running AI has changed dramatically.
Until just last week, running a decent AI model locally required a machine equipped with multiple GPUs costing hundreds of thousands of yen. However, this week, two news items have completely transformed the landscape.
The first is DeepSeek V4 Flash, a large language model that runs on a single consumer-grade GPU with 24 GB of VRAM.
The second is NVIDIA’s NVFP4, a pre-training technology that operates at 4-bit precision and cuts model size and computational cost to less than half of previous levels.
Business owners at SMEs should seriously consider the implications of these two developments arriving in the same week.
—
What Happened: The Moment When “Luxury” Became “Commodity”
First, let’s clarify the facts.
Despite its large parameter count, DeepSeek V4 Flash runs inference on a single GPU with 24 GB of VRAM thanks to quantization (a model compression technique). The NVIDIA RTX 4090 currently costs around 280,000 yen, while the RTX 5090 is priced at about 350,000 yen. You might think that’s expensive, but with just one of these, you can run your own dedicated AI on a machine in your own office.
On the other hand, NVIDIA’s NVFP4 represents a revolution on the training side. Traditionally, training AI models has relied on FP16 (16-bit floating point) as the standard. NVFP4 reduces that to 4 bits, so the same values take about one-fourth the memory (4 ÷ 16). That decreases the number of GPUs needed for training, shortens the time required, and lowers electricity costs.
In other words, both the costs of “running” and “training” AI have dropped significantly.
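The one-fourth figure can be checked with simple arithmetic. The sketch below estimates the memory taken by model weights alone at 16 bits and at 4 bits. The parameter count is a hypothetical number chosen for illustration, not the actual size of DeepSeek V4 Flash, and the function name is likewise made up for this example.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical parameter count, used only to illustrate the ratio.
PARAMS = 30e9

fp16_gb = weight_memory_gb(PARAMS, 16)
fp4_gb = weight_memory_gb(PARAMS, 4)

print(f"16-bit weights: {fp16_gb:.1f} GB")   # 60.0 GB
print(f" 4-bit weights: {fp4_gb:.1f} GB")    # 15.0 GB
print(f"ratio: {fp4_gb / fp16_gb:.2f}")      # 0.25
```

This is a lower bound rather than a full budget. Real 4-bit formats typically store some extra scaling data alongside the weights, and activations and other working memory come on top, so actual usage will be somewhat higher than the raw ratio suggests.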
—
Discussing Costs: What Changes for SMEs?
Let’s skip the abstract discussions and look at the numbers.
Pattern A: Continuing to Use Cloud AI (GPT-4 Class)
Consider a company with 30 employees utilizing AI for sales, customer support, and internal document creation. It’s not uncommon for API costs to range from 3,000 to 5,000 yen per person per month.
- Monthly cost: 30 people × 4,000 yen = 120,000 yen
- Annual cost: 1,440,000 yen
- Over 3 years: 4,320,000 yen
Moreover, this is a “pay-as-you-go” model, meaning costs can escalate with increased usage. Data is also sent to external servers, which rules this option out for many businesses that handle confidential information.
Pattern B: Running Locally with GPU + DeepSeek V4 Flash
- GPU (RTX 4090): 280,000 yen (1 unit)
- PC (GPU-equipped workstation): 150,000 to 200,000 yen (including upgrades to existing PCs)
- Initial setup: 50,000 to 150,000 yen (internal or external support)
- Electricity costs: Approximately 3,000 to 5,000 yen per month (for 1 GPU)
- Software: 0 yen (open source)
Initial investment: about 500,000 yen (the items above add up to between 480,000 and 630,000 yen)
Monthly running cost: about 5,000 yen
Annual running cost: about 60,000 yen
Total cost over 3 years: about 680,000 yen (500,000 yen + 60,000 yen × 3)
4,320,000 yen vs 680,000 yen.
The cloud option costs more than six times as much. Furthermore, with local operations, customer data and internal documents don’t need to be sent outside, which eases the long-standing concern local SMEs have about information leakage.
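For readers who want to plug in their own numbers, the comparison above can be reproduced in a few lines. This is a minimal sketch that uses the article's rounded figures. The variable names are illustrative, and the break-even month is an extra calculation derived from the same inputs rather than a figure stated in the article.

```python
MONTHS = 36  # three-year horizon used in the comparison

# Pattern A: cloud API, 30 people at 4,000 yen per person per month
cloud_monthly = 30 * 4_000
cloud_total = cloud_monthly * MONTHS

# Pattern B: local GPU, using the rounded figures from the text
local_initial = 500_000
local_monthly = 5_000
local_total = local_initial + local_monthly * MONTHS

ratio = cloud_total / local_total

# First month in which cumulative cloud spend exceeds cumulative local spend
break_even_month = next(
    m for m in range(1, MONTHS + 1)
    if cloud_monthly * m > local_initial + local_monthly * m
)

print(f"Cloud, 3 years: {cloud_total:,} yen")   # 4,320,000 yen
print(f"Local, 3 years: {local_total:,} yen")   # 680,000 yen
print(f"Ratio: {ratio:.1f}x")                   # 6.4x
print(f"Break-even month: {break_even_month}")  # 5
```

Under these assumptions the local setup pays for itself in the fifth month. Raising `local_initial` to the top of the itemized range (630,000 yen) pushes that back only slightly, which is why the conclusion is not very sensitive to the exact hardware price.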
—
What Happens Beyond “Lower Costs”?
Now, let’s get to the main point. Lowering costs is merely a means to an end. The real question is, “What becomes possible once costs are reduced?”
1. Breaking Down Dependency on Individuals
One of the most serious issues for local SMEs is the reliance on specific individuals. The operational knowledge in the heads of veteran employees, their histories of interactions with clients, and the nuances of estimates are all tied to individuals.
Feeding company data into a local AI can turn this tacit knowledge into a shared system. An AI trained on 1,000 past estimates becomes a tool that new employees can use. Employee turnover no longer has to translate directly into business risk.
Can cloud AI achieve this? Yes, but can you send your company’s estimates or customer information to OpenAI’s servers? Many SME owners would say “NO.” Because it operates locally, you can input truly valuable data. This is the decisive difference.
2. The Cost of “Trying” Approaches Zero
Traditionally, integrating AI into business required 1 to 3 million yen just for a Proof of Concept (PoC). You would have to hire a system integrator, define requirements, and three months later, you’d finally get a report saying, “We’ve determined whether it’s usable.”
With local AI, you can set it up today and test it with your company data tomorrow. If it doesn’t work, you can switch to another model. If DeepSeek V4 doesn’t work, try Llama 4. Once the hardware is in place, switching costs nothing.
This “collapse of trial costs” is even more significant for SMEs than for large corporations. Large companies take three months for approvals and decision-making. For a company with 30 employees, if the president says, “Let’s give it a try,” they can start today. The speed of decision-making becomes a direct competitive advantage.
3. The Gap Between “Companies That Can Use AI” and “Those That Cannot” Becomes Irreversible
This is a warning.
The reduction of AI implementation costs means there will be no more excuses for not adopting it. Three years from now, when competitors in the same industry are completing estimates in five minutes using AI, will your veteran employees still be spending two hours on each one by hand?
The evolution of technology has shifted the issue from “Can we do it?” to “Will we do it?” And this gap will only widen over time.
—
What NVIDIA’s 4-Bit Training (NVFP4) Means
Let’s discuss the technology a bit more. The essence of NVFP4 is the “democratization of training.”
Traditionally, fine-tuning AI models with company data required data center GPUs like the A100 or H100, which cost between 1 million and 5 million yen per unit. That investment was out of reach for most SMEs.
With the practical application of 4-bit training, fine-tuning becomes feasible even with consumer GPUs like the RTX 4090 or RTX 5090. This means that not only can you “run” AI, but you can also “develop it for your company” on your own machine.
The cost of inference (using a model) has decreased. The cost of training (developing a model) has also decreased. Both of these changes are happening simultaneously. This is the true significance of the news from this week.
—
So, What Should We Do?
Here are three things that SME owners should do today.
1. Buy one GPU and start experimenting.
An RTX 4090 (about 280,000 yen) is sufficient. You can find used ones for under 200,000 yen. Using a tool like Ollama, you can set up DeepSeek V4 Flash in about an hour.
2. Identify your company’s “points of dependency.”
List tasks that can only be done by specific individuals, such as estimates, responding to inquiries, summarizing daily reports, and creating manuals. These will be the first areas to introduce AI.
3. Don’t wait for “perfection.”
Current local AI isn’t perfect. There may be instances where its accuracy falls short compared to GPT-4. However, which is more productive for the overall business: an AI that provides an 80-point answer in five seconds or a veteran employee who takes two hours to deliver a 100-point answer? The answer is clear.
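To make the first step more concrete, here is a rough sketch of how a script could send a prompt to a model served by Ollama on the same machine. It assumes Ollama is installed and running on its default local port, and it uses only the Python standard library. The model tag `deepseek-v4-flash` is a placeholder assumed for this example; check which tag your Ollama installation actually lists before using it. The function names are also illustrative.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation
OLLAMA_URL = "http://localhost:11434/api/generate"

# Placeholder model tag (an assumption, not a confirmed name)
MODEL_TAG = "deepseek-v4-flash"


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build the HTTP request without sending it."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = MODEL_TAG, timeout: float = 120.0) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server running, a call such as `ask("Summarize this daily report in three bullet points: ...")` returns the model's answer as a string. Nothing leaves the machine, since the request goes to `localhost`, which is the property the article relies on for confidential data.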
—
This Trend Will Not Stop
DeepSeek V4 Flash, NVFP4. You don’t need to remember the names. What you should remember is the structural change.
The era of needing a fortune to use AI is over.
Just six months ago, cloud APIs were the realistic choice. Now, running AI locally is becoming cheaper and safer. Six months from now, performance will have improved further and costs will have fallen again.
Large corporations are tied down by contracts with vendors and internal adjustments, making it difficult for them to respond quickly to these changes. SMEs are different. If the president decides, they can act by next week.
The time has come when being small can finally become a true weapon.
I want to ask: Will your company ride this wave of change, or will it sit on the sidelines? Do you believe that the same choices will still be available three years from now if you choose to wait?