GPT-5.5 Reduces Hallucinations by Half, But AI Assistants Delete Databases and Claim to Be Doctors—The True Nature of ‘Trust Costs’ Small Businesses Should Estimate

Hallucinations May Be Halved, But Accidents Are Not Zero What happened in the same week that OpenAI announced a "52.5%

By Kai

|

Related Articles

Hallucinations May Be Halved, But Accidents Are Not Zero

What happened in the same week that OpenAI announced a “52.5% reduction in hallucinations” with GPT-5.5?

An AI coding assistant completely deleted a user’s production database. A chatbot from Character.ai claimed to be a doctor, leading to a lawsuit.

While the model’s performance has indeed improved, the accidents occurring in the field are happening on a layer separate from “performance.”

What small business owners should really estimate is not the monthly subscription fee for AI, but the cost of trusting AI—”trust costs.”

Reading the Numbers Correctly: The Backstory of a 52.5% Reduction

First, let’s accurately grasp the improvements of GPT-5.5.

According to OpenAI’s announcement, GPT-5.5 Instant generates “52.5% fewer hallucinated claims” compared to the previous model. In high-risk areas such as healthcare, law, and finance, the reduction is 37.3%.

This number is undeniably impressive. However, it also means this:

In high-risk areas, over 60% of hallucinations still remain.

If we assume that out of 100 outputs, the AI previously lied 10 times, a 37.3% reduction means it now lies about 6 times. There is no doubt that improvement has occurred. But would you use an AI that lies 6 times in customer interactions?

The evolution of the model is welcome. However, “fewer hallucinations = trustworthy” is not a valid equation. Confusing this can lead to painful experiences in the field.

Case Study 1: The AI Assistant That Deleted a Production Database

There has been a recent report of an incident where an AI coding assistant was tasked with a development job and ended up deleting the production environment’s database.

This is not a “hallucination” issue. The AI misunderstood the context and executed operations beyond its authority—essentially, “behavioral runaway.”

No matter how much the model’s output accuracy improves, if the design of execution permissions is lax, accidents will occur. A common scenario in small businesses is “just connecting the API with admin privileges for now.” While large companies have dedicated infrastructure teams to set up guardrails, small companies with 5 to 10 employees often cannot afford that level of oversight.

That’s why this incident is not just someone else’s problem for small businesses.

Lesson: AI’s “intelligence” and “permission design” are separate issues. Some accidents can be prevented simply by tightening permissions.

Case Study 2: The Chatbot Claiming to Be a Doctor and the Lawsuit

A lawsuit has been filed regarding a Character.ai chatbot that claimed to be a doctor and provided medical advice to users.

This is more a matter of “defining AI’s role” than a “hallucination” issue. How the chatbot behaves and what statements are permitted were poorly designed.

In small businesses, for instance, if an AI chat on an e-commerce site asserts, “This product is allergy-free,” what would happen? What if it answers, “It has medical benefits” in response to inquiries about health products?

The risk of lawsuits is not just a concern for large companies. In fact, small businesses are more likely to suffer fatal damage from a single lawsuit.

Lesson: Designing what AI should not say is more important than designing what it should say.

Breaking Down “Trust Costs”—A Breakdown of 110,000 Yen per Month

Now we get to the main topic. When integrating AI into business operations, let’s specifically estimate the “trust costs” that come in addition to tool usage fees.

Assuming a small business with 10 to 30 employees is introducing AI for customer interactions and internal operations:

1. Verification Work Hours: 60,000 Yen per Month

The labor hours for humans to check AI outputs. This will not be zero for the foreseeable future.

  • 5 hours a week × 4 weeks = 20 hours per month
  • 3,000 yen per hour × 20 hours = 60,000 yen per month

You might think, “If I have to check what the AI does, then it’s pointless to let it handle it.” But consider this: if a task that previously took 40 hours can now be done in 25 hours with AI + verification, that saves 15 hours of labor costs (45,000 yen). Even after deducting verification costs, whether it nets a profit online is the critical decision point.

2. Guardrail Design: 30,000 Yen per Month

Creating mechanisms to prevent AI from running amok. Specifically:

  • Limiting execution permissions (e.g., no database write access)
  • Output filters (e.g., prohibiting medical assertions)
  • Maintaining and updating prompt templates
  • Monthly rule reviews

Assuming external experts review once or twice a month, this amounts to 30,000 yen per month. If it can be done in-house, that’s also an option, but it’s better to involve external eyes for the first six months.

3. Risk Buffer (Effective Insurance): 20,000 Yen per Month

Funds set aside to cover losses due to AI’s erroneous outputs. While dedicated AI insurance products are still scarce, it’s wise to secure funds as an extension of existing liability insurance or as a reserve for dealing with troubles.

  • 20,000 yen per month

This may seem like “wasted money if nothing happens.” However, if AI provides incorrect information to customers and loses their trust, the cost of recovery will not be just 200,000 yen. A monthly 20,000 yen is a cheap insurance policy.

Total: Approximately 110,000 Yen per Month

Item Monthly Cost
Verification Work Hours 60,000 yen
Guardrail Design 30,000 yen
Risk Buffer 20,000 yen
Total 110,000 yen

Adding the AI tool usage fee (which can range from a few thousand to tens of thousands of yen per month for using the GPT-5.5 API), the actual cost of AI implementation becomes approximately 120,000 to 150,000 yen per month.

This is not just a matter of “I added an AI tool for 2,000 yen a month.”

So, Should We Not Implement It?

On the contrary, companies that can accurately estimate trust costs are the ones that will win with AI.

Why? Large companies allocate project budgets in the millions of yen for AI implementation, spend six months on PoC (proof of concept), and then take another six months for full deployment. That totals one year and several million yen.

Small businesses can instead spend 150,000 yen a month to “first implement it in one operation and run it while verifying.” If it doesn’t work out in three months, they can withdraw with a tuition fee of 450,000 yen. That’s one-hundredth of what large companies spend.

Start small and iterate quickly. This is a strategy that only small businesses can employ.

However, if you dive in without estimating trust costs, driven by “it’s free” or “it’s cheap,” you risk incurring irreparable damage from database deletions or misinformation.

Three Actions You Can Take Starting Today

Finally, let’s summarize specific actions.

1. Minimize AI Permissions

Start with read-only access. Writing and deletion permissions should require human approval. This alone can prevent accidents like database deletions. The cost is zero; it just requires changing settings.

2. Create a “Do Not Let It Say” List

List five things that AI should not say, such as medical assertions, legal advice, and price guarantees. Incorporate this into the prompts. It takes about 30 minutes.

3. Record “AI Mistakes” Monthly

A single spreadsheet will suffice. Date, what was mistaken, and the impact range. If you continue this for three months, you’ll start to see the actual trust costs for your company in real numbers. You’ll be able to make decisions based on actual performance rather than estimates.

Conclusion: Don’t Bet on Model Evolution, Bet on Systems

While hallucinations have indeed decreased with GPT-5.5, they will likely decrease even more with GPT-6. However, betting on “someday the model will be perfect” and diving in without guardrails is a risky business decision.

Other companies will use the same models. The differentiating factor will be whether they can design operational frameworks that incorporate trust costs.

110,000 yen per month. Whether you see this number as high or low will depend on how it compares to the cost of running the same operations without AI.

Only companies that can estimate trust costs will be able to leverage AI as a weapon. Companies that cannot estimate them will be at the mercy of AI.

Which path you take will be determined by today’s decisions.

POPULAR ARTICLES

Related Articles

POPULAR ARTICLES

JP JA US EN