The Era of Creating On-Site AI Tools with Vibe Coding: The Deep Divide Between ‘Working’ and ‘Usable’

By Kai

With zero programming experience, a field worker asked ChatGPT to “create a tool like this,” and it produced code that looked appropriate. When they ran it, it worked. This, in essence, is what is called “Vibe coding.”

A business tool that would have cost hundreds of thousands to millions of yen if outsourced can now take shape in just a few hours with natural language instructions. For small and medium-sized enterprises, this appears to be a revolutionary cost disruption. In fact, there are increasing examples of creating “simple tools” in-house for tasks such as safety management on construction sites, daily report aggregation, and photo organization.

However, one question must be raised. Is ‘working’ the same as ‘usable’?

Recent studies have provided a cold, hard numerical answer to this question. To put it bluntly, Vibe coding is powerful, but if its output is put to work in the field as is, accidents can happen. This is especially true in safety-related areas, where the pitfalls can be fatal.

85% Run. But Whether They ‘Run Correctly’ Is Another Story

A recently published study evaluated the reliability and safety of 450 Python scripts generated through Vibe coding. The results are as follows:

  • About 85% of the code was executable
  • However, many contained logical flaws
  • There was almost a complete lack of defensive programming (handling of outliers and unexpected inputs)

“Working” and “working correctly” are entirely different things. For example, suppose a weight calculation tool for steel frames is created using Vibe coding. If the input values are within the expected range, it produces the correct answer. However, if someone enters a value in the wrong unit, such as millimeters instead of meters, it silently returns a wrong result without raising an error. This is what is meant by a “lack of defensive programming.”
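
To make the steel-frame example concrete, here is a minimal Python sketch that contrasts an unchecked calculation with a defensive one. It assumes weight = length (m) × mass per meter (kg/m). The function names, the 0.1 to 30 m length range, and the 37.0 kg/m figure are hypothetical values chosen for illustration and are not taken from any standard.

```python
import logging
import math

logger = logging.getLogger("steel_weight")

# Hypothetical plausible range for one steel member, in meters.
MIN_LENGTH_M = 0.1
MAX_LENGTH_M = 30.0


def steel_weight_naive(length_m, mass_per_m_kg):
    """Typical first-pass version: multiplies whatever it is given."""
    return length_m * mass_per_m_kg


def steel_weight_checked(length_m, mass_per_m_kg):
    """Return member weight in kg, rejecting inputs that look wrong."""
    for name, value in (("length_m", length_m), ("mass_per_m_kg", mass_per_m_kg)):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{name} must be a number, got {value!r}")
        if not math.isfinite(value) or value <= 0:
            raise ValueError(f"{name} must be a positive finite number, got {value}")
    if not MIN_LENGTH_M <= length_m <= MAX_LENGTH_M:
        # A "length" of 6000 is far more likely millimeters than meters.
        logger.warning("Rejected length_m=%s (allowed %s-%s m)",
                       length_m, MIN_LENGTH_M, MAX_LENGTH_M)
        raise ValueError(
            f"length_m={length_m} is outside {MIN_LENGTH_M}-{MAX_LENGTH_M} m; "
            "was it entered in millimeters?"
        )
    return length_m * mass_per_m_kg


if __name__ == "__main__":
    print(steel_weight_naive(6000, 37.0))    # 222000.0, silently wrong
    print(steel_weight_checked(6.0, 37.0))   # 222.0
    try:
        steel_weight_checked(6000, 37.0)
    except ValueError as exc:
        print("rejected:", exc)
```

The checked version does three things. It refuses the suspicious input, it says why, and it leaves a log line. Those are the input checks, error handling, and log output that the next paragraph describes.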

Tools created through outsourcing typically include input value checks, error handling, and log output as standard. Code generated through Vibe coding often lacks these features. It looks like a finished product, but inside it is a prototype. This is the biggest blind spot.

“It Was Working Until Yesterday” — The Drift Problem of AI Code

Another issue often overlooked in the field is drift.

Even if you give the same prompt to the same AI, a month later it may produce different code. AI models are updated regularly, and their training data changes. As a result, code that previously worked correctly may behave slightly differently when it is regenerated. This is drift.

To address this issue, an open-source tool called “VibeDrift” has been developed. It quantitatively measures the quality changes in AI-generated code.

What this means is clear. Tools created with Vibe coding are not a one-and-done deal. They need to be regularly checked to see if they are still functioning correctly.
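
One low-cost way to do this regular check is a small regression script. You keep a handful of inputs whose correct outputs a person has verified. You then rerun them against the current version of the tool, for example once a month or whenever the tool is regenerated. This is a general testing practice and not a description of how VibeDrift works. The `steel_weight` stand-in, `run_regression`, and the golden cases are all hypothetical.

```python
import math

# Hypothetical known-good cases: (inputs, result verified by a person).
GOLDEN_CASES = [
    ((6.0, 37.0), 222.0),
    ((2.5, 10.0), 25.0),
]


def steel_weight(length_m, mass_per_m_kg):
    """Stand-in for the tool under test; swap in the real function."""
    return length_m * mass_per_m_kg


def run_regression(func, cases, rel_tol=1e-9):
    """Return (inputs, expected, actual) for every case that no longer matches."""
    failures = []
    for args, expected in cases:
        try:
            actual = func(*args)
        except Exception as exc:  # a crash counts as a failure, not a stop
            actual = exc
        ok = (isinstance(actual, (int, float))
              and not isinstance(actual, bool)
              and math.isclose(actual, expected, rel_tol=rel_tol))
        if not ok:
            failures.append((args, expected, actual))
    return failures


if __name__ == "__main__":
    failures = run_regression(steel_weight, GOLDEN_CASES)
    for args, expected, actual in failures:
        print(f"MISMATCH {args}: expected {expected}, got {actual!r}")
    print("all checks passed" if not failures else f"{len(failures)} check(s) failed")
```

Suppose a regenerated version of the tool started dividing by 1000 somewhere. Both cases would then be reported as mismatches. The check only catches regressions on the cases you wrote down, so it complements a review rather than replacing one.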

Can small and medium-sized enterprises manage this? Honestly, it’s challenging. That’s why it’s crucial to draw a line between what can and cannot be created using Vibe coding.

Security Is Even More Serious: The Best Agent Produced Safe Code Only 23.8% of the Time

There’s even harsher data available.

In a new benchmark called “SecureVibeBench,” the security of code generated by LLM-based coding agents was evaluated. The results from 105 C/C++ secure coding tasks are as follows:

  • Even the best-performing agent generated code that was both correct and secure only 23.8% of the time.

That’s less than one in four, and it is a security problem. In the remaining 76.2% of tasks, even the best agent’s output was either functionally wrong or insecure. Flaws that professional engineers routinely check for, such as buffer overflows, memory leaks, and injection vulnerabilities, were not reliably caught.

Of course, C/C++ is rarely used for business tools on construction sites. Python and spreadsheet integrations are likely the main battlegrounds. However, the structural implication of this number still applies. AI excels at generating “working code” but struggles with “safe code,” and there is little reason to expect that gap to disappear in another language.
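
Memory errors are specific to C/C++, but injection is a flaw that carries straight over to Python. The sketch below uses Python's built-in `sqlite3` module. It shows how building SQL from user input with string formatting lets a crafted input rewrite the query, and how a parameterized query avoids that. The table, names, and sites are hypothetical demo data.

```python
import sqlite3


def make_db():
    """In-memory demo table with hypothetical worker records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE workers (name TEXT, site TEXT)")
    conn.executemany(
        "INSERT INTO workers VALUES (?, ?)",
        [("Sato", "Site A"), ("Suzuki", "Site B")],
    )
    return conn


def find_workers_unsafe(conn, site):
    # String-built SQL: a crafted input can rewrite the query itself.
    query = f"SELECT name FROM workers WHERE site = '{site}'"
    return [row[0] for row in conn.execute(query)]


def find_workers_safe(conn, site):
    # Parameterized SQL: sqlite3 binds `site` as a value, never as SQL text.
    return [row[0] for row in conn.execute(
        "SELECT name FROM workers WHERE site = ?", (site,))]


if __name__ == "__main__":
    conn = make_db()
    attack = "x' OR '1'='1"
    print(find_workers_unsafe(conn, attack))  # leaks every row
    print(find_workers_safe(conn, attack))    # []
```

Both functions "work" for normal input, which is exactly why the unsafe one can pass a casual test.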

So, Should Small and Medium-Sized Enterprises Avoid Vibe Coding?

No, quite the opposite. If used correctly, there is no more advantageous tool for small and medium-sized enterprises.

The key is to discern “where to use it.”

Areas Suitable for Vibe Coding:

  • Tasks where a mistake will not endanger anyone, such as automatic aggregation of daily reports and automatic organization of photos
  • Data visualization and simple dashboard creation
  • Internal notification bots and checklist generation
  • Small scripts that automate manual tasks that used to take around 30 minutes
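
As an illustration of the first category, a daily-report aggregation script can stay very small. This sketch assumes a hypothetical CSV export with `date`, `site`, and `hours` columns. Even so, it flags rows that look wrong instead of silently adding them: a missing site, or hours that are non-numeric or outside 0 to 24. Missing checks of this kind were the main weakness the study found.

```python
import csv
import io
from collections import defaultdict


def aggregate_hours(csv_text):
    """Sum reported hours per site; return (totals, rejected_rows)."""
    totals = defaultdict(float)
    rejected = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # row_no counts data rows, starting at 1 for the first row after the header.
    for row_no, row in enumerate(reader, start=1):
        site = (row.get("site") or "").strip()
        try:
            hours = float(row.get("hours") or "")
        except ValueError:
            rejected.append((row_no, row))
            continue
        # Reject blank sites and impossible daily hours (this also rejects NaN).
        if not site or not 0 <= hours <= 24:
            rejected.append((row_no, row))
            continue
        totals[site] += hours
    return dict(totals), rejected


SAMPLE = """date,site,hours
2024-05-01,Site A,8
2024-05-01,Site B,7.5
2024-05-02,Site A,80
2024-05-02,Site A,6
"""

if __name__ == "__main__":
    totals, rejected = aggregate_hours(SAMPLE)
    print(totals)                    # {'Site A': 14.0, 'Site B': 7.5}
    print([n for n, _ in rejected])  # [3]
```

If this script gets a total wrong, the cost is a corrected report, not an injury. That is what makes it a reasonable starting point.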

Areas to Avoid with Vibe Coding:

  • Structural calculations, load calculations, and other calculations directly related to safety
  • Systems handling customer information or personal data
  • Web applications that are publicly accessible
  • Processing of data that serves as legal records

As long as this line can be drawn, the cost-disruptive power of Vibe coding is tremendous. Business tools that previously cost between 500,000 and 3 million yen to outsource can now be created by field personnel in half a day. Moreover, since the tools are created by those on-site, they perfectly match the needs of the field. The hell of writing specifications for outsourcing, submitting three revision requests, and waiting two months is eliminated.

The Biggest Bottleneck Will Be ‘Someone Who Can Review’

That said, there is one problem that needs to be solved: most small and medium-sized enterprises have no one who can review the generated code.

In large companies, there are engineers in-house. There is a culture of code reviews. Small and medium-sized enterprises lack this. Tools created with Vibe coding may continue to operate in a state of “somewhat working,” only to suddenly produce strange results one day. When that happens, who can identify the cause?

This presents a new business opportunity: “Review and audit services for tools created with Vibe coding.” Demand for coding work may decrease, but demand for code review is likely to increase. Moreover, if small and medium-sized enterprises begin to adopt Vibe coding en masse, that demand will explode.

There is a strong possibility that review services costing several tens of thousands of yen per case could be established. For local IT vendors, this could become a new source of revenue to replace existing contract development.

So, What Should Be Done?

To summarize:

1. Use Vibe coding, but start in areas where “mistakes won’t lead to death.” Daily report aggregation, photo organization, internal notifications. Starting here minimizes risk while maximizing cost reduction benefits.

2. Never rely solely on Vibe coding for safety- and security-related aspects. 85% executable means 15% do not even run, and many of the scripts that do run still contain logical flaws. In areas where safety is required, that failure rate is unacceptable.

3. Don’t consider a tool done once it is created. Because of AI model drift, regenerating or modifying a tool can break behavior that was correct yesterday. Set up a system to check that it still works correctly at least once a month.

4. Have an external review mechanism. If there are no engineers in-house, utilize external review services. Given the reduced costs of code generation, investing in reviews is a rational choice.

Vibe coding will undoubtedly be a weapon for small and medium-sized enterprises. However, a weapon can harm you if misused. Rather than celebrating that it “worked,” the ability to question, “Is this really working correctly?” is what will allow the field to reap the maximum benefits of this technology.

When the cost of technology decreases, the value that increases is “judgment.” Knowing what to use it for and what not to use it for. The small and medium-sized enterprises that can make that judgment will survive the next decade.