60% of AI-Written Code Contains Security Flaws—How Can SMEs Cross the Chasm Between ‘Can Create’ and ‘Can Use’?

AI-Written Code: 60% Rated ‘F’—Is Your Company Safe?
Having AI write code is becoming increasingly common. Especially for small and medium-sized enterprises (SMEs), the decision to use AI instead of hiring engineers is on the rise. With tools like Copilot and Cursor costing only a few thousand yen per month, development that previously required outsourcing for 1 million yen is now starting to happen in-house. This changes the cost structure, which is a positive development in itself.
However, there is one critical issue. About 60% of code written by AI contains security vulnerabilities.
This is not just a matter of perception. The latest formal verification study, “Broken by Default,” analyzed 3,500 pieces of AI-generated code. It found vulnerabilities in 62.4% of the code produced by GPT-4o, resulting in a dismal F rating. Claude and Gemini also exceeded 50%. This means that if AI-written code is deployed directly into production, there is a greater than 50% chance that the system will have security holes.
For SMEs, this is not merely a “technical issue”; it is a fundamental business risk.
The Chasm Between ‘Can Create’ and ‘Can Use’
The proliferation of AI coding tools has dramatically reduced the cost of “creating” code. Development that used to cost 3 million yen and take three months can now produce functional outputs in just a few days with AI. From this perspective, it seems revolutionary.
However, there is a vast chasm between “can create” and “can safely use in production.”
What exactly happens? Common vulnerabilities found in AI-generated code include SQL injection, cross-site scripting (XSS), and authentication bypass—classic yet critical issues. These are typical examples of “working but flawed” code that may not be detected during testing. While the functionality may pass checks, attackers can easily exploit these vulnerabilities.
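To make the SQL injection case concrete, here is a minimal Python sketch (the table and column names are hypothetical) contrasting the string-concatenated query pattern that AI assistants often emit with the parameterized form that blocks the attack:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 0)")

def find_user_unsafe(name):
    # Vulnerable: the input is spliced directly into the SQL string,
    # so a crafted value can rewrite the query itself.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def find_user_safe(name):
    # Safe: the driver passes the value separately from the SQL text,
    # so it is always treated as a literal string.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()

payload = "' OR '1'='1"
print(find_user_unsafe(payload))  # returns every row: injection succeeded
print(find_user_safe(payload))    # returns []: payload treated as a literal
```

Both functions pass a naive “can it look up a user?” test, which is exactly why this class of flaw survives functional checks.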
For SMEs, the damage from a security incident is, relative to company size, far more severe than what a large corporation would absorb. The repercussions can include compensation for personal data breaches, loss of trust from business partners, and in the worst case, business shutdown. The risk of incurring losses in the tens of millions of yen as a result of saving a few hundred thousand yen in development costs is very real.
Cost reduction always comes with a “quality chasm.” Walking into that chasm without recognizing it exists is the most dangerous move.
The Illusion of Defense Through Prompts
“What if we just prompt the AI to ‘write secure code’?”—This thought is natural but naive.
Research known as “The Defense Trilemma” has demonstrated the fundamental limitations of prompt injection defenses. According to this study, it is mathematically impossible to simultaneously satisfy continuity (consistent defense), utility (maintaining functionality), and integrity (blocking all attacks).
In other words, even if you ask the AI to “write securely,” it will inevitably leave some vulnerabilities due to the trade-offs between safety and functionality. Moreover, the exact locations of these vulnerabilities may not even be known to the person who wrote the prompt.
This is not due to “AI being immature”; it is a structural limitation. Even as models evolve, this stalemate will not be resolved.
Code Review AIs Function at Only 40%
“Then, why not have another AI review the code written by the first AI?”—This has also been tested.
The results from the “Code Review Agent Benchmark” are as follows: Current code review agents can only correctly process about 40% of tasks. The remaining 60% are either overlooked or incorrectly flagged.
A 40% success rate is better than nothing, but it is far from a level that can be called safe. Consider the combined odds: if roughly 60% of AI-written code contains a vulnerability, and the review agent misses roughly 60% of what it inspects, then about 0.6 × 0.6 ≈ 36% of all AI-generated code ships with a vulnerability that slipped through both stages—more than one piece in three.
So, What Should SMEs Do?
I am not suggesting that SMEs should avoid using AI coding tools. On the contrary, it is essential to leverage tools that dramatically reduce costs. However, it is crucial to insert a verification step between “creating” and “using.”
Here are three minimum actions that SMEs should take:
1. Always Run Static Analysis Tools
Before deploying AI-generated code to production, run it through a static analysis tool (e.g., SonarQube, Semgrep, or Bandit). Many of these tools are free, so the adoption cost is virtually zero. This step alone can detect most classic vulnerabilities such as SQL injection and XSS.
For practical operation, integrate static analysis into the CI/CD pipeline on GitHub. By automating scans every time code is pushed, no additional human resources are required. It will take 1-2 hours to set up, and then it runs automatically.
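As one possible setup (the filename, action versions, and Semgrep flags below are illustrative, not prescriptive), a minimal GitHub Actions workflow that scans every push and pull request might look like this:

```yaml
# .github/workflows/security-scan.yml (filename illustrative)
name: security-scan
on: [push, pull_request]

jobs:
  semgrep:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install semgrep
      # "--config auto" pulls community rulesets; "--error" makes the
      # job fail when findings are reported, so flawed code cannot merge.
      - run: semgrep scan --config auto --error
```

Once this file is committed, no one has to remember to run the scanner; the pipeline enforces it.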
2. Label Code as ‘AI-Generated’
Make sure to distinguish between AI-generated code and human-written code. Simple methods like tagging commit messages with “AI-generated” or managing them in a dedicated branch work well.
Why is this important? If an issue arises and you do not know which parts are AI-generated, root cause analysis can consume an enormous amount of time. With labels, you can prioritize checking the AI-generated sections first. This keeps the ongoing operational cost close to zero.
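One lightweight way to implement the commit-message approach is a message trailer; the “Generated-by:” trailer below is an illustrative convention, not a git standard. The demo runs in a throwaway repository so it is self-contained:

```shell
#!/bin/sh
set -e
# Set up a throwaway repository for the demo.
tmp=$(mktemp -d)
cd "$tmp"
git init -q
# An AI-generated change, labeled with a trailer in the message body:
git -c user.name=dev -c user.email=dev@example.com commit -q --allow-empty \
  -m "Add invoice export endpoint" -m "Generated-by: AI (Copilot)"
# A human-written change, no trailer:
git -c user.name=dev -c user.email=dev@example.com commit -q --allow-empty \
  -m "Fix typo in README"
# During triage, list only the AI-generated commits:
git log --grep="Generated-by: AI" --oneline
```

The `git log --grep` filter is what makes the label pay off: when something breaks, the suspect list is one command away.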
3. Have Humans Review Authentication, Payment, and Personal Data Handling
It is not necessary to have humans review all code. This is not realistic for resource-limited SMEs. However, authentication, payment, and personal data handling—these three areas must always be reviewed by humans.
The potential damage from vulnerabilities in these areas is disproportionately large, and it is precisely in these domains that flaws in AI-generated code are most dangerous. By having humans check just these three areas, you can cover roughly 80% of the risk.
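One lightweight way to enforce this rule on GitHub is a CODEOWNERS file, which makes review by a named human mandatory for the sensitive directories before a pull request can merge (assuming branch protection with required reviews is enabled; the paths and team name below are illustrative):

```
# .github/CODEOWNERS (paths and team names are illustrative)
# Pull requests touching these directories require approval
# from the listed owners before they can be merged.
/src/auth/      @your-org/senior-devs
/src/payments/  @your-org/senior-devs
/src/userdata/  @your-org/senior-devs
```

This turns the “humans review the critical 20%” rule from a verbal agreement into something the platform enforces automatically.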
The ‘True Competition’ Beyond Reduced Costs
The essence of AI coding is the “democratization of development costs.” Development that was previously only possible for large corporations is now accessible to SMEs. This is undoubtedly an opportunity.
However, when everyone can write code with AI, the differentiator will not be who can write code, but who can run it safely.
The implementation cost for static analysis tools is zero. The time spent on label management is just five minutes a day. Establishing a review system for critical areas only requires formalizing rules. All of these can be done at zero monthly cost.
Companies that deploy AI-written code directly into production and those that incorporate these three steps will have vastly different probabilities of experiencing security incidents six months down the line.
In an era where “can create” is commonplace, the real value lies in having a system that can be used safely.