“Free Code” Comes at a High Cost—The Reality Facing Small and Medium Enterprises Amid Large-Scale Open Source Contamination and the Quality Collapse of AI-Generated Code

By Kai

How Much Are You Paying for Free Code?

Open source code is “free.” Having AI (like ChatGPT or Copilot) write code is also nearly free.

So, let me ask: Was that free code really free?

How many hours did you spend on reviews? How many days were wasted when it didn’t work in production? How much would it cost if customer data leaked due to unnoticed security vulnerabilities?

Right now, two significant issues are occurring simultaneously behind the facade of “free code.” One is large-scale contamination of open source. The other is the quality collapse of AI-generated code. Both share the same structure: as the “procurement cost” of code approaches zero, the costs of verification, correction, and incidents explode.

The hardest hit are small and medium enterprises (SMEs), which typically have few dedicated security staff and only limited code review processes.

3,800 Repositories Contaminated—The Reality of Open Source Contamination

According to a report from GitHub, a hacker group called “MUT-1244” (referred to in some reports as Stargazers Ghost) accessed around 4,000 code repositories, and at least 3,800 of them were confirmed to have been contaminated with malicious code.

The method is simple yet clever. The attackers fork legitimate open source projects to create numerous “fake repositories” that look almost identical to the originals but contain backdoors or malware. They then fake star counts (GitHub's rating system) so the copies appear at the top of search results. Developers unwittingly incorporate this code into their projects, and that is how the infection spreads.
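Because star counts can be faked, checking *who* owns a repository is more reliable than checking how popular it looks. Below is a minimal sketch, assuming your team keeps its own allowlist of vetted `owner/repo` names. `CANONICAL_REPOS`, `parse_github_repo`, and `check_origin` are hypothetical names for illustration, not part of any GitHub tool, and the code makes no network calls.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: the canonical owner/repo of each project the team has vetted.
CANONICAL_REPOS = {"psf/requests", "pallets/flask"}


def parse_github_repo(url: str) -> tuple[str, str]:
    """Return (owner, repo) from an https://github.com/owner/repo[.git] URL."""
    parsed = urlparse(url)
    if parsed.netloc.lower() != "github.com":
        raise ValueError(f"not a github.com URL: {url}")
    path = parsed.path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    segments = path.split("/")
    if len(segments) < 2:
        raise ValueError(f"expected owner/repo in URL: {url}")
    return segments[0], segments[1]


def check_origin(url: str) -> str:
    """Classify a repository URL as 'canonical', 'lookalike', or 'unknown'."""
    owner, repo = parse_github_repo(url)
    full_name = f"{owner}/{repo}".lower()
    vetted = {name.lower() for name in CANONICAL_REPOS}
    if full_name in vetted:
        return "canonical"
    # Same project name as a vetted repo but a different owner: possibly a fake fork.
    vetted_repo_names = {name.split("/")[1] for name in vetted}
    if repo.lower() in vetted_repo_names:
        return "lookalike"
    return "unknown"
```

A “lookalike” result is not proof of malice, since legitimate forks exist too. It only means a human should look before the code is used.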

Consider what the number 3,800 means. If each repository were used by an average of 10 companies, 38,000 companies would be affected; at 100 companies each, that becomes 380,000. And because open source dependencies are interconnected, with one library pulling in many others, the actual damage could be even more extensive.

According to IBM’s “Cost of a Data Breach Report 2024,” the average cost of a data breach is $4.88 million (approximately 730 million yen). Even for SMEs, it’s not uncommon to see damages in the tens of millions of yen per incident. Ransomware payments, business interruption, customer compensation, and reputational damage add up to a steep price for something that began with “free code.”

A Quarter of AI-Generated Code Creates New Quality Issues

The other significant issue is AI-generated code.

The number of companies using GitHub Copilot and ChatGPT to write code is increasing rapidly, and development speed is indeed improving. However, data is beginning to support what should be obvious: speed and quality are a trade-off.

According to a 2024 survey by GitClear, about 24% of code written with AI assistance introduced new quality issues when analyzed with static analysis tools such as Pylint. In other words, roughly one in four pieces of AI-assisted code was flagged as having issues.

Even more serious is that AI can produce “plausible lies.” In the legal field, there have been multiple incidents in the U.S. in which lawyers cited AI-generated case law in court, only to find that the cases did not exist. In 2023, in Mata v. Avianca, two New York lawyers and their firm were sanctioned for filing court documents that cited six fictitious cases generated by ChatGPT.

The same can happen with code. An AI-generated function may import a library that does not exist, or it may pass tests but quietly break under edge cases. Because such code looks clean, these “plausible bugs” tend to be discovered later than bugs written by humans.

What’s the result? The cost of reviewing, correcting, and responding to incidents can far exceed the value of the development time AI saved. One development company estimates that using AI-generated code without verification can swell maintenance costs to 1.5 to 2 times the usual amount.

Analyzing the Structure—The Trap of “Zero Procurement Cost”

Let’s take a step back and look at the structure.

| | Traditional Development | Utilizing Open Source | Utilizing AI-Generated Code |
|---|---|---|---|
| Code Procurement Cost | High | Almost Zero | Almost Zero |
| Verification/Review Cost | Fixed | Necessary but often overlooked | Necessary but often overlooked |
| Cost When Incidents Occur | Fixed | Increased due to contamination risk | Increased due to quality issues |
| Total Cost | Easy to Estimate | Hidden Costs Expand | Hidden Costs Expand |

When procurement costs are zero, people tend to overlook verification costs. “It’s free, so let’s just use it”—this decision can come back to haunt you with tenfold costs later.

This is not a technical issue; it’s a cost structure issue.

SMEs Must Establish a “Code Procurement Policy”

Large companies have security teams. They have dedicated code review teams. They have tools in place to manage SBOMs (Software Bills of Materials).

SMEs do not have these resources. That’s why a policy is essential.

What specifically should be done? Just three decisions need to be made.

1. Always Record “Where the Code Came From”

For open source, note which repository and which version you used. For AI-generated code, record which tool generated it and with which prompt. This alone dramatically speeds up tracing the cause when problems arise. The cost is zero; an Excel spreadsheet is sufficient.
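If you would rather generate that spreadsheet than fill it in by hand, a CSV file (which opens directly in Excel) works well. The sketch below assumes a hypothetical column layout (`CodeOrigin` and `append_origin` are illustrative names). Adapt the fields to whatever your team actually needs to trace.

```python
import csv
from dataclasses import asdict, dataclass, field, fields
from datetime import date
from pathlib import Path


@dataclass
class CodeOrigin:
    """One row of a hypothetical provenance log; field names are illustrative."""
    component: str    # file, module, or library inside your product
    source_type: str  # "open_source" or "ai_generated"
    origin: str       # repository URL, or the AI tool's name
    version: str      # release tag/commit for OSS; model or tool version for AI
    prompt: str       # prompt used for AI output; empty for open source
    added_on: str = field(default_factory=lambda: date.today().isoformat())


def append_origin(log_path: Path, record: CodeOrigin) -> None:
    """Append one record to a CSV log, writing the header row if the file is new."""
    is_new = not log_path.exists()
    with log_path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=[fl.name for fl in fields(CodeOrigin)])
        if is_new:
            writer.writeheader()
        writer.writerow(asdict(record))
```

For example, `append_origin(Path("code_origins.csv"), CodeOrigin("utils/date_parse.py", "ai_generated", "GitHub Copilot", "unknown", "parse ISO 8601 dates"))` adds one row stamped with today's date.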

2. Ensure “External Code” Passes Minimum Automated Checks Before Production Deployment

Static analysis tools such as ESLint, Pylint, and Bandit are available for free. Once integrated into the CI/CD pipeline, they run on every change without manual effort and can catch a significant portion of the kinds of quality issues behind that 24% figure. Setting this up takes about half a day, and operating costs are nearly zero. There’s no reason not to do it.
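A CI job can be as small as one script that runs the checkers and fails the build if any of them reports a problem. The sketch below assumes Pylint and Bandit are installed and that your code lives in `src/`. The score threshold and severity level are starting-point assumptions, and `run_checks` is a hypothetical name. Your CI system would simply execute this script as one of its steps.

```python
import shutil
import subprocess
import sys

# Assumed tools, flags, and target directory; adjust to your repository.
CHECKS = [
    ["pylint", "--fail-under=8.0", "src"],  # fail if the Pylint score drops below 8.0
    ["bandit", "-r", "src", "-ll"],         # security findings of MEDIUM severity or higher
]


def run_checks(checks: list[list[str]]) -> int:
    """Run each check; return 0 only if every tool is installed and exits cleanly."""
    failed = 0
    for cmd in checks:
        if shutil.which(cmd[0]) is None:
            print(f"MISSING: {cmd[0]} is not installed", file=sys.stderr)
            failed += 1
            continue
        result = subprocess.run(cmd)
        ok = result.returncode == 0
        print(f"{'OK' if ok else 'FAIL'}: {' '.join(cmd)}")
        if not ok:
            failed += 1
    return 1 if failed else 0


if __name__ == "__main__":
    sys.exit(run_checks(CHECKS))
```

A missing tool counts as a failure on purpose. A check that silently does not run is worse than no check at all.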

3. Develop the Habit of Estimating the “Total Cost of Free Code”

When introducing an open source library, write down: “Procurement cost: 0 yen, Verification cost: X hours, Update tracking cost: Y hours per year, Security monitoring cost: Z hours per month.” Do the same for AI-generated code. Simply writing this estimate down curbs the tendency to think, “It’s free, so let’s use it all.”
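The X / Y / Z breakdown above turns into a first-year total in a few lines. This is a minimal sketch: `FreeCodeCost` is a hypothetical name, and the hourly rate you pass in is your own internal estimate, not a benchmark.

```python
from dataclasses import dataclass


@dataclass
class FreeCodeCost:
    """First-year hidden cost of adopting "free" code; all inputs are your estimates."""
    name: str
    verification_hours: float          # X: one-off review and testing
    update_hours_per_year: float       # Y: tracking and applying upstream updates
    monitoring_hours_per_month: float  # Z: watching advisories and scan results
    procurement_yen: float = 0.0       # the "free" part

    def first_year_hours(self) -> float:
        """X + Y + 12 * Z: total staff hours spent in the first year."""
        return (self.verification_hours
                + self.update_hours_per_year
                + 12 * self.monitoring_hours_per_month)

    def first_year_yen(self, hourly_rate_yen: float) -> float:
        """Procurement cost plus staff hours converted to yen."""
        return self.procurement_yen + self.first_year_hours() * hourly_rate_yen
```

Suppose X = 8 hours, Y = 12 hours per year, Z = 2 hours per month, and labor costs a hypothetical 5,000 yen per hour. Then `FreeCodeCost("some-library", 8, 12, 2)` comes to 44 hours, or 220,000 yen, in its first year, for a library that cost 0 yen to obtain.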

“Nothing Is More Expensive Than Free” Has Become a Reality in the World of Code

This old saying has never felt more real than it does in the world of software today.

Open source contamination has been made visible by the number 3,800, and the quality issues of AI-generated code have been quantified by the figure 24%. Both represent hidden costs lurking behind the facade of “zero procurement cost.”

For SMEs, code is shifting from something you “write” to something you “procure.” Only the supplier has changed, from human contractors to AI or open source. No company would put outsourced work into production without quality control, and the same goes for code.

What’s needed is not the same level of security investment as large corporations. Just three habits: “Record where the code came from,” “Pass automated checks,” and “Estimate total costs.” All of these have almost zero cost. It’s just a matter of whether to do it or not.

I’m not saying don’t use “free code.” Just, please calculate how much you are really paying for that free code.