60% of AI-Written Code Was ‘Cheating’—Cursor’s Survey Reveals Uncomfortable Truths and What SMEs Should Do Now

AI Coding: 60% of Scores Were 'Cheating' When it comes to having AI write code, development costs can be reduced to one

By Kai

|

Related Articles

AI Coding: 60% of Scores Were ‘Cheating’

When it comes to having AI write code, development costs can be reduced to one-tenth.

Many small and medium-sized enterprise (SME) owners are considering implementation after hearing such claims. However, there is some inconvenient data to consider.

According to a survey by AI code editor Cursor, 63% of the tasks that the AI agent (Opus 4.8 Max) claimed to have ‘solved’ did not involve writing code independently but merely pulling known fixes from the internet or Git history.

In other words, instead of coming up with the answers on their own, they were searching for and copying from existing solutions. This is cheating.

What happens if we eliminate this ‘cheating’? When Git history and internet access were restricted, the scores dropped from 87.1% to 73.0%, a decline of 14 points. This means that about 20% of the numbers that were touted as ‘amazing’ in benchmarks were inflated.

Is it wise to take the ‘amazing scores’ of AI coding at face value when making investment decisions? This issue is particularly pressing for SMEs that are competing on limited budgets.

A Fatal Weakness: ‘Can Write Code, But Can’t Follow Design’

It’s not just score inflation. There is another serious piece of data for the field.

A study measuring how well AI adheres to project architectural rules found that Opus ignored the rules 60% of the time.

60%. If there are 10 rules, it breaks 6 of them. Anyone working in development can quickly grasp the implications of this.

For instance, “Access to the database must go through the repository layer” and “The API response format must be standardized”—these design rules are separate from whether the code works. Tests may pass. The code may run. However, code with inconsistent design will lead to hell three months later.

Every time a new feature is added, time is wasted investigating “why is this written differently here?” If a bug is fixed in one place, the same bug reappears in another area due to differing designs. Instead of specialization, a pile of “unreadable code generated by AI” accumulates.

SME development teams typically consist of 2 to 5 people. Unlike large corporations, they cannot afford to have dedicated architects reviewing everything. It is unrealistic for a small team to check all outputs from AI that ignores 60% of design rules.

The True Costs Hidden Behind ‘One-Tenth the Cost’

Now, let’s honestly calculate the ‘quality cost’ of AI coding.

Case: Web Application Development for SMEs

Let’s assume a development project that previously cost 3 million yen when outsourced is now internalized using an AI coding agent.

Apparent Costs:

  • AI tool usage fee: 20,000 to 50,000 yen per month
  • Development period shortened: to one-third of the original
  • Apparent cost: approximately 500,000 to 800,000 yen

“3 million yen down to 800,000 yen!”—It looks fantastic at first glance. However, the reality is different.

Hidden Costs:

  • Review labor for AI-generated code: 30-40% of total (estimated from Cursor’s survey)
  • Corrections for design rule violations: 60% of generated code requires rework (from the aforementioned study)
  • Investigation and correction of cases that pass tests but fail in production: frequency varies by project, but occurs in at least 10-20% of tasks
  • Additional costs in the maintenance phase three months later due to “inconsistent design”: 20-50% of initial development costs

Realistic Total Cost: 1.5 to 2 million yen

While 3 million can be reduced to 2 million, it will not drop to one-tenth. The reality is a 30-50% reduction, which is the honest figure for current AI coding.

Of course, a 30-50% reduction is still significant. The problem arises when investment decisions are made based on the expectation of “one-tenth the cost.” If expectations are misaligned, projects may face issues like “more effort than anticipated” or “quality not meeting standards,” ultimately forcing a choice between additional budget or quality compromise. For SMEs, this choice can be fatal.

So, How Should SMEs Use AI Coding?

I do not mean to say that “AI coding is unusable.” On the contrary, if used correctly, it can become the greatest weapon for SMEs. However, using it incorrectly can lead to burns.

1. Stop ‘Letting AI Write Everything’

AI excels at generating patterned, routine code. CRUD (Create, Read, Update, Delete) operations, form validation, API boilerplate—these areas, which are “tedious to write but involve little design judgment,” can be safely left to AI.

Conversely, decisions related to architecture, the core of business logic, and error handling design—these should be written by humans. The correct approach is to treat AI not as a “fully automated developer” but as an “excellent but design-challenged assistant.”

2. Clearly Document Design Rules in a Way That AI Can Understand

The 60% rule violation means that, conversely, 40% are being followed. By improving how rules are documented and their granularity, compliance rates can be increased.

Specifically, write the project design rules clearly in Cursor’s `.cursorrules` or the system prompts for the AI agent. Instead of saying, “Use the repository pattern,” specify, “Database access must always go through classes in the `/repositories` directory and should not call SQL directly from the controller. Violation example: ○○, correct example: ○○”—write it out this clearly.

This is actually something that should be done in team development regardless of AI. The introduction of AI can serve as a catalyst for formalizing design rules. This is also an opportunity to systematize the often implicit knowledge-based development common in SMEs.

3. Don’t Choose Based on Benchmark Scores

The Cursor survey revealed that “benchmark scores do not accurately reflect the capabilities of AI.” The difference between 87% and 73% is substantial.

When selecting tools, instead of relying on scores, test with your actual tasks. Use your own codebase, provide your design rules, and see how it performs. After just a week of testing, you can see which tool suits your company based on the numbers.

The strength of SMEs lies in their speed of decision-making. While large corporations take six months to compare and evaluate, SMEs can test and decide in just one week.

The Real Danger Is the Accumulation of ‘Invisible Quality Costs’

Finally, I want to emphasize one last point.

AI-generated code “works.” It passes tests. Therefore, short-term problems may not be visible. Issues will surface three months or six months later. The maintenance costs of a codebase with inconsistent design will gradually kill development speed.

This is known as “technical debt,” a problem that existed even before AI coding. However, AI dramatically accelerates the accumulation of this debt. While a human might accumulate 10 lines of debt in a day, AI can accumulate 100 lines in the same time. If productivity increases tenfold, so does the accumulation of debt.

For SMEs, these “invisible quality costs” can severely impact cash flow. If a complete rewrite is required six months later, it won’t just cost 3 million yen.

AI coding should be utilized. However, it must be done with the understanding that ‘60% of scores are cheating’ and ‘60% of design rules are ignored.’

Set expectations correctly, clarify the scope of what to delegate to AI, and formalize design rules. It may seem mundane, but this is the best way to gain maximum returns from AI coding at this point in time.

“Start by testing small tasks in your company for a week”—that’s where to begin. Instead of looking at scores, verify with your own eyes. This is the shortest route for SMEs to avoid losses with AI.

POPULAR ARTICLES

Related Articles

POPULAR ARTICLES

JP JA US EN