Can We ‘Audit’ AI Agents in Spreadsheets?—A More Complicated Issue Remains After the Elimination of Personalization

By Kai

“Who Did It” Has Disappeared. So, Who Explains “Why It Happened”?

AI agents are beginning to infiltrate Excel and Google Sheets. Automatic formula generation, data cleansing, and aggregation logic construction—tasks that were once known only to “Mr. Tanaka in accounting” are now being replaced by AI.

The elimination of personalization is a good thing. However, I would like to pose a question here.

“AI did it instead of Mr. Tanaka. But who can explain why the AI chose that formula?”

Personalization has vanished. Yet accountability remains. In fact, a more troublesome issue arises: the explosive increase in tracking costs. A study published on arXiv titled “Behavioral Auditing and Control of AI Agents in Spreadsheets” directly addresses this problem. For small and medium-sized enterprises in rural areas, this is not just someone else’s issue.

The Fact Revealed by the Research: AI Agent Decisions Become a “Black Box”

The core of this research is simple. Its experiments demonstrate that when AI agents perform complex tasks on spreadsheets, users can hardly trace the agents’ decision-making process.

Specifically, here’s what happens. You instruct the AI agent to “aggregate sales data.” The agent constructs formulas, applies filters, and creates pivot tables. The results are produced. However, the reasons behind the chosen aggregation method, the exclusion of specific data, and the underlying assumptions remain invisible.

If it were Mr. Tanaka, he would say, “Oh, that’s calculated excluding tax.” The AI agent volunteers nothing, and even when asked, it may not give an accurate answer.

The system developed by the research team, called “Pista,” offers one solution to this problem. Pista breaks down the execution process of AI agents into auditable individual actions. By allowing users to verify and intervene at each step, it makes it possible to trace “why it happened.”
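Pista itself is described here only at a high level, so the following is a rough Python sketch of the general idea rather than Pista’s actual design: each agent action is recorded as a discrete, reviewable step before it is applied. The `AuditableStep` fields and the `review` helper are my own illustrative names, not the paper’s API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AuditableStep:
    """One discrete agent action, recorded before it is applied."""
    action: str          # e.g. "set_formula", "filter_rows"
    target: str          # cell range or sheet the action touches
    rationale: str       # the agent's stated reason for the action
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())
    approved: bool = False

def review(steps):
    """Return only the steps a human reviewer approves.

    A real system would prompt interactively; here approval is
    simulated by accepting every step that carries a rationale."""
    approved = []
    for step in steps:
        if step.rationale:       # stand-in for a human yes/no decision
            step.approved = True
            approved.append(step)
    return approved

steps = [
    AuditableStep("set_formula", "D2:D100",
                  "sum monthly sales excluding consumption tax"),
    AuditableStep("filter_rows", "Sheet1", ""),  # no rationale given
]
print([s.action for s in review(steps)])  # → ['set_formula']
```

The point of the decomposition is that a step without a stated rationale gets stopped at the gate, which is exactly the “why it happened” gap the paper identifies.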

User tests showed that when using Pista, understanding of the agent’s decisions improved by about 40% compared to traditional methods. Conversely, without such a mechanism, nearly 60% of AI decisions would pass through without clear understanding.

The Paradox of “AI Personalization”—What Was Meant to Be Systematized Now Creates Another Dependency

Now we get to the main point.

AI was introduced to eliminate personalization. However, there are cases where the AI agent itself becomes optimized for a “specific usage.” The research reports a phenomenon called “behavioral transfer.”

What does this mean? The AI agent learns the past operation patterns of a specific user and processes data based on that user’s unique assumptions. For example, if an accountant always manually adjusted the consumption tax, the agent would inherit that “habit.” Even if the person in charge changes, the agent continues to operate based on the predecessor’s habits.

This is merely a replacement of “human personalization” with “AI personalization.” Moreover, human personalization could be resolved by simply asking Mr. Tanaka. AI personalization may lead to situations where even the AI itself cannot clearly explain why it is acting in a certain way.

Consider this in the context of a small business. In a company with ten employees, the accounting spreadsheet processing is entrusted to an AI agent. Six months later, the person in charge leaves. The new person looks at the agent’s output and asks, “Why are these numbers like this?” No one can answer. You can’t even ask the AI. This is the reality waiting after the disappearance of personalization.

Let’s Talk About Costs—Auditing Is Not a “Luxury Item”

“I understand that an auditing mechanism is necessary. But how much does it cost?”

To be honest, it is not yet realistic for small and medium-sized enterprises to build auditing systems like Pista on their own. Pista remains a research-stage prototype, and commercialization is some way off.

However, the cost structure is becoming clearer.

The cost of using AI agents keeps falling. Automated spreadsheet processing with GPT-4-class APIs runs for a monthly fee of several thousand to tens of thousands of yen. We have already reached the point where data-organization work that used to cost 50,000 to 150,000 yen per month through an external accountant or consultant can be done for 5,000 yen.

The question is whether there is a mindset to allocate some of the savings to “auditing.”

Many small and medium-sized enterprises currently stop at “AI has made things cheaper, lucky us.” However, if operations run without auditing, then one day, when the tax office or a business partner asks, “What is the basis for these financial figures?”, no one will be able to answer. That risk is out of all proportion to a few thousand yen of monthly savings.

What specifically should be done? The three realistic measures that small and medium-sized enterprises can take at this point are:

1. Retain the AI agent’s operation logs
Google Sheets retains a change history automatically. However, to record who gave the instruction and which prompt was executed for changes made through the AI agent, log output must be enabled on the agent side. This can be done at almost zero cost. Most companies are not doing it; it should be started today.
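As a sketch of measure 1, here is one way to keep such a log with nothing but the Python standard library. The file name and field names are illustrative assumptions, not features of any particular agent product; the idea is simply one JSON line per instruction, appended so the log is tamper-evident by inspection and easy to grep later.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

LOG_FILE = Path("agent_audit_log.jsonl")  # hypothetical log location

def log_agent_call(user: str, prompt: str, sheet: str) -> None:
    """Append one line per agent instruction: who asked, what, and where."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "sheet": sheet,
        "prompt": prompt,
    }
    with LOG_FILE.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")

# Record one instruction before handing it to the agent.
log_agent_call("suzuki", "Aggregate Q3 sales excluding tax", "sales_2024")
```

Calling `log_agent_call` immediately before each agent invocation is enough; no database or extra tooling is required.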

2. Save “prompt + output” pairs monthly
What you instructed the AI to do, and what it produced: simply keeping this pair on record makes it far easier to trace “why it happened” later. Just create a monthly folder in Google Drive and drop the pairs in. The cost is zero; the effort is about 30 minutes a month.
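Measure 2 can even be scripted. The sketch below assumes the archive root is a local folder synced to Google Drive; the folder layout and file naming are illustrative choices, not a prescribed scheme.

```python
from datetime import date
from pathlib import Path

ARCHIVE_ROOT = Path("ai_audit_archive")  # e.g. a folder synced to Google Drive

def archive_pair(prompt: str, output: str) -> Path:
    """Save one prompt/output pair under a YYYY-MM folder."""
    folder = ARCHIVE_ROOT / date.today().strftime("%Y-%m")
    folder.mkdir(parents=True, exist_ok=True)
    # Number files sequentially within the month.
    n = len(list(folder.glob("pair_*.txt"))) + 1
    path = folder / f"pair_{n:03d}.txt"
    path.write_text(f"PROMPT:\n{prompt}\n\nOUTPUT:\n{output}\n",
                    encoding="utf-8")
    return path

p = archive_pair("Aggregate Q3 sales excluding tax",
                 "Total: 4,210,000 yen (formula: =SUMIF(...))")
print(p)  # e.g. ai_audit_archive/2025-01/pair_001.txt
```

One plain-text file per pair keeps the archive readable by anyone, with or without the tooling that created it.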

3. Have a human sample-check the AI’s output quarterly
There is no need to audit everything. Once a quarter, have a human manually verify the major aggregate figures. This alone can catch early on whether the AI has learned something wrong. The time required is about half a day; even outsourced, it would cost around 20,000 to 30,000 yen.
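Part of measure 3 can be scripted as well: recompute one key figure from the raw data and compare it with what the agent reported. In the sketch below, the 10% consumption tax rate and the 0.5% tolerance are illustrative assumptions, matching the article’s tax-exclusive example.

```python
def sample_check(gross_amounts, ai_total, tax_rate=0.10, tolerance=0.005):
    """Recompute a tax-exclusive total by hand and compare it with
    the figure the AI agent reported.

    gross_amounts: tax-inclusive amounts taken from the raw data
    ai_total:      the total the agent produced
    tolerance:     allowed relative deviation (0.5%, an arbitrary choice)
    """
    manual = sum(a / (1 + tax_rate) for a in gross_amounts)
    deviation = abs(manual - ai_total) / manual
    return deviation <= tolerance, round(manual)

# Three invoices at 110,000 / 220,000 / 330,000 yen including 10% tax.
ok, manual = sample_check([110_000, 220_000, 330_000], ai_total=600_000)
print(ok, manual)  # → True 600000
```

If the check fails, that is the moment to pull the matching prompt/output pair from the archive and ask why the agent’s figure drifted.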

Even when combined, the annual cost is around 100,000 yen, roughly 7 to 20 percent of the annual savings (500,000 to 1,500,000 yen) achieved by using AI agents.

The Real Question Is Not Whether to Use AI, but Whether You Can Take Responsibility for AI’s Decisions

What this research confronts us with is not a technical issue, but a responsibility issue.

AI agents operate spreadsheets. Business decisions are made based on the output numbers. These numbers are included in loan applications and provided as estimates to business partners. Who will explain the basis for those numbers, and how?

Large corporations have information systems departments, auditing firms, and compliance structures. Small and medium-sized enterprises do not. Their only option is to secure accountability through simple, repeatable mechanisms.

Conversely, there is an opportunity here. Small and medium-sized enterprises that can systematize the auditing of AI agents will gain increased trust from business partners and financial institutions. Just being able to say, “We use AI, but we also have an auditing system” differentiates them from others. There is a possibility that small and medium-sized enterprises can achieve an auditing system that large corporations spend millions of yen to build, for less than 10,000 yen a month. This is the agility that only small and medium-sized enterprises can have.

Conclusion: Prevent the Next “Personalization of Accountability” After the Elimination of Personalization

AI agents eliminate personalization. This is certain to happen. However, beyond that, a new trap called “personalization of accountability” awaits. “I don’t really understand AI, but it’s working, so it’s fine”—this is structurally the same as the old “It’s okay because I’m leaving it to Mr. Tanaka.”

The difference is that while Mr. Tanaka would answer if asked, the AI may not answer even if you ask.

What can be done starting today is clear. Keep logs. Save prompts. Check calculations quarterly. With an annual investment of under 100,000 yen, you can become a company that can explain “why it happened.”

Whether to use AI is no longer the question. The question is whether you can create a system that takes responsibility for AI’s decisions. That will be the dividing line for small and medium-sized enterprises moving forward.
