The Era of AI Moving the Mouse on Its Own: The Cost of Automating Person-Dependent Tasks Could Drop from 150,000 to 30,000 Yen per Month
Conclusion First: “Screen Operations” Have Been Handed Over to AI
Until now, to let AI handle tasks, it was necessary to connect APIs, write code, and build systems. Now, we have entered a stage where AI can move the mouse, look at the screen, and click just like a human.
Google DeepMind’s “AI Mouse Pointer,” Anthropic’s “Computer Use API,” and the “Interactive AI” from Thinking Machines Lab, led by Mira Murati (former CTO of OpenAI), are all heading in the same direction, with launches from the major players expected between late 2024 and 2025.
“AI operating the screen on its own” — this is the crux of the matter.
So, what does this mean for small and medium-sized enterprises (SMEs)? The cost of automating person-dependent screen operations, meaning tasks that only one particular employee knows how to do, could potentially drop from 150,000 yen to 30,000 yen per month. The era of spending millions on RPA is quietly coming to an end.
—
Three Technologies: What’s Different?
First, let’s clarify. While the three approaches may seem similar, their objectives differ.
Google DeepMind’s “AI Mouse Pointer”
Based on Google’s Gemini, this concept involves embedding AI directly into the mouse pointer. When a user points at something, the AI understands the target and reads the context. Demonstrations have showcased image editing and spot searches on maps.
The key point is “browser integration.” If integrated into Chrome or Chromebooks, it can be used without additional software. Companies already in Google’s ecosystem will benefit the most.
Objective: AI assists by “seeing and understanding” operations on the browser.
Anthropic’s “Computer Use API”
This feature, included in Anthropic’s Claude 3.5 Sonnet, allows AI to operate the entire desktop. It can take screenshots, recognize the state of the screen, move the mouse cursor, and perform keyboard inputs.
This is not limited to browsers. Desktop applications, business systems, and legacy software — anything with a screen can be operated. This is crucial for SMEs, as there is no need to API-enable old business systems. Simply let the AI operate the screen as it is.
Objective: AI executes all desktop operations on behalf of humans.
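The cycle described above (capture the screen, decide, move the mouse, type) can be sketched as a simple loop. This is a minimal simulation, not Anthropic's actual API: `FakeDesktop`, `scripted_policy`, the action names, and the dict shape are all hypothetical stand-ins, and the "model" is a hard-coded function so the example runs without any network access.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeDesktop:
    """Hypothetical stand-in for a real screen: one input field and a cursor."""
    cursor: tuple = (0, 0)
    focused: str | None = None
    fields: dict = field(default_factory=dict)
    # Assumed layout: screen coordinate -> name of the field located there
    layout: dict = field(default_factory=lambda: {(100, 200): "amount"})

    def screenshot(self) -> dict:
        # A real system would return an image; here we return the state directly.
        return {"cursor": self.cursor, "focused": self.focused,
                "fields": dict(self.fields)}

    def apply(self, action: dict) -> None:
        kind = action["action"]
        if kind == "mouse_move":
            self.cursor = tuple(action["coordinate"])
        elif kind == "left_click":
            self.focused = self.layout.get(self.cursor)
        elif kind == "type":
            if self.focused is not None:
                current = self.fields.get(self.focused, "")
                self.fields[self.focused] = current + action["text"]
        else:
            raise ValueError(f"unknown action: {kind}")


def scripted_policy(observation: dict) -> dict | None:
    """Placeholder for the model call: choose the next action from the screen state."""
    if observation["fields"].get("amount") == "12000":
        return None  # goal reached, nothing left to do
    if observation["cursor"] != (100, 200):
        return {"action": "mouse_move", "coordinate": [100, 200]}
    if observation["focused"] != "amount":
        return {"action": "left_click"}
    return {"action": "type", "text": "12000"}


def run_agent(desktop: FakeDesktop, policy, max_steps: int = 10) -> list:
    """Observe, decide, act, and repeat until the policy says it is done."""
    history = []
    for _ in range(max_steps):
        action = policy(desktop.screenshot())
        if action is None:
            break
        desktop.apply(action)
        history.append(action)
    return history


desk = FakeDesktop()
for step in run_agent(desk, scripted_policy):
    print(step)
print(desk.fields)
```

The `max_steps` cap is a design choice worth keeping in any real setup: it stops a confused agent from clicking indefinitely.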
Thinking Machines Lab’s “Interactive AI”
Mira Murati proposes going beyond the traditional “human gives instructions → AI responds” turn-based model. The AI observes the user’s screen operations in real-time and suggests and executes the next action before being asked.
Although still in the research phase, the direction is clear. This lays the groundwork for an AI that can operate without explicit instructions.
Objective: Real-time collaborative AI that operates without instructions.
—
What Changes for SMEs: The Cost Structure is Disrupted
Now, let’s get to the main topic.
In SMEs, there are numerous “screen operations that only this person can do”.
- Manually entering monthly invoices into accounting software
- Transferring order data from Excel to core systems
- Updating inventory counts across the admin screens of multiple e-commerce marketplaces
- Feeding attendance data into payroll software
Each of these tasks involves “looking at the screen, clicking, and inputting.” And each one depends on a specific individual.
Let’s compare the existing automation methods and their typical costs.
| Method | Initial Cost | Monthly Running Cost | Challenges |
|---|---|---|---|
| RPA (UiPath, etc.) | 1,000,000 to 5,000,000 yen | 100,000 to 300,000 yen | Breaks with screen changes. Requires specialized personnel for maintenance |
| Custom system integration by a systems integrator (SIer) | 3,000,000 to 10,000,000 yen | 50,000 to 200,000 yen | Long development periods. Vulnerable to specification changes |
| Part-time workers | 0 yen | 150,000 to 250,000 yen | Person-dependent. Work stops if they take a day off |
| Screen Operation AI (from 2025) | 0 to 50,000 yen | 20,000 to 50,000 yen | Accuracy is still developing. Monitoring is necessary |
The difference is staggering.
Spending 3 million yen on an RPA rollout, and then paying again every time a screen layout changes, could potentially give way to a monthly API usage fee of about 30,000 yen.
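To make the comparison concrete, here is a rough first-year total for each row of the table (initial cost plus 12 months of running cost). The ranges are copied from the table; the 12-month horizon and the function name `first_year_cost` are assumptions chosen for illustration.

```python
# (initial cost range, monthly running cost range) in yen, copied from the table
METHODS = {
    "RPA": ((1_000_000, 5_000_000), (100_000, 300_000)),
    "SIer integration": ((3_000_000, 10_000_000), (50_000, 200_000)),
    "Part-time workers": ((0, 0), (150_000, 250_000)),
    "Screen operation AI": ((0, 50_000), (20_000, 50_000)),
}


def first_year_cost(initial, monthly, months=12):
    """Return the (low, high) total over `months`, pairing low with low and high with high."""
    return (initial[0] + months * monthly[0], initial[1] + months * monthly[1])


for name, (initial, monthly) in METHODS.items():
    low, high = first_year_cost(initial, monthly)
    print(f"{name}: {low:,} to {high:,} yen in the first year")
```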
When estimating the specific cost structure, it looks like this:
- AI API usage fee (Anthropic Claude, etc.): 10,000 to 20,000 yen per month (based on usage; for about 100 invoices per month, it would be under 10,000 yen)
- Execution environment (cloud PC, etc.): 5,000 to 10,000 yen per month (a virtual desktop for AI to operate)
- Human cost for monitoring and error handling: equivalent to 5,000 to 10,000 yen per month (running it fully unattended is still premature; a weekly check is advisable)
Total: 20,000 to 40,000 yen per month. The midpoint is 30,000 yen.
Tasks that previously cost 150,000 yen per month would cost 30,000 yen. That is a saving of 120,000 yen per month, or 1,440,000 yen per year. Moreover, the dependence on a single person is resolved. Even if the responsible person leaves, the AI settings remain.
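The arithmetic behind the estimate above can be checked in a few lines. The figures come straight from the breakdown; the variable and function names are only illustrative.

```python
# Monthly cost components in yen as (low, high), taken from the breakdown above
COMPONENTS = {
    "AI API usage": (10_000, 20_000),
    "Execution environment": (5_000, 10_000),
    "Monitoring and error handling": (5_000, 10_000),
}
CURRENT_MONTHLY_COST = 150_000  # what the task is assumed to cost today


def monthly_range(components):
    """Sum the low ends and the high ends separately."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return low, high


def annual_saving(current_monthly, new_monthly):
    """Yearly saving when the monthly cost drops from current to new."""
    return (current_monthly - new_monthly) * 12


low, high = monthly_range(COMPONENTS)
midpoint = (low + high) // 2
print(f"Monthly total: {low:,} to {high:,} yen (midpoint {midpoint:,})")
print(f"Annual saving: {annual_saving(CURRENT_MONTHLY_COST, midpoint):,} yen")
```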
—
“So, Can It Be Used Right Now?” — To Be Honest
It would be irresponsible to only raise expectations, so let’s also mention the current limitations.
Anthropic’s Computer Use API can be tried right now. However, it is in “beta.” In my experience, accuracy is around 70-80%: it clicks the wrong button 2-3 times out of 10. The failure rate increases with complex screen transitions or systems with many pop-ups.
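One way to see why longer screen flows fail more often: if each click succeeds with probability p, and clicks are assumed to be independent with no retries (a simplification not stated in the original), a task of n clicks finishes without a single mistake with probability p^n. The function names below are hypothetical.

```python
def clean_run_probability(per_click_accuracy: float, clicks: int) -> float:
    """Chance that every click in a task is correct, assuming independent clicks."""
    return per_click_accuracy ** clicks


def required_click_accuracy(target_task_success: float, clicks: int) -> float:
    """Per-click accuracy needed to reach a target task success rate (same assumption)."""
    return target_task_success ** (1 / clicks)


for clicks in (1, 5, 10):
    chance = clean_run_probability(0.8, clicks)
    print(f"{clicks} clicks at 80% per click: {chance:.1%} chance of a clean run")

needed = required_click_accuracy(0.9, 10)
print(f"Per-click accuracy needed for 90% success over 10 clicks: {needed:.1%}")
```

In practice an agent that re-checks the screen and corrects itself will do better than this model suggests, which is one reason to measure task-level accuracy directly rather than infer it.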
Google DeepMind’s AI Mouse Pointer is still in the research announcement phase. It is uncertain when integration with Chrome will happen. However, considering Google’s development speed, there is a high likelihood it will be implemented in some form by 2025.
Thinking Machines Lab’s Interactive AI is even further down the line. While the concept is the most ambitious, it will likely not be usable as a product until at least 2026.
In other words, the only option available for experimentation starting today is Anthropic’s API. And this fact of being able to “experiment” is crucial.
—
What SMEs Should Do Now: Three Steps
1. Choose One Person-Dependent Task
Don’t try to do everything at once. First, select one task that “stops when this person is absent.” Recommended tasks are invoice input or inventory updates for e-commerce marketplaces. Choose tasks that are routine and where mistakes can be easily rectified.
2. Test Small with Anthropic’s API
Registration for API usage is free. Since it’s pay-per-use, processing just 10 invoices will cost only a few hundred yen. Experience the act of “having AI operate the screen” first-hand. Seeing is believing, but “one experiment is worth a hundred articles.”
3. Assess Accuracy and Decide Whether to Implement or Wait
If the accuracy exceeds 90%, it can be implemented in production with human checks. If it’s below 80%, wait three months and try again. The pace of evolution in this field is extraordinarily fast. Things that couldn’t be done six months ago are now possible.
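The rule above can be written down as a small helper for scoring a trial run. The thresholds are the ones stated in the text; the function names, and the handling of the 80-90% band (which the text does not cover), are assumptions.

```python
def measured_accuracy(results: list) -> float:
    """Share of trial runs that completed correctly (True = correct)."""
    if not results:
        raise ValueError("need at least one trial result")
    return sum(results) / len(results)


def decide(accuracy: float) -> str:
    """Apply the thresholds from the text to a measured accuracy."""
    if accuracy > 0.90:
        return "implement in production with human checks"
    if accuracy < 0.80:
        return "wait three months and try again"
    # Assumption: the text gives no rule for 80-90%, so flag it for judgment.
    return "borderline: decide case by case"


# Example: 20 test invoices, 19 entered correctly
trial = [True] * 19 + [False]
print(decide(measured_accuracy(trial)))
```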
—
The Fundamental Question: Will the Job of “Operating” Disappear?
Finally, let’s discuss a broader topic.
Previous business automation focused on the idea of “connecting systems.” APIs were used for integration, and databases were manipulated directly. This required specialized knowledge and incurred high costs.
Screen operation AI flips this premise on its head. There is no need to modify systems. As long as there is a screen, AI can operate just like a human.
This presents a significant opportunity for SMEs to close the gap with large corporations. Automation achieved through a screen operation AI costing 30,000 yen per month can match the system integrations that large enterprises have built at the cost of tens of millions of yen. If the results are the same, the cheaper option wins.
The job of “operating the screen” will not be a human task in the not-so-distant future. To prepare for that time, starting small experiments now is the most rational move for local SMEs.
For 30,000 yen a month, dependence on a single person goes away. I hope you will first confirm this for yourself, hands-on.