The Rapid Evolution of LLM Long-Form Context Technology: A New Era Where Small and Medium Enterprises Can Manage Their ‘Mountains of Paper’ for Just 50,000 Yen a Month
Related Articles
Conclusion
Let’s get straight to the point: The cost of processing the “mountains of paper” is about to change dramatically.
In small and medium enterprises, there exists a “mountain of paper”—meeting minutes, contracts, operational manuals, and past reports. Someone reads, someone summarizes, and someone checks. The cost of this “someone” is now poised to drop dramatically.
The catalyst for this change is the rapid evolution of long-form context processing technology in LLMs (Large Language Models). Until now, the AI’s limit for reading at one time was a few thousand tokens (approximately several thousand characters in Japanese), but this has now expanded to 100,000 tokens, or about 70,000 to 80,000 characters in Japanese—approaching the length of a full book. Moreover, the cost of operating this technology has plummeted.
What does this mean in concrete terms? What has changed, and what should small and medium enterprises do? Let’s break down the technology and how it can be applied in practice.
—
What Happened: Breaking Through the KV Cache Problem
When LLMs process long texts, the biggest bottleneck is a memory area known as the “KV cache.” As the input length increases, this cache consumes GPU memory. When trying to process a document of 100,000 tokens, the KV cache uses more memory than the model itself—this reversal phenomenon was occurring.
In other words, while “AI that can read long documents” technically existed, it required expensive GPUs to operate, making it inaccessible for small and medium enterprises.
The breakthrough came with OSCAR, an open-source project by Together AI. OSCAR quantizes the KV cache down to 2-bit precision, compressing memory usage by up to 1/8 compared to the traditional 16-bit precision. Furthermore, it is open-source and free for anyone to use.
Another noteworthy advancement is the FP4 Attention Technology. This is based on the idea that “there’s no need to compute everything at high precision.” It calculates only the important parts of the document (where the interaction between queries and keys is significant) in FP16 (16-bit floating point), while processing the remaining 95% in FP4 (4-bit). The quality is nearly equivalent to FP16, but processing speed is significantly improved.
Additionally, KV cache compression technologies such as Adaptive Mass-Segmented KV Compression and VECTOR have emerged. By combining these technologies, the GPU specifications required for long-form processing have decreased dramatically.
In short, the cost required to have AI read a document equivalent to a full book has dropped by orders of magnitude at the hardware level. This is what is happening now.
—
A Concrete Look at Costs: What Changes and By How Much
Discussing technology alone is meaningless. What small and medium enterprise owners want to know is, “So, how much will it cost?”
Let’s clarify the current situation.
Traditional Document Processing Costs (Manual):
- Reviewing one contract: Internal staff takes 30-60 minutes. If outsourced to an external lawyer, it costs 30,000 to 50,000 yen per document.
- Creating meeting minutes: For a one-hour meeting, it takes 30-45 minutes to create. If there are 20 meetings a month, that amounts to 10-15 hours.
- Updating and searching operational manuals: It becomes personalized to the point where it’s like, “I can’t figure this out without asking that person.”
Outsourcing the processing of 50 contracts a month would cost between 1.5 to 2.5 million yen. Even handling it internally, if you convert the staff’s labor cost to hourly wages, it would amount to 25-50 hours a month, equating to 100,000 to 200,000 yen in labor costs.
When Utilizing LLM Long-Form Processing:
- Using long-form context-compatible APIs like Claude or GPT-4o, processing 100,000 tokens costs only a few dozen to a few hundred yen per instance.
- Even if you process 50 contracts a month, the API costs would only be a few thousand to tens of thousands of yen.
- If you run open-source technologies like OSCAR locally, the ongoing costs would be nearly zero, excluding the initial GPU setup costs.
Even with conservative estimates, it has become realistic to keep costs under 50,000 yen a month with API usage and simple workflow construction. Compared to traditional manual costs, this is a reduction to one-fifth or one-tenth.
Moreover, it’s important to note that it’s not just about cost. Speed is changing. An AI can read a contract in seconds, extracting key points and highlighting risk areas, while meeting minutes can be completed simultaneously as the meeting ends. Manuals will shift from “searching and reading” to “asking questions and getting answers.”
—
What Really Changes for Small and Medium Enterprises
Now we get to the main point. The reduction in costs is merely a means to an end. We want to consider what happens after costs go down.
1. Breaking Down Personalization
The biggest issue in small and medium enterprises is personalization. “Only A knows the background of that contract,” or “B has that manual in their head.” This state will be structurally resolved by long-form context LLMs.
If you feed all past contracts, meeting minutes, and email exchanges into the LLM, you can simply ask, “What were the changes in contract terms with this client?” and get an answer. Even if A leaves the company, the knowledge remains. This is not just efficiency; it’s a structural change where the company’s knowledge assets shift from individuals to the organization.
2. The Value of “Reading Work” Will Plummet
Reading contracts, summarizing meeting minutes, and finding relevant sections in manuals are all forms of “reading work.” Long-form context LLMs will bring the cost of this “reading work” down to nearly zero.
This is not a threat but an opportunity. In small and medium enterprises, talented individuals are often bogged down by “reading work.” If that time is freed up, they can focus on customer service, product development, and sales—essentially, the work that generates revenue. While large companies can divide tasks into specialized departments, individuals in small and medium enterprises often wear multiple hats. Therefore, the impact of automating “reading work” is even greater in small and medium enterprises.
3. The Information Gap with Large Enterprises Will Narrow
Large companies have legal departments, corporate planning departments, and dedicated teams for knowledge management. Small and medium enterprises do not. This gap could be rapidly narrowed with long-form context LLMs.
For just 50,000 yen a month, you can gain the primary check function of a legal department, a knowledge management system, and the automation of meeting minutes. The “minimum functions” of systems that large companies have built at the cost of tens of millions of yen will become accessible to small and medium enterprises. This is a story where areas that have been dismissed as “just the way it is for small businesses” can be overturned by technology.
—
So, Where Should We Start?
Waiting for technological evolution won’t help. Here are three things you can start doing today.
1. First, Digitize Your Paper Documents
To benefit from long-form context LLMs, documents need to be in digital format. If you’re still storing them on paper, start by scanning and using OCR to digitize them. This may seem mundane, but it’s the most crucial step.
2. Experiment with Existing APIs to “Let Them Read”
APIs that support long-form contexts, like Claude, GPT-4o, and Gemini, are already available. Simply have them read a contract and instruct them to “extract risk areas.” You’ll be amazed at the accuracy. With just a few thousand yen for experimentation, you’ll see how much of your document processing can be automated.
3. List Tasks That Require Asking Others
Identify tasks that are personalized. If you feed the documents related to those tasks (emails, meeting minutes, manuals) into the LLM, the resolution of personalization will begin. You don’t need to do everything at once; starting with one task is sufficient.
—
Future Points of Interest: Accelerating Open Source and Price Competition
The open-sourcing of OSCAR is symbolic. With the core technology for long-form context processing now open, numerous application tools based on this technology will emerge. Improvements by the community will also progress.
At the same time, price competition among API providers is intensifying. Long-form processing that cost hundreds of yen per instance a year ago now costs just a few dozen yen. This trend will not stop. The longer you wait, the cheaper it will become, but companies that start early will accumulate an invisible asset of “know-how on usage.”
Considering the speed of technological evolution, there’s a high likelihood that the current norms will change again in six months. That’s why it’s crucial to start small experiments now and get a feel for how much you can utilize it in your document processing.
For just 50,000 yen a month, you can clear your “mountains of paper.” The technical backing for this is already in place. The only question left is whether to try it.
JA
EN