The AI Standards Paradox: Why Websites Are Adopting llms.txt While Bots Ignore It
As of mid-2026, a strange phenomenon has taken hold of the digital landscape. Website administrators, driven by the desire to remain relevant in an AI-dominated search ecosystem, are rapidly adopting a new suite of technical files: llms.txt, llms-full.txt, and ai.txt. Yet, according to a comprehensive study by Originality.ai released in July 2026, these files largely exist in a vacuum. While the number of websites hosting them has surged nearly ninefold in a single year, actual interaction from AI systems remains effectively non-existent.
This disconnect highlights a deepening crisis in web standards: a widening chasm between the "agent-readiness" movement and the practical reality of how current AI models interact with the open web.
The Growth Surge: A Statistical Breakdown
The data provided by Originality.ai, which monitored over 3 million websites, paints a picture of a gold-rush mentality. In June 2025, there were only 4,088 instances of llms.txt across the monitored domains. By May 2026, that figure had climbed to 36,120, an 8.8x increase in just twelve months. Including the companion files llms-full.txt and ai.txt, the total footprint reaches nearly 39,000 file instances.
The newer, more specialized formats are seeing even more aggressive percentage-based growth. llms-full.txt—a file designed to hold the entire textual content of a site for context-window-hungry models—jumped from 23 sites to 2,463, a 107.1x increase. Meanwhile, ai.txt, which focuses on licensing and crawler directives, grew nearly 100x, from a mere four sites to 397.
Despite this momentum, llms.txt remains the industry standard by default, accounting for 92.7% of all implemented files. However, volume of adoption is not synonymous with utility.
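The growth multiples and the 92.7% share follow directly from the reported counts. The short Python check below recomputes them, assuming the May 2026 totals are counts of files (so a site hosting more than one of these files is counted once per file):

```python
# Counts reported by Originality.ai: (June 2025, May 2026).
# Assumption: these are file counts, so the three totals can be summed.
counts = {
    "llms.txt": (4_088, 36_120),
    "llms-full.txt": (23, 2_463),
    "ai.txt": (4, 397),
}


def growth_multiple(before: int, after: int) -> float:
    """Return how many times larger `after` is than `before`, rounded to one decimal."""
    return round(after / before, 1)


for name, (before, after) in counts.items():
    print(f"{name}: {before:,} -> {after:,} ({growth_multiple(before, after)}x)")

# Share of llms.txt among all three file types in May 2026.
total_2026 = sum(after for _, after in counts.values())
llms_share = counts["llms.txt"][1] / total_2026
print(f"Total files, May 2026: {total_2026:,}")
print(f"llms.txt share: {llms_share:.1%}")
```

The ai.txt multiple comes out at roughly 99x, consistent with the "nearly 100x" figure above.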
The Request Gap: The "Ghost Town" Phenomenon
While the files are being deployed, they are not being read. A parallel study conducted by Ahrefs using server-log data from 137,000 domains revealed a sobering reality: 97% of llms.txt files received zero requests in May 2026. The vast majority of these files are effectively invisible to the systems they were designed to serve.
When requests do occur, they are rarely coming from the AI giants. Ahrefs found that:
- AI Retrieval Bots: Accounted for a mere 1.1% of all requests to llms.txt files.
- GPTBot: Responsible for 4.51% of requests.
- ClaudeBot: A negligible 0.80%.
- DeepseekBot: A statistically insignificant 0.02%.
Who is actually reading these files? The logs indicate that the primary traffic comes from SEO audit tools (21.7%), unidentified bots (14.9%), and general web crawlers (13.1%). The irony is profound: the very files created to help AI assistants understand the web are currently being used primarily by tools designed to monitor the health of the files themselves, rather than by the AI engines that would benefit from the data.
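Log-based studies like Ahrefs' come down to filtering server logs for requests to /llms.txt and bucketing them by user agent. The Python sketch below shows the idea for logs in the widely used Combined Log Format. The user-agent substrings and category names are illustrative assumptions, not Ahrefs' actual classification; a production classifier would need a far larger curated list and, because user-agent strings can be spoofed, usually an IP-based check as well.

```python
import re
from collections import Counter

# Illustrative, hypothetical mapping from user-agent substrings to categories.
# First match wins; anything unmatched is counted as "unidentified".
UA_CATEGORIES = [
    ("GPTBot", "AI training crawler"),
    ("ClaudeBot", "AI training crawler"),
    ("AhrefsBot", "SEO tool"),
    ("SemrushBot", "SEO tool"),
    ("Googlebot", "search crawler"),
]

# Captures the request path and the user agent from a Combined Log Format line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def classify(user_agent: str) -> str:
    """Return the first category whose substring appears in the user agent."""
    for needle, category in UA_CATEGORIES:
        if needle in user_agent:
            return category
    return "unidentified"


def count_llms_txt_requests(lines) -> Counter:
    """Count requests for /llms.txt (ignoring query strings) per bot category."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if m and m.group("path").split("?")[0] == "/llms.txt":
            counts[classify(m.group("ua"))] += 1
    return counts


# Hypothetical sample log lines (documentation IP ranges).
sample = [
    '203.0.113.5 - - [10/May/2026:13:55:36 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)"',
    '198.51.100.7 - - [10/May/2026:14:02:11 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"',
    '192.0.2.9 - - [10/May/2026:14:05:40 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '192.0.2.10 - - [10/May/2026:14:06:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "python-requests/2.31.0"',
]
print(count_llms_txt_requests(sample))
```

Dividing each category's count by the total gives percentage shares of the kind quoted above; the request for /index.html is ignored because only the llms.txt path is counted.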
Chronology of an Emerging Standard
- 2024: Data scientist Jeremy Howard proposes the llms.txt standard to address the context-window limitations of Large Language Models (LLMs).
- June 2025: Initial baseline tracking by Originality.ai shows minimal adoption (approx. 4,000 sites). John Mueller of Google states on social media that no AI system currently uses the file.
- January 2026: PPC Land reports a stalemate; major AI providers continue to ignore the standard despite rising interest from web developers.
- March 2026: Google introduces the "Google-Agent" user-triggered fetcher, signaling a pivot toward agentic browsing that bypasses traditional robots.txt protocols.
- April 2026: Research from Rutgers and The Wharton School reveals that blocking AI crawlers via robots.txt can cause a 7% drop in human traffic, complicating the decision-making process for site owners.
- May 2026: Google integrates an optional llms.txt audit into its Lighthouse developer tool, signaling a shift in policy, though clarifying that the file is not required for Search.
- July 2, 2026: Originality.ai publishes its year-long tracking study, confirming 8.8x growth in adoption alongside the Ahrefs findings of near-zero utility.
The Philosophy of the Files: What Are They For?
The confusion surrounding these standards is partly due to the distinct, yet often conflated, roles they play:
- llms.txt: Proposed by Jeremy Howard, this acts as a "site map" for LLMs. It provides a structured summary of a site’s purpose and priority pages, helping models navigate content without exhausting their context windows.
- llms-full.txt: A flattened, comprehensive document containing the actual text of a site. It is intended for smaller, knowledge-dense sites that want to ensure their entire content library is "ingestible" in one go.
- ai.txt: Promoted by Spawning, this is an attempt to introduce consent into the AI ecosystem. It outlines licensing terms, training permissions, and contact details for AI companies.
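To make the llms.txt "site map" concrete, here is a minimal Python sketch that renders one in the Markdown layout Howard's proposal describes: an H1 with the site name, a blockquote summary, then H2 sections listing links, with a section titled Optional for material a model may skip when context is short. The Site and Link types, the render_llms_txt function, the example.com domain, and the page list are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    note: str = ""  # optional one-line description shown after the link


@dataclass
class Site:
    name: str
    summary: str
    # Section heading -> links; dicts keep insertion order, so sections render in order.
    sections: dict[str, list[Link]] = field(default_factory=dict)


def render_llms_txt(site: Site) -> str:
    """Render a hypothetical site description as llms.txt-style Markdown."""
    lines = [f"# {site.name}", "", f"> {site.summary}"]
    for heading, links in site.sections.items():
        lines += ["", f"## {heading}", ""]
        for link in links:
            entry = f"- [{link.title}]({link.url})"
            if link.note:
                entry += f": {link.note}"
            lines.append(entry)
    return "\n".join(lines) + "\n"


site = Site(
    name="Example Widgets",
    summary="API reference and guides for the hypothetical Example Widgets service.",
    sections={
        "Docs": [Link("Quickstart", "https://example.com/docs/quickstart.md", "Install and first request")],
        "Optional": [Link("Changelog", "https://example.com/changelog.md")],
    },
)
print(render_llms_txt(site))
```

An llms-full.txt, by contrast, would carry the text of those pages themselves rather than links to them.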
The fundamental problem remains: robots.txt, the veteran of web standards, was built for indexing, not for AI training or agentic interaction. It can address crawlers only by the user-agent name they announce; it has no way to express purpose (indexing a page versus training on it), nor can it stipulate licensing terms. Yet, because robots.txt is the only standard that carries even a modicum of enforcement, site owners are caught between a rock and a hard place.
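The limitation is easy to see with Python's standard-library parser, urllib.robotparser. Rules are keyed only by user-agent token and URL path: the hypothetical file below blocks GPTBot entirely and leaves Googlebot unrestricted, but nothing in the format can say why a page is fetched or under what license its content may be reused.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: rules differ only by user-agent token and path.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Check each crawler against a public and a private path on a placeholder domain.
for agent in ("GPTBot", "Googlebot", "SomeOtherBot"):
    for path in ("/articles/ai-standards", "/private/drafts"):
        allowed = parser.can_fetch(agent, f"https://example.com{path}")
        print(f"{agent:12} {path:25} {'allowed' if allowed else 'blocked'}")
```

GPTBot is blocked everywhere, Googlebot is allowed everywhere (a named group does not inherit the * rules), and any other crawler falls back to the * group. Honoring any of this remains voluntary on the crawler's side.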
Conflicting Signals from the AI Platforms
Perhaps the biggest obstacle to a settled standard is the hypocritical behavior of the major platforms. Google, OpenAI, and Anthropic all explicitly advise site owners to rely on robots.txt for crawler management. Yet all three companies publish their own llms.txt files for their respective developer documentation.
This "do as I say, not as I do" approach has created a sense of distrust among site owners. Google’s position remains particularly nuanced: they maintain that llms.txt has no impact on Search ranking, yet they have added support for the file in their Chrome Lighthouse tool. This suggests that while the file may not be a search signal, it is becoming a "best practice" for developers creating agent-ready websites.
Implications: The "Agentic" Shift
The shift in how we define these files is essential. Originality.ai concludes that llms.txt is likely not a search visibility tool, but rather infrastructure for autonomous agents.
As the web moves from a "search-and-click" model to an "ask-and-act" model, websites will need to be navigable by AI agents that perform tasks such as booking appointments, comparing products, or summarizing complex technical documentation. For these agents, a clean, structured map of the site is invaluable. The current lack of usage by crawlers such as GPTBot (OpenAI's training crawler) is not necessarily a failure of the format, but a sign that the "agentic web" is still in its infancy.
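How agents will actually consume such a map is still unsettled, but the basic step is simple: read the Markdown file, collect the links under each H2 section, and drop the section titled Optional when context is tight, which is the skip rule Howard's proposal describes. A minimal Python sketch follows; the parse_llms_txt function and the sample file are hypothetical, and this is not how any particular vendor's agent works.

```python
import re

# A link list item such as "- [Title](https://...): note"; any trailing note is ignored.
LINK_ITEM = re.compile(r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)")


def parse_llms_txt(text: str, include_optional: bool = True) -> dict[str, list[tuple[str, str]]]:
    """Map each H2 section name to its (title, url) links."""
    sections: dict[str, list[tuple[str, str]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):  # new H2 section ("### " does not match)
            current = line[3:].strip()
            sections.setdefault(current, [])
            continue
        m = LINK_ITEM.match(line)
        if m and current is not None:
            sections[current].append((m.group("title"), m.group("url")))
    if not include_optional:
        sections.pop("Optional", None)  # skippable when context is short
    return sections


# Hypothetical llms.txt content for a placeholder domain.
SAMPLE = """# Example Widgets

> API reference and guides for the hypothetical Example Widgets service.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): Install and first request
- [API reference](https://example.com/docs/api.md)

## Optional

- [Changelog](https://example.com/changelog.md)
"""

print(parse_llms_txt(SAMPLE, include_optional=False))
```

An agent would then fetch only the URLs it needs for the task at hand, rather than crawling the whole site.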
However, the lack of a consolidated, industry-wide agreement on these standards poses a massive challenge. With only 7.4% of Fortune 500 companies adopting these files as of early 2026, the standard remains firmly in the "experimental" phase.
Conclusion: A Strategy for Site Owners
For the average website owner, the path forward remains precarious. The evidence suggests that while llms.txt is growing in popularity, it is currently "dead code" for the vast majority of sites.
The recommended strategy, as outlined by industry analysts and the recent data, is to focus on the following:
- Maintain robots.txt: It remains the only widely recognized way to control traffic, despite its flaws.
- Implement llms.txt with Caution: Only prioritize it if your site features deep, structured documentation or APIs that would benefit from being navigated by autonomous agents.
- Monitor Platform Behavior: Do not expect a ranking boost. Instead, treat these files as a form of future-proofing, preparing your site for the eventual shift toward agent-based browsing.
The tension between the 8.8x growth in adoption and the near-zero utilization by AI bots is a hallmark of a nascent technology. The web is preparing for an AI visitor that hasn’t quite arrived yet, or perhaps one that is still deciding which map to read.
