Most developer conversations about AI agents focus on building the agent. But agents need somewhere to go. They need to crawl your site, read your docs, call your APIs, and understand what actions are available — and right now, most of the web is not ready for them.
Cloudflare published adoption data in April 2026 that puts this in stark relief: while 78% of sites have a robots.txt, only 4% declare AI usage preferences, and fewer than 15 sites on the entire internet implement MCP Server Cards. The infrastructure side of the agent ecosystem is roughly where SEO was in 2001.
This article is a practical guide for engineers who want to be ahead of that curve.
TL;DR
- “Agent readiness” means making your site or API easy for AI agents to discover, read, and use — it is different from building agents.
- There are four layers: Discoverability, Content Accessibility, Bot Access Control, and Protocol Discovery.
- llms.txt and markdown content negotiation are the two highest-leverage changes you can make today.
- Cloudflare’s own docs implementation achieved 31% fewer tokens consumed and 66% faster response times after optimizing for agent consumption.
- Only about 4% of sites have adopted the newer standards such as Content Signals and markdown negotiation. Being early is a real advantage.
What You Will Learn Here
- What agent readiness means and why it matters right now
- The four dimensions of an agent-ready site
- How to implement llms.txt, robots.txt AI rules, and markdown negotiation with real code
- How to expose your site’s capabilities through MCP and API catalogs
- A practical checklist you can action today
The Shift Happening Now
The web was built for humans. HTML, CSS, JavaScript — all of it is optimized for a browser rendering engine and a human reading the result. When a search engine crawler visits your site, it reads your content passively: index the text, follow the links, move on.
AI agents work differently. An agent visiting your site wants to:
- understand what your site contains and what actions it supports
- consume content with minimal noise (no navbars, scripts, ads)
- know what it is and is not allowed to do
- find API endpoints and capabilities it can call autonomously
Traditional crawler      AI agent
───────────────────      ────────────────────────────
visit URL                visit URL
parse HTML               look for llms.txt
extract text             request /index.md (markdown)
follow links             read Content Signals
store index              find MCP server card
                         call API or tool
                         take action
That is a fundamentally different usage pattern. And the web has almost no signals for it yet.
The Four Dimensions of Agent Readiness
Think of agent readiness in four layers, from simplest to most powerful:
Layer 4 ─ Protocol Discovery ┐
MCP Server Cards │ agents can take action
API Catalogs │
Agent Skills ┘
Layer 3 ─ Bot Access Control ┐
Content Signals │ agents know what they can do
Web Bot Auth │
AI bot rules ┘
Layer 2 ─ Content Accessibility ┐
llms.txt │ agents can read efficiently
Markdown negotiation │
Structured reading ┘
Layer 1 ─ Discoverability ┐
robots.txt │ agents can find content
sitemap.xml │
HTTP Link headers ┘
Start at Layer 1 and work up. Every layer depends on the one below it.
Layer 1: Discoverability
robots.txt for AI crawlers
Most robots.txt files were written for Google and Bing. They say nothing about AI agents. The first step is to add explicit rules for the major AI crawlers.
# robots.txt
# Traditional crawlers
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
# AI training crawlers (block if you don't want your content used for training)
User-agent: GPTBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
# User-triggered agents doing live tasks (usually fine to allow)
User-agent: ChatGPT-User
Allow: /
User-agent: Claude-User
Allow: /
User-agent: Perplexity-User
Allow: /
# Point crawlers to your sitemap
Sitemap: https://yourdomain.com/sitemap.xml
The distinction matters: training crawlers (GPTBot, CCBot, ClaudeBot) harvest your content for model training. User-triggered agents (ChatGPT-User, Claude-User, Perplexity-User) fetch pages on behalf of a user trying to accomplish something. These are different use cases, and most sites should treat them differently. Vendors add and rename crawlers over time, so check each vendor's published user-agent list before relying on these names.
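If you maintain these rules in code rather than by hand, the block/allow split can be generated from a small policy object. The following is a minimal sketch: the `CrawlerPolicy` shape and `renderRobotsTxt` function are hypothetical names introduced here, and `ExampleTaskAgent` is a placeholder, not a real crawler.

```typescript
// Hypothetical policy shape: which crawler names to block and which to allow.
interface CrawlerPolicy {
  block: string[];
  allow: string[];
  sitemapUrl?: string;
}

// Render one robots.txt group per crawler name, blocked crawlers first.
function renderRobotsTxt(policy: CrawlerPolicy): string {
  const lines: string[] = [];
  for (const name of policy.block) {
    lines.push(`User-agent: ${name}`, "Disallow: /", "");
  }
  for (const name of policy.allow) {
    lines.push(`User-agent: ${name}`, "Allow: /", "");
  }
  if (policy.sitemapUrl) {
    lines.push(`Sitemap: ${policy.sitemapUrl}`);
  }
  // Normalize to exactly one trailing newline.
  return lines.join("\n").trimEnd() + "\n";
}

const robotsTxt = renderRobotsTxt({
  block: ["GPTBot", "CCBot"],
  allow: ["ExampleTaskAgent"], // placeholder name, not a real crawler
  sitemapUrl: "https://yourdomain.com/sitemap.xml",
});
console.log(robotsTxt);
```

Keeping the two lists separate makes the training-versus-task decision explicit and easy to review when a vendor publishes a new user agent.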
HTTP Link Headers
For single pages, you can signal related machine-readable content via HTTP headers:
Link: </llms.txt>; rel="llms"
Link: </index.md>; rel="alternate"; type="text/markdown"
In Next.js, you can set these headers for every route in the config file:
// next.config.ts
export default {
  async headers() {
    return [
      {
        // Apply to every path on the site
        source: '/(.*)',
        headers: [
          {
            key: 'Link',
            // Multiple links share one header, separated by commas
            value: '</llms.txt>; rel="llms", </index.md>; rel="alternate"; type="text/markdown"',
          },
        ],
      },
    ];
  },
};
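A static value points every page at the same /index.md. If you would rather compute the header per page, a sketch along these lines may help. `markdownPathFor` and `buildLinkHeader` are hypothetical helpers, and the `.md` suffix convention is the one this article uses, not something the Link header itself requires.

```typescript
interface LinkTarget {
  href: string;
  rel: string;
  type?: string;
}

// Map a page path to its markdown twin: "/docs/api" -> "/docs/api.md", "/" -> "/index.md".
function markdownPathFor(pathname: string): string {
  const trimmed = pathname.replace(/\/+$/, "");
  return trimmed === "" ? "/index.md" : `${trimmed}.md`;
}

// Serialize targets into a single Link header value.
function buildLinkHeader(targets: LinkTarget[]): string {
  return targets
    .map((t) => {
      const params = [`rel="${t.rel}"`];
      if (t.type) params.push(`type="${t.type}"`);
      return `<${t.href}>; ${params.join("; ")}`;
    })
    .join(", ");
}

const linkHeader = buildLinkHeader([
  { href: "/llms.txt", rel: "llms" },
  { href: markdownPathFor("/docs/api"), rel: "alternate", type: "text/markdown" },
]);
console.log(linkHeader);
```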
Layer 2: Content Accessibility
llms.txt — the specification
The llms.txt specification defines a Markdown file at the root of your site that gives AI agents a structured reading list. It is analogous to sitemap.xml but designed for LLM context windows, not search index crawlers.
The file format:
# Your Site or Product Name
> One or two sentence description of what this site is and who it is for.
## Docs
- [Getting Started](https://yourdomain.com/docs/getting-started.md): Installation and first steps
- [API Reference](https://yourdomain.com/docs/api.md): Complete API documentation
- [Configuration](https://yourdomain.com/docs/config.md): All configuration options
## Guides
- [Authentication Guide](https://yourdomain.com/guides/auth.md): How to authenticate users
- [Deployment Guide](https://yourdomain.com/guides/deploy.md): Deploying to production
## Optional
- [Changelog](https://yourdomain.com/changelog.md): Recent changes
- [Roadmap](https://yourdomain.com/roadmap.md): Planned features
Key rules from the spec:
- The H1 heading is required (site or product name)
- The blockquote summary is optional but strongly recommended
- H2 sections organize groups of links
- Anything under ## Optional can be skipped by agents in constrained contexts
- All linked pages should have markdown equivalents (more on this below)
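To see how an agent might apply these rules, here is a rough parser sketch. It assumes the simplified format shown above (one H1, an optional blockquote, H2 sections, and `- [title](url): note` links); `parseLlmsTxt` and `essentialLinks` are illustrative names, not part of the specification.

```typescript
interface LlmsLink {
  title: string;
  url: string;
  note?: string;
}

interface LlmsTxt {
  title: string;
  summary?: string;
  sections: Record<string, LlmsLink[]>;
}

function parseLlmsTxt(text: string): LlmsTxt {
  const lines = text
    .split(/\r?\n/)
    .map((l) => l.trim())
    .filter((l) => l.length > 0);
  const first = lines[0] ?? "";
  // The H1 is the only required element.
  if (!first.startsWith("# ")) {
    throw new Error("llms.txt must start with an H1 heading");
  }
  const result: LlmsTxt = { title: first.slice(2).trim(), sections: {} };
  const linkPattern = /^-\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*(.*))?$/;
  let current: string | null = null;
  for (const line of lines.slice(1)) {
    if (line.startsWith("## ")) {
      current = line.slice(3).trim();
      result.sections[current] = [];
    } else if (line.startsWith("> ") && current === null) {
      // The blockquote summary sits between the H1 and the first H2.
      result.summary = line.slice(2).trim();
    } else if (current !== null) {
      const m = linkPattern.exec(line);
      if (m) {
        const link: LlmsLink = { title: m[1], url: m[2] };
        if (m[3]) link.note = m[3];
        result.sections[current].push(link);
      }
    }
  }
  return result;
}

// Links an agent should read when context is tight: everything except "Optional".
function essentialLinks(doc: LlmsTxt): LlmsLink[] {
  const links: LlmsLink[] = [];
  for (const name of Object.keys(doc.sections)) {
    if (name !== "Optional") links.push(...doc.sections[name]);
  }
  return links;
}

const sample = [
  "# Your Site or Product Name",
  "> One or two sentence description.",
  "## Docs",
  "- [API Reference](https://yourdomain.com/docs/api.md): Complete API documentation",
  "## Optional",
  "- [Changelog](https://yourdomain.com/changelog.md): Recent changes",
].join("\n");
const parsed = parseLlmsTxt(sample);
console.log(parsed.title, essentialLinks(parsed).map((l) => l.url));
```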
Markdown content negotiation
The llms.txt file only works if the pages it links to are actually readable. HTML is noisy — navbars, scripts, ads, and boilerplate can triple the token cost of reading a page.
The pattern is simple: for each page at /docs/api, serve a clean markdown version at /docs/api.md.
In Cloudflare Workers or Pages:
// Cloudflare Worker: serve .md files or HTML based on request path
// Env and getMarkdownContent are app-specific: Env needs an ASSETS binding,
// and getMarkdownContent looks up the markdown source for a path.
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);
    // If the agent requests /docs/api.md, serve clean markdown
    if (url.pathname.endsWith('.md')) {
      // Strip only the trailing ".md"; replace('.md', '') would hit the first match anywhere in the path
      const htmlPath = url.pathname.slice(0, -'.md'.length);
      const content = await getMarkdownContent(htmlPath, env);
      return new Response(content, {
        headers: {
          'Content-Type': 'text/markdown; charset=utf-8',
          'Cache-Control': 'public, max-age=3600',
        },
      });
    }
    // Otherwise serve normal HTML
    return env.ASSETS.fetch(request);
  },
};
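In Next.js with App Router, you can use a route handler:
// app/docs/[slug]/route.ts
import { getDoc } from '@/lib/content';
// Note: in Next.js 15+, params is a Promise and must be awaited first
export async function GET(
  request: Request,
  { params }: { params: { slug: string } }
) {
  // For /docs/api.md the dynamic segment is "api.md"
  if (params.slug.endsWith('.md')) {
    // Strip only the trailing ".md"
    const slug = params.slug.slice(0, -'.md'.length);
    const doc = await getDoc(slug);
    return new Response(doc.markdown, {
      headers: { 'Content-Type': 'text/markdown; charset=utf-8' },
    });
  }
  // A route handler must always return a Response; HTML pages are rendered elsewhere
  return new Response('Not found', { status: 404 });
}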
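Both handlers above key off the `.md` suffix. Negotiation in the strict HTTP sense uses the Accept header, so a server can also return markdown from the original URL when the client asks for it. The sketch below assumes agents send `Accept: text/markdown`; `prefersMarkdown` is a hypothetical helper and handles only the two media types that matter here.

```typescript
// Returns true when the request should get the markdown representation.
function prefersMarkdown(pathname: string, acceptHeader: string | null): boolean {
  // An explicit .md URL always wins.
  if (pathname.endsWith(".md")) return true;
  if (!acceptHeader) return false;

  let markdownQ = 0;
  let htmlQ = 0;
  for (const part of acceptHeader.split(",")) {
    const [rawType, ...params] = part.trim().split(";");
    const type = rawType.trim().toLowerCase();
    // A media type without a q parameter has weight 1.
    let q = 1;
    for (const p of params) {
      const [key, value] = p.trim().split("=");
      if (key === "q") {
        const parsed = Number(value);
        if (!Number.isNaN(parsed)) q = parsed;
      }
    }
    if (type === "text/markdown") markdownQ = Math.max(markdownQ, q);
    if (type === "text/html") htmlQ = Math.max(htmlQ, q);
  }
  // Serve markdown only if it was requested and ranks at least as high as HTML.
  return markdownQ > 0 && markdownQ >= htmlQ;
}

console.log(prefersMarkdown("/docs/api", "text/markdown, text/html;q=0.8"));
```

If one URL can return either representation, the response should also carry `Vary: Accept` so that caches do not hand markdown to browsers or HTML to agents.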
Why this matters in numbers: Cloudflare’s implementation of these patterns for their own developer docs achieved 31% fewer tokens consumed per agent visit and 66% faster response times compared to serving HTML. That is a meaningful cost and latency reduction if you are building a docs-heavy product that agents interact with.
Hierarchical llms.txt files
For large sites, a single root-level llms.txt can overwhelm an agent’s context window. The solution is hierarchical files:
/llms.txt ← top-level index
/docs/llms.txt ← docs-specific index
/api/llms.txt ← API-specific index
/guides/llms.txt ← guides-specific index
Each subsection file covers only the pages in that directory. An agent exploring your docs section reads /docs/llms.txt without loading all of your marketing pages.
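From the agent's side, finding the right file amounts to walking up the directory tree. A minimal sketch, assuming the layout above; `llmsTxtCandidates` is a hypothetical helper that lists where to look, most specific first.

```typescript
// List candidate llms.txt locations for a page, from the nearest directory up to the root.
function llmsTxtCandidates(pathname: string): string[] {
  const segments = pathname.split("/").filter((s) => s.length > 0);
  // Without a trailing slash, the last segment names a page rather than a directory.
  if (!pathname.endsWith("/")) segments.pop();
  const candidates: string[] = [];
  for (let depth = segments.length; depth >= 0; depth--) {
    const dir = segments.slice(0, depth).join("/");
    candidates.push(dir ? `/${dir}/llms.txt` : "/llms.txt");
  }
  return candidates;
}

console.log(llmsTxtCandidates("/docs/api/auth"));
```

An agent would request each candidate in order and stop at the first one that exists.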
Layer 3: Bot Access Control
Content Signals
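Content Signals is an emerging standard that lets site owners declare AI usage preferences in machine-readable form. Cloudflare's Content Signals Policy expresses them as a Content-Signal line in robots.txt (for example, ai-train=no); the meta tags below sketch the same preferences inline in a page, and their names are illustrative rather than standardized.
<head>
  <!-- Allow AI agents to read content but not use it for training -->
  <meta name="ai-usage" content="no-training, allow-agent-access">
  <!-- Declare licensing terms for AI consumption -->
  <meta name="ai-license" content="CC-BY-4.0">
</head>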
As of April 2026, only about 4% of sites implement any form of Content Signals. The standard is still evolving, but adding these meta tags now costs nothing and starts building a machine-readable record of your preferences.
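For illustration, an agent could turn the `ai-usage` value from the example above into explicit flags. This is a sketch only: the token names come from this article's example, and the defaults (training allowed unless forbidden, agent access denied unless granted) are assumptions made here, not rules from any published standard.

```typescript
interface AiUsagePreferences {
  training: boolean;
  agentAccess: boolean;
}

// Parse the comma-separated content attribute of the example ai-usage meta tag.
function parseAiUsage(content: string): AiUsagePreferences {
  const tokens = content.split(",").map((t) => t.trim().toLowerCase());
  return {
    // Assumed default: training is allowed unless "no-training" is present.
    training: tokens.indexOf("no-training") === -1,
    // Assumed default: agent access needs an explicit "allow-agent-access".
    agentAccess: tokens.indexOf("allow-agent-access") !== -1,
  };
}

const prefs = parseAiUsage("no-training, allow-agent-access");
console.log(prefs);
```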
Web Bot Auth
Web Bot Auth is a newer protocol that lets AI agents prove their identity by cryptographically signing their HTTP requests, building on HTTP Message Signatures. This is relevant when your site has both public content (no verification needed) and sensitive operations (require verified agent identity).
The flow looks like this:
Agent signs its request to /api/sensitive-action with its private key
Agent sends the request with Signature, Signature-Input, and Signature-Agent headers
Server fetches the agent's published public keys and verifies the signature
Server grants access, or rejects the request if verification fails
This is similar in spirit to how MCP handles tool authorization: only require verification when the agent crosses into protected territory.
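The verification step can be illustrated with Node's built-in crypto module. The sketch below shows only the general sign-then-verify idea with an Ed25519 key pair; the `signatureBase` string is a made-up format for this example, not the wire format Web Bot Auth defines, and `signRequest` and `verifyRequest` are hypothetical names.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical signature base: NOT the format defined by Web Bot Auth.
function signatureBase(method: string, path: string, created: number): Buffer {
  return Buffer.from(`${method.toUpperCase()} ${path}\ncreated=${created}`, "utf8");
}

// Agent side: sign a description of the request with the private key.
function signRequest(method: string, path: string, created: number, privateKey: KeyObject): Buffer {
  // Ed25519 takes no separate digest algorithm, so the first argument is null.
  return sign(null, signatureBase(method, path, created), privateKey);
}

// Server side: accept only fresh requests whose signature matches the public key.
function verifyRequest(
  method: string,
  path: string,
  created: number,
  signature: Buffer,
  publicKey: KeyObject,
  nowSeconds: number,
  maxAgeSeconds = 300
): boolean {
  if (Math.abs(nowSeconds - created) > maxAgeSeconds) return false;
  return verify(null, signatureBase(method, path, created), publicKey, signature);
}

const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const now = Math.floor(Date.now() / 1000);
const sig = signRequest("POST", "/api/sensitive-action", now, privateKey);
console.log(verifyRequest("POST", "/api/sensitive-action", now, sig, publicKey, now));
```

Binding the signature to the method, path, and a timestamp means a captured signature cannot be replayed later or against a different endpoint.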
Layer 4: Protocol Discovery
This is where your site goes from being readable to being actionable.
llms-full.txt vs llms.txt
Sites commonly publish two files:
- llms.txt: a concise reading list for constrained contexts (used by default)
- llms-full.txt: the complete content of all pages, concatenated, for agents that want to load everything at once
Generate llms-full.txt as part of your build:
// scripts/generate-llms-full.ts
import fs from 'fs';
import path from 'path';
// getAllDocs is app-specific: it should return { title, url, markdown } for every page
import { getAllDocs } from './content';
async function generateLlmsFullTxt() {
  const docs = await getAllDocs();
  const content = docs
    .map((doc) => `# ${doc.title}\n\nURL: ${doc.url}\n\n${doc.markdown}`)
    .join('\n\n---\n\n');
  fs.writeFileSync(path.join(process.cwd(), 'public', 'llms-full.txt'), content);
  console.log(`Generated llms-full.txt with ${docs.length} pages`);
}
// Fail the build if generation throws instead of leaving an unhandled rejection
generateLlmsFullTxt().catch((err) => {
  console.error(err);
  process.exit(1);
});
MCP Server Cards
A Model Context Protocol Server Card is a /.well-known/mcp.json file that declares what MCP tools and resources your server exposes. This is the most powerful signal you can add — it tells any MCP-compatible agent exactly what capabilities are available.
// /.well-known/mcp.json
{
  "name": "Your Product MCP Server",
  "version": "1.0.0",
  "description": "Provides tools to interact with Your Product's API",
  "server_url": "https://api.yourdomain.com/mcp",
  "auth": {
    "type": "oauth2",
    "authorization_url": "https://yourdomain.com/oauth/authorize",
    "token_url": "https://yourdomain.com/oauth/token",
    "scopes": ["read", "write"]
  },
  "tools": [
    {
      "name": "search_docs",
      "description": "Search the product documentation",
      "input_schema": {
        "type": "object",
        "properties": {
          "query": { "type": "string", "description": "Search query" },
          "limit": { "type": "integer", "default": 10 }
        },
        "required": ["query"]
      }
    },
    {
      "name": "get_account_info",
      "description": "Get the current user's account information",
      "auth_required": true
    }
  ]
}
An MCP-compatible agent that visits your site can discover this file, understand what tools are available, and use them — without you having to manually register with any agent platform.
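As a rough picture of the discovery step, the sketch below validates a card and lists the tools an unauthenticated agent could call. The field names (`server_url`, `tools`, `auth_required`) follow the example in this article and have not been checked against the MCP specification; `parseServerCard` and `publicToolNames` are hypothetical helpers.

```typescript
interface McpTool {
  name: string;
  description?: string;
  auth_required?: boolean;
}

interface McpServerCard {
  name: string;
  server_url: string;
  tools: McpTool[];
}

// Parse a card and check the few fields this sketch relies on.
function parseServerCard(json: string): McpServerCard {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("server card must be a JSON object");
  }
  const card = data as Partial<McpServerCard>;
  if (typeof card.name !== "string" || typeof card.server_url !== "string") {
    throw new Error("server card needs string fields: name, server_url");
  }
  if (!Array.isArray(card.tools)) {
    throw new Error("server card needs a tools array");
  }
  for (const tool of card.tools) {
    if (typeof tool !== "object" || tool === null || typeof tool.name !== "string") {
      throw new Error("every tool needs a string name");
    }
  }
  return card as McpServerCard;
}

// Tools that do not declare auth_required can be tried without credentials.
function publicToolNames(card: McpServerCard): string[] {
  return card.tools.filter((t) => !t.auth_required).map((t) => t.name);
}

const card = parseServerCard(
  JSON.stringify({
    name: "Your Product MCP Server",
    server_url: "https://api.yourdomain.com/mcp",
    tools: [
      { name: "search_docs", description: "Search the product documentation" },
      { name: "get_account_info", auth_required: true },
    ],
  })
);
console.log(publicToolNames(card));
```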
Agent Skills Index
For sites that expose multiple capabilities, you can provide an Agent Skills index that catalogs what your site can do in natural language:
// /.well-known/agent-skills.json
{
  "skills": [
    {
      "id": "search",
      "name": "Search Documentation",
      "description": "Search across all product docs and guides",
      "endpoint": "/api/search"
    },
    {
      "id": "purchase",
      "name": "Purchase Products",
      "description": "Browse catalog and complete purchases",
      "endpoint": "/api/commerce",
      "auth_required": true
    }
  ]
}
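An agent consuming this index needs absolute URLs and has to skip skills it cannot authenticate for. A small sketch, assuming the JSON shape above; `callableSkills` is a hypothetical helper, and relative endpoints are resolved against the site origin with the standard `URL` constructor.

```typescript
interface AgentSkill {
  id: string;
  name: string;
  description: string;
  endpoint: string;
  auth_required?: boolean;
}

interface AgentSkillsIndex {
  skills: AgentSkill[];
}

// Return the skills the agent can call right now, with absolute endpoint URLs.
function callableSkills(
  index: AgentSkillsIndex,
  origin: string,
  hasCredentials: boolean
): { id: string; url: string }[] {
  return index.skills
    .filter((s) => hasCredentials || !s.auth_required)
    .map((s) => ({ id: s.id, url: new URL(s.endpoint, origin).href }));
}

const skillsIndex: AgentSkillsIndex = {
  skills: [
    { id: "search", name: "Search Documentation", description: "Search across all product docs and guides", endpoint: "/api/search" },
    { id: "purchase", name: "Purchase Products", description: "Browse catalog and complete purchases", endpoint: "/api/commerce", auth_required: true },
  ],
};
console.log(callableSkills(skillsIndex, "https://yourdomain.com", false));
```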
Putting It All Together: A Site Structure
Here is what a fully agent-ready site looks like:
yourdomain.com/
├── robots.txt ← AI crawler rules
├── sitemap.xml ← standard sitemap
├── llms.txt ← agent reading list
├── llms-full.txt ← full concatenated content
├── .well-known/
│ ├── mcp.json ← MCP server card
│ └── agent-skills.json ← capability catalog
├── docs/
│ ├── llms.txt ← docs-specific reading list
│ ├── getting-started.md ← machine-readable doc
│ └── api.md ← machine-readable API docs
└── guides/
├── llms.txt ← guides-specific reading list
└── auth.md ← machine-readable guide
And the HTTP headers every page should return:
Link: </llms.txt>; rel="llms"
Link: </index.md>; rel="alternate"; type="text/markdown"
X-Robots-Tag: ai-training:noindex
Validation: Use isitagentready.com
Cloudflare built a free scanner at isitagentready.com that audits your site across all four layers and gives you a score with specific recommendations. Run it after implementing each layer to verify your changes are detectable.
The scanner checks:
- robots.txt with AI-specific directives
- llms.txt presence and validity
- Markdown content negotiation
- Content Signals meta tags
- /.well-known/mcp.json or agent-skills.json
It also exposes itself as an MCP server — so agent-ready tools can scan other sites programmatically.
Implementation Checklist
Work through these in order. Each item is independently valuable.
Day 1 (30 minutes)
- Update robots.txt with explicit rules for AI training crawlers vs. task agents
- Create /llms.txt with an H1, a blockquote summary, and links to your 5–10 most important pages
- Run isitagentready.com to get a baseline score
Week 1
- Add markdown versions of your top 10 docs pages (.md URL pattern)
- Add HTTP Link headers pointing agents to your llms.txt
- Add Content Signals meta tags for AI usage preferences
- Create /llms-full.txt in your build pipeline
Month 1
- Build out hierarchical llms.txt files per docs section
- Serve markdown for all docs/guide pages (not just top 10)
- Create /.well-known/mcp.json if you have an API
- Add Web Bot Auth for protected endpoints
What to Expect
These standards are early. llms.txt was proposed in 2024 and is gaining adoption slowly. MCP Server Cards are even newer. You will not see dramatic traffic changes immediately.
What you will see:
- AI coding assistants like Claude Code and Cursor will consume your docs more accurately (direct token cost savings for your users)
- Products that index the web for agents will prioritize agent-ready sites
- As agentic commerce grows, sites with capability declarations will be discoverable by purchasing agents
The Cloudflare data is the clearest signal: the gap between agent-ready and agent-unready sites is already measurable in token counts and latency. A site that is 31% cheaper for agents to read is a site that gets used more.
The investment is small. An llms.txt and a few markdown pages take an afternoon. An MCP Server Card takes a day. Being one of the fewer than 15 sites with an MCP Server Card today is early positioning for a web where every user has an agent working on their behalf.
Sources
- Cloudflare Blog, Agent Readiness Score: Making Websites AI-Compatible, April 2026. Source for adoption data (78% robots.txt, 4% Content Signals, 3.9% markdown negotiation, <15 MCP Server Card implementations) and Cloudflare docs performance metrics (31% fewer tokens, 66% faster).
- llms.txt specification. Official specification for the /llms.txt file format, including file structure, link format, and Optional section semantics.
- Cloudflare Developers, Cloudflare Agents. Technical documentation for building agents on Cloudflare Workers and Durable Objects; relevant background on how production agents consume external content.
- isitagentready.com. Free agent readiness scanner; reference for the five compliance categories audited (Discoverability, Content Accessibility, Bot Access Control, Protocol Discovery, Commerce).
- Model Context Protocol, MCP Specification. Protocol specification for MCP tools, resources, and server cards used in the Protocol Discovery layer.