MCP Needs a robots.txt Moment

The MCP ecosystem is growing fast. Servers for GitHub, Slack, databases, browsers, internal APIs. But there's a problem nobody has really solved: how does an AI agent know a website offers MCP capabilities?

Right now, the answer is: it doesn't. You tell it. You hardcode the server URL, install a preset, or find it in a directory. That works fine for developer tooling you configure once. It doesn't work for the open web.

The Portals Problem

Several MCP directories have appeared. They solve a real problem — there are hundreds of MCP servers now and you need some way to find them. But they have a structural issue: centralization.

A website that wants to be discoverable has to register with the portal. An AI agent that wants to find MCP servers has to know which portal to query. Every discovery path runs through an intermediary.

That's not how the web handles this kind of problem.

robots.txt wasn't a portal feature. It was a convention: put a file at a known path, and any crawler can check it. No registration. No central index. Just a predictable location that any client can look up independently. RSS worked the same way. Sitemaps too. These patterns stuck because they're decentralized — client and server coordinate directly without anyone in the middle.

MCP needs the same thing.

The Gap

The discoverability problem has a few dimensions that need to be addressed together:

Discovery — How does an AI agent know a site offers MCP capabilities at all? Without a standard, every client needs bespoke knowledge of every server.

Scope — Which tools are available on which pages? A documentation site might want different tools on the API reference than on the homepage.

Consent — How does a user approve what an AI agent can do on a site? Different tools carry different risk. Calling a search endpoint is different from placing an order.

Security — Discovered servers are a prompt injection surface. There needs to be a shared model for what's safe to call automatically versus what requires explicit confirmation.

Four Mechanisms

I've been working on a spec that defines four ways for a website to advertise its MCP server. They're ordered by complexity — you can start with the simplest and add more as needed.

/mcp.txt — A plain text file at a predictable path listing MCP endpoint URLs. Same concept as robots.txt. Lowest barrier to adoption. Any site can add this in five minutes.
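As a sketch, an mcp.txt could be as simple as a list of endpoint URLs with comments (this example is illustrative; the exact directives come from the spec, not this post):

```
# mcp.txt — advertise MCP endpoints for this site
# (hypothetical example; consult the spec for the exact format)
https://example.com/api/mcp
```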

/.well-known/mcp.json — A JSON manifest with the full picture: endpoints, supported transports, authentication requirements, permission tiers, and page-specific rules. More expressive, still just a static file.
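A manifest along these lines would express the same endpoint plus the extra metadata. The field names here are assumptions for illustration, not quoted from the spec:

```json
{
  "version": "0.2.0",
  "servers": [
    {
      "endpoint": "https://example.com/api/mcp",
      "transport": "streamable-http",
      "auth": "none",
      "tools": [
        { "name": "search", "tier": 0 }
      ]
    }
  ]
}
```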

HTML <link> tags — Page-specific discovery. A <link rel="mcp" href="..."> in the page head lets any page advertise a different MCP endpoint than the site default. Useful for multi-tenant platforms or doc sites where available tools change by section.
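In a page head, that looks like a single tag (the `rel="mcp"` value is the mechanism described above; the URL is a placeholder):

```html
<!-- Page-specific MCP discovery: this page advertises its own endpoint -->
<link rel="mcp" href="https://docs.example.com/api/mcp">
```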

HTTP MCP-Endpoint headers — For APIs. The discovery signal travels in the response itself, so an AI agent making an API call can discover the MCP server inline without a separate lookup.
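A response carrying the discovery signal inline might look like this (the header name is from the mechanism above; everything else is a placeholder):

```
HTTP/1.1 200 OK
Content-Type: application/json
MCP-Endpoint: https://api.example.com/mcp
```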

These mechanisms have a precedence order. A page-specific <link> tag overrides the site-wide manifest. The manifest overrides mcp.txt. A client checks all four and uses the most specific applicable result.
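The precedence rule can be sketched as a small resolver. This is a minimal illustration, not spec code: the function name and dict-of-sources shape are my own, and where the `MCP-Endpoint` header falls in the ordering is an assumption (the post only states that `<link>` beats the manifest, which beats mcp.txt).

```python
def resolve_mcp_endpoint(link_tag=None, header=None, manifest=None, mcp_txt=None):
    """Return the MCP endpoint from the most specific discovery source.

    Assumed precedence, most specific first: page <link> tag,
    MCP-Endpoint response header, /.well-known/mcp.json, /mcp.txt.
    """
    for source in (link_tag, header, manifest, mcp_txt):
        if source:
            return source
    return None

# A page-level <link> overrides the site-wide manifest:
endpoint = resolve_mcp_endpoint(
    link_tag="https://docs.example.com/api/mcp",
    manifest="https://example.com/api/mcp",
)
print(endpoint)  # https://docs.example.com/api/mcp
```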

Permission Tiers

Discovery without a consent model is incomplete. Once a client finds an MCP server, there needs to be a shared vocabulary for what level of access each tool requires.

The spec defines four tiers:

Tier 0 — ReadPublic: read-only, no auth, no consent needed

Tier 1 — ReadPrivate: read-only, auth required, one-time consent

Tier 2 — Write: mutates data, per-action or one-time consent

Tier 3 — Sensitive: financial or destructive, mandatory per-action confirmation

A search tool is Tier 0. Reading a user profile is Tier 1. Posting a comment is Tier 2. Placing an order is Tier 3.

Site owners declare the tier for each tool in the manifest. Clients enforce it. Users see consistent consent behavior regardless of which AI client they're using.
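Client-side enforcement of the tiers can be sketched in a few lines. This is an illustrative model, not spec code: the names are mine, and treating a prior one-time consent as sufficient for Tier 2 is one of the two options the tier definition allows.

```python
from enum import IntEnum

class Tier(IntEnum):
    READ_PUBLIC = 0   # read-only, no auth, no consent needed
    READ_PRIVATE = 1  # read-only, auth required, one-time consent
    WRITE = 2         # mutates data, per-action or one-time consent
    SENSITIVE = 3     # financial or destructive, always confirm per action

def needs_confirmation(tier: Tier, already_consented: bool) -> bool:
    """Decide whether the client must ask the user before calling a tool."""
    if tier == Tier.READ_PUBLIC:
        return False          # safe to call automatically
    if tier == Tier.SENSITIVE:
        return True           # prior consent never waives confirmation
    return not already_consented

print(needs_confirmation(Tier.WRITE, already_consented=True))      # False
print(needs_confirmation(Tier.SENSITIVE, already_consented=True))  # True
```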

This Site Implements It

domlee.dev already follows the spec. The MCP server is at /api/mcp, discoverable via /.well-known/mcp.json and /mcp.txt. Any MCP-aware client visiting this site can find and use the three tools — list_posts, get_post, search_by_tag — without any configuration. The tools are Tier 0: read-only, public, no consent required.

That's the whole point. Not a new concept — just the robots.txt pattern applied to MCP.

Status

The spec is a community draft, version 0.2.0. I've built working implementations across a browser extension, a Next.js middleware package, a validator CLI, and an MCP tools package for managing discovery config. I plan to propose it as a Specification Enhancement Proposal for the MCP spec.

If you run MCP servers, consider adding an mcp.txt. It's five minutes of work, and it means any AI client visiting your site can discover your server without you having to be listed in any directory.


Next: Building an AI Browser — what I built to test this in practice, and what I found.