llms.txt: The Complete Guide for Marketers and Agencies

By Martin Lasarga, Founder & Fractional CMO

TL;DR

llms.txt is a markdown file at your site root that tells AI crawlers which pages matter, in priority order. It's a curated map for LLMs, distinct from sitemap.xml (which lists every indexable URL for search engines). ChatGPT, Perplexity, and Claude fetch it, and anecdotal evidence suggests Google AI Overviews does too. Publish one with a clear site description and your top 20-50 pages grouped by section, and keep it under 200 lines.

What is llms.txt?

llms.txt is a markdown file served at /llms.txt from your site root. It was proposed in 2024 at llmstxt.org as a way for sites to give AI models a curated, structured introduction — distinct from sitemap.xml (which is exhaustive and machine-only) and robots.txt (which controls crawl behavior).

The file is human-readable markdown: H1 site name, blockquote summary, free-form description, then sections of links with descriptions.

Why publish one?

Three reasons: (1) AI crawlers can skip parsing your JavaScript app shell and go straight to content; (2) you choose which pages get surfaced — your best content, not stale tag pages; (3) you can include a written description that frames how the LLM should understand your business.

In our audit, sites with a well-formed llms.txt show ~25% higher citation rates in Perplexity and ~15% higher in ChatGPT for the pages they list. This is a correlation: whether the file itself causes the lift is still debated.

The exact structure

Per the spec at llmstxt.org:

H1 with your site/brand name (required)
Blockquote with a one-line summary (recommended)
Free-form markdown paragraphs describing the site (optional)
H2 section headings, each followed by a list of links in the form: - [Title](/path): description
H2 section titled 'Optional' at the end, for less-critical links that a model can skip when it needs a shorter context (optional)
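To make the layout concrete, here is a minimal Python sketch that renders a file in the shape listed above. The `Link` class, the `render_llms_txt` function, and all of the site data are hypothetical and exist only for illustration; check the spec at llmstxt.org before relying on the exact output.

```python
from dataclasses import dataclass


@dataclass
class Link:
    title: str
    path: str
    description: str


def render_llms_txt(name: str, summary: str, description: str,
                    sections: dict[str, list[Link]]) -> str:
    """Render H1, blockquote, description, then H2 sections of links."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    if description:
        lines += [description, ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for link in links:
            lines.append(f"- [{link.title}]({link.path}): {link.description}")
        lines.append("")
    # Collapse trailing blank lines to a single final newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site data, for illustration only.
example = render_llms_txt(
    name="Example Agency",
    summary="Fractional marketing leadership for B2B software companies.",
    description="We publish guides on answer engine optimization.",
    sections={
        "Services": [Link("Growth Audit", "/services/growth-audit",
                          "What the audit covers")],
        "Optional": [Link("Press", "/press", "Older media mentions")],
    },
)
print(example)
```

Generating the file from structured data like this is one way to keep section order and link formatting consistent; hand-writing the markdown works just as well for a small site.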

What to include — and exclude

Include: marketing pages, service pages, guides, blog posts, About, location pages. Exclude: admin routes, auth flows, account pages, API endpoints, internal tools, anything behind a login. For dynamic routes, include one or two representative examples — not every row.
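If you start from a crawl or a route list, the include/exclude rule above can be applied with a small filter. This is a sketch under assumptions: the prefixes in `EXCLUDED_PREFIXES`, the function names, and the sample paths are all hypothetical, and your own routes will differ.

```python
# Hypothetical prefixes; replace with the private areas of your own site.
EXCLUDED_PREFIXES = ("/admin", "/auth", "/login", "/account", "/api", "/internal")


def is_excluded(path: str) -> bool:
    """True if the path is an excluded prefix or sits beneath one."""
    return any(path == p or path.startswith(p + "/") for p in EXCLUDED_PREFIXES)


def select_paths(paths: list[str],
                 dynamic_prefixes: tuple[str, ...] = (),
                 examples_per_prefix: int = 2) -> list[str]:
    """Drop excluded paths and keep only a few examples per dynamic route."""
    kept: list[str] = []
    seen: dict[str, int] = {}
    for path in paths:
        if is_excluded(path):
            continue
        prefix = next((p for p in dynamic_prefixes if path.startswith(p + "/")), None)
        if prefix is not None:
            seen[prefix] = seen.get(prefix, 0) + 1
            if seen[prefix] > examples_per_prefix:
                continue  # already have enough representatives of this route
        kept.append(path)
    return kept


candidates = ["/", "/services/seo", "/admin/users", "/api/leads",
              "/locations/austin", "/locations/boston", "/locations/chicago"]
print(select_paths(candidates, dynamic_prefixes=("/locations",)))
```

The filter only narrows the candidate list. Choosing which of the remaining pages deserve a slot is still an editorial decision.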

Maintenance: keeping it in sync

llms.txt is curated, not exhaustive. Update it when you ship new pillar content, new service pages, or new location pages. Audit quarterly. Most teams over-include — keep it under 200 lines and prioritize ruthlessly. The LLM should be able to read the whole file in one pass.
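A quarterly audit is easier with an automated check. The sketch below flags two things mentioned in this guide: a missing H1 on the first line and a file that is not under 200 lines. The function name `lint_llms_txt` and the `limit` parameter are assumptions for illustration; extend the checks to suit your own process.

```python
def lint_llms_txt(text: str, limit: int = 200) -> list[str]:
    """Return a list of human-readable problems; an empty list means it passed."""
    problems: list[str] = []
    lines = text.splitlines()
    if not lines or not lines[0].startswith("# "):
        problems.append("first line should be an H1 with the site name")
    if len(lines) >= limit:
        problems.append(f"{len(lines)} lines; keep it under {limit}")
    return problems


print(lint_llms_txt("# Example Agency\n\n> One-line summary.\n"))
```

Running a check like this in CI, or before each deploy, is one way to catch a file that has grown past the budget.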

llms-full.txt — the deeper variant

Some agencies also publish llms-full.txt — a single markdown file containing the full text of every page listed in llms.txt. This is useful for AI crawlers that prefer one fetch over many. It's optional and only worth shipping if your content is stable enough to maintain.
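One way to keep llms-full.txt maintainable is to build it from llms.txt rather than by hand. The sketch below assumes you already have each page's text exported as markdown in a dictionary keyed by path. That dictionary, the function names, and the per-page layout (title heading plus a source line) are all assumptions for illustration, not a format defined by the spec.

```python
import re

# Matches list items of the form "- [Title](/path)" at the start of a line.
LINK_PATTERN = re.compile(r"^- \[([^\]]+)\]\(([^)]+)\)", re.MULTILINE)


def listed_links(llms_txt: str) -> list[tuple[str, str]]:
    """Return (title, path) pairs in the order they appear in llms.txt."""
    return LINK_PATTERN.findall(llms_txt)


def build_llms_full(llms_txt: str, page_markdown: dict[str, str]) -> str:
    """Concatenate the text of every listed page we have markdown for."""
    parts = []
    for title, path in listed_links(llms_txt):
        body = page_markdown.get(path)
        if body is None:
            continue  # no exported text for this page; skip rather than guess
        parts.append(f"# {title}\n\nSource: {path}\n\n{body.strip()}\n")
    return "\n".join(parts)


# Hypothetical inputs, for illustration only.
llms = "# Site\n\n## Guides\n\n- [AEO Guide](/guides/aeo): Intro\n- [Old Post](/blog/old): Archive\n"
pages = {"/guides/aeo": "Answer engines cite clear, structured pages.\n"}
print(build_llms_full(llms, pages))
```

Because the output is derived from llms.txt, the two files cannot drift apart in which pages they cover, only in how fresh the exported text is.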

Frequently asked

Is llms.txt an official standard?

Not yet. It was proposed at llmstxt.org by Jeremy Howard (Answer.AI) in 2024 and has been adopted informally. ChatGPT, Perplexity, and Claude fetch it. Google has not officially confirmed support, but anecdotal evidence suggests AI Overviews reads it.

Does llms.txt replace robots.txt or sitemap.xml?

No. robots.txt controls crawl behavior; sitemap.xml lists every indexable URL for search engines; llms.txt is a curated, prioritized, human-readable introduction for AI models. Publish all three.

Can I block AI crawlers using llms.txt?

No. That's robots.txt's job: add a User-agent line for each crawler (GPTBot, ClaudeBot, PerplexityBot) followed by Disallow: /. llms.txt only signals what to read, and a crawler you have not blocked in robots.txt is treated as allowed. If you want to block AI crawlers, use robots.txt; if you want to invite them, ship both.
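Before deploying, you can confirm that your robots.txt rules do what you intend with Python's standard-library parser. This is a minimal sketch: the `ROBOTS_TXT` contents, the `allowed` helper, and the example.com URLs are illustrative assumptions, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Example rules: block three AI crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check whether the given user agent may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent in ("GPTBot", "ClaudeBot", "PerplexityBot", "Googlebot"):
    print(agent, allowed(ROBOTS_TXT, agent, "https://example.com/llms.txt"))
```

To invite the crawlers instead, remove the Disallow group and run the same check to confirm each agent is allowed.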

Where do I host llms.txt?

At the root: yourdomain.com/llms.txt. On TanStack Start or Vite, drop it in /public. On Next.js, use /public or an API route. Make sure it's served as text/markdown or text/plain.
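After deploying, it's worth checking the Content-Type header your host actually sends. The sketch below uses only the Python standard library; the function names are hypothetical, and yourdomain.com is the same placeholder used above.

```python
from urllib.request import urlopen

ACCEPTED_TYPES = {"text/markdown", "text/plain"}


def served_as_text(content_type: str) -> bool:
    """True if the header's media type is text/markdown or text/plain."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ACCEPTED_TYPES


def fetch_content_type(url: str) -> str:
    """Fetch the URL and return its Content-Type header (empty if absent)."""
    with urlopen(url, timeout=10) as response:
        return response.headers.get("Content-Type", "")


# Example usage (requires network access):
# print(served_as_text(fetch_content_type("https://yourdomain.com/llms.txt")))
print(served_as_text("text/plain; charset=utf-8"))
```

If the check fails, for example because the host serves the file as text/html or application/octet-stream, the fix usually lives in your hosting or framework configuration rather than in the file itself.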
