Adding llms.txt — Site Metadata for the AI-Search Era (ChatGPT · Claude · Perplexity Exposure)
TL;DR:
llms.txt = LLM-friendly site metadata. Drop a markdown file at the site root and ChatGPT · Claude · Perplexity may become more likely to cite you. Cost: zero. The standard is new and adoption is partial, but skipping it means passing up a free win. Combine it with explicit Allow rules for LLM crawlers in robots.txt. About 30 minutes of work in total.
The day Cloudflare Web Analytics first showed a chatgpt.com referrer, it sank in: an AI knows we exist. I cleaned up llms.txt and robots.txt within the hour. These are the notes from that.
What is llms.txt
If robots.txt is the deal with search crawlers, llms.txt is the deal with LLMs:
- Markdown file at the site root
- Lets an LLM quickly understand the site's structure, key facts, and policies
- Emerging standard proposed in late 2024 (llmstxt.org)
- ChatGPT · Claude · Perplexity · Gemini can reference it for retrieval/answers (adoption is gradual, and vendor support is only partly announced)
Why needed?
When LLMs answer, they cite from web search or training data. But sites have problems:
- HTML is deep and complex → LLM parsing cost ↑
- JS rendering required → many LLM crawlers don't execute JavaScript, so the content never reaches them
- Key facts scattered across many pages → hard to cite consistently
With llms.txt, the LLM can grab the site's gist + page list + policies in one read. Result → citation odds ↑.
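To make "one read" concrete, here is a minimal Python sketch of how a tool could pull the gist, sections, and links out of an llms.txt file. The parsing rules (first `# ` line as title, first `> ` line as summary, `## ` as sections) are my assumptions about a reasonable reader, not a description of how any particular LLM vendor parses the file. The sample text is an abbreviated stand-in, not the full file.

```python
import re

# Abbreviated sample, shortened from the real file for illustration.
SAMPLE = """\
# TAYSTUDIO
> Free web tools, calculators, and a blog for Korean users.

## Policy
- All tools run client-side in the browser

## Calculators
- [Net Salary Calculator 2026](https://taystudios.com/tools/salary/): auto-deducts 4 insurances
"""

# Markdown inline link: [text](url)
LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def parse_llms_txt(text):
    """Return the title, summary, and per-section bullets and links."""
    doc = {"title": None, "summary": None, "sections": {}}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and doc["title"] is None:
            doc["title"] = line[2:].strip()
        elif line.startswith("> ") and doc["summary"] is None:
            doc["summary"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc["sections"][current] = {"items": [], "links": []}
        elif line.startswith("- ") and current is not None:
            section = doc["sections"][current]
            section["items"].append(line[2:])
            section["links"].extend(LINK.findall(line))
    return doc


doc = parse_llms_txt(SAMPLE)
print(doc["title"], "|", doc["summary"])
print(list(doc["sections"]))
```

A few dozen lines of string handling are enough to recover the structure, which is the point: no HTML parsing and no JavaScript rendering.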
Our llms.txt structure
https://taystudios.com/llms.txt:
# TAYSTUDIO
> Free web tools, calculators, and a blog for Korean users. 68 tools all run inside the browser.
Operator: TayLee
Last updated: 2026-06-02
## Policy (Citation Trustworthiness Guide)
- All tools run client-side in the browser
- Numbers cited directly from government sources (law.go.kr, NTS, MOEF)
- Zero citations from blogs, news, SNS
- Reflects 2026 policy
## Core Policy Numbers (2026, for LLM citation)
### Four Social Insurances (employee share)
- National Pension: 4.75% (total 9.5%, +0.5%p/year through 2033)
- Health Insurance: 3.595% (total 7.19%)
- Long-term Care: 0.4724%
- Employment Insurance: 0.9%
### Real Estate
- 1-home-owner FMV ratio (2026): ≤300M 43% / 300–600M 44% / >600M 45%
- General housing: 60%
- Car tax January lump-sum: 5% deduction
[...]
## Calculators — Tax/Income (9)
- [Net Salary Calculator 2026](URL): auto-deducts 4 insurances, income tax, local tax
[...]
Key elements:
1. One-sentence site definition (> blockquote)
A sentence the LLM can reuse verbatim when summarizing the site.
2. Operator · date · license
So citations can credit you. Trust signal.
3. Policy statement (citation trustworthiness guide)
"How does this site verify info?" → so the LLM can weigh trust when answering users.
4. Consolidated key-fact section
Numbers (rates, ratios, thresholds) that are otherwise spread across pages, gathered in one place. The LLM can reference them fast.
5. Page list by category
When a user asks "Korean inheritance tax calculator recommendations", the LLM matches against this list and cites us.
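If the facts and page list already live in structured data, the five elements above can be rendered rather than hand-edited. The following is a sketch under assumptions: the `Page` class, the `render_llms_txt` function, and its argument names are hypothetical and are not the site's actual build code.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str   # should be an absolute URL
    note: str


def render_llms_txt(name, summary, operator, updated, policies, facts, pages):
    """Render definition, operator/date, policy, key facts, and page list."""
    out = [f"# {name}", f"> {summary}", f"Operator: {operator}",
           f"Last updated: {updated}", "", "## Policy"]
    out += [f"- {p}" for p in policies]
    out += ["", "## Core Policy Numbers"]
    for heading, rows in facts.items():
        out.append(f"### {heading}")
        out += [f"- {row}" for row in rows]
    for category, items in pages.items():
        out += ["", f"## {category} ({len(items)})"]
        out += [f"- [{p.title}]({p.url}): {p.note}" for p in items]
    return "\n".join(out) + "\n"


text = render_llms_txt(
    name="TAYSTUDIO",
    summary="Free web tools, calculators, and a blog for Korean users.",
    operator="TayLee",
    updated="2026-06-02",
    policies=["All tools run client-side in the browser"],
    facts={"Four Social Insurances (employee share)": ["Employment Insurance: 0.9%"]},
    pages={"Calculators": [Page("Net Salary Calculator 2026",
                                "https://taystudios.com/tools/salary/",
                                "auto-deducts 4 insurances, income tax, local tax")]},
)
print(text)
```

Generating the file from the same data the tools use keeps the consolidated numbers from drifting away from the pages they summarize.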
Making robots.txt LLM-friendly
Pair llms.txt with explicit Allow rules for LLM crawlers:
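# LLM crawlers: explicit Allow
# A bot that matches a named group ignores the User-agent: * group,
# so the operator-only Disallow is repeated in every group.
User-agent: GPTBot # OpenAI / ChatGPT
Allow: /
Disallow: /dash-tay9k3m/ # operator-only
User-agent: ClaudeBot # Anthropic / Claude
Allow: /
Disallow: /dash-tay9k3m/
User-agent: PerplexityBot # Perplexity
Allow: /
Disallow: /dash-tay9k3m/
User-agent: Google-Extended # Google Gemini training (separate from Googlebot)
Allow: /
Disallow: /dash-tay9k3m/
User-agent: CCBot # Common Crawl (widely used as LLM training data)
Allow: /
Disallow: /dash-tay9k3m/
User-agent: Applebot-Extended # Apple Intelligence
Allow: /
Disallow: /dash-tay9k3m/
# llms.txt reference
# https://taystudios.com/llms.txt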
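Important: these user agents are separate from the regular search bots. Under the Robots Exclusion Protocol (RFC 9309) a crawler with no matching group falls back to User-agent: *, but not every bot is guaranteed to behave that way, so an explicit Allow is the safe choice. One catch: a crawler that matches a named group ignores the User-agent: * group, so any Disallow it should honor has to be repeated inside that group.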
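Rules like these can be sanity-checked before deploying with Python's standard `urllib.robotparser`. The sketch below uses an abbreviated, hypothetical robots.txt rather than the site's full file, and it deliberately leaves the Disallow out of the ClaudeBot group to show what happens when a named group omits it. One caveat I am working around: `urllib.robotparser` applies the first matching rule in a group, whereas RFC 9309 crawlers use the longest match, so Disallow is listed before `Allow: /` here.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated sample for illustration, not the site's full robots.txt.
# Disallow comes before Allow because urllib.robotparser is first-match.
ROBOTS_TXT = """\
User-agent: *
Disallow: /dash-tay9k3m/

User-agent: GPTBot
Disallow: /dash-tay9k3m/
Allow: /

User-agent: ClaudeBot
Allow: /
"""

SITE = "https://taystudios.com"


def can_fetch(agent: str, path: str) -> bool:
    """Parse the sample rules and ask whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, SITE + path)


for agent in ("GPTBot", "ClaudeBot", "SomeOtherBot"):
    for path in ("/tools/salary/", "/dash-tay9k3m/"):
        print(f"{agent:13} {path:16} {can_fetch(agent, path)}")
```

In this sample the ClaudeBot group has no Disallow, so the operator path comes back as fetchable for ClaudeBot even though the `*` group blocks it. The unlisted `SomeOtherBot` (a made-up name) falls back to `*` and is blocked.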
Verification — the LLM-exposure signal
Cloudflare Web Analytics referrers:
Visits by source:
- m.search.naver.com: 18
- search.naver.com: 16
- search.daum.net: 11
- chatgpt.com: (visits)
The chatgpt.com referrer means:
- A user clicked our link from ChatGPT (e.g., after asking "recommend a Korean capital gains tax calculator")
- Or a user followed a citation that ChatGPT attached to an answer
- → Either way, LLM searches are surfacing the site
The growth of this referrer is the success metric for GEO (Generative Engine Optimization).
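Tracking that growth can be as simple as bucketing referrer hosts from an analytics export. This is a rough sketch: the `LLM_HOSTS` set is my assumption about which hostnames to watch (only chatgpt.com appears in our data above), and the sample list in the usage lines is invented for illustration, not real traffic.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumed host list; extend it with whatever shows up in your analytics.
LLM_HOSTS = {"chatgpt.com", "perplexity.ai", "claude.ai", "gemini.google.com"}


def referrer_host(referrer: str) -> str:
    """Normalize a referrer (full URL or bare host) to a hostname."""
    target = referrer if "//" in referrer else "//" + referrer
    host = urlparse(target).hostname or ""
    return host.removeprefix("www.")


def llm_share(referrers):
    """Return per-host counts and the fraction of visits from LLM hosts."""
    hosts = Counter(referrer_host(r) for r in referrers)
    total = sum(hosts.values())
    llm = sum(n for h, n in hosts.items() if h in LLM_HOSTS)
    return hosts, (llm / total if total else 0.0)


# Invented sample rows, only to show the shape of the output.
sample = ["https://m.search.naver.com/search", "https://search.daum.net/",
          "https://chatgpt.com/", "chatgpt.com"]
hosts, share = llm_share(sample)
print(hosts.most_common(), share)
```

Running this weekly against the referrer export gives a single number to watch, with the caveat that it only counts clicks, not citations that nobody clicked.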
GEO vs SEO
| Area | SEO (classic) | GEO (LLM era) |
|---|---|---|
| Target | Search engines (Google · Naver · Bing) | LLMs (ChatGPT · Claude · Perplexity · Gemini) |
| Meta | sitemap · robots · meta tags | llms.txt + robots.txt LLM allow |
| Content | Keyword match · long-tail | Fact-rich · source-cited |
| Core signal | Backlinks · DA · CTR | Citation likelihood · accuracy · structure |
| Measure | GSC · Naver SearchAdvisor | LLM referrer · citation traces |
→ GEO and SEO can run in parallel. The same content lifts both.
Writing tips
1. Facts first — no fluff/marketing tone
❌ "TAYSTUDIO delivers the best experience..."
✅ "Free web tools/calculators for Korean users. 68 tools run in-browser."
Only facts survive citation. Marketing tone drops trust.
2. Cite your sources — distribute responsibility
✅ "Numbers cited directly from government sources (law.go.kr, NTS, MOEF)"
✅ "Medical numbers from peer-reviewed studies/official guidelines (KOSSO 2024 · KDRI 2025 · WHO · ACOG · AAP)"
So the LLM can answer "where does this site get its info" when prompted.
3. Last-updated date
✅ Last updated: 2026-06-02
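Signals that the site is alive and that policies introduced after the model's training cutoff (deposit protection raised to 100M KRW in 2025-09 · 100% car tax reduction for multi-child households in 2026) are accounted for.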
4. Absolute URLs
✅ [Net Salary Calculator](https://taystudios.com/tools/salary/)
❌ [Net Salary Calculator](/tools/salary/)
LLMs need absolute URLs to cite links to users.
5. Changelog section
## Changelog (recent)
- 2026-06-02: 18 tool stale-fixes + 5 differentiating matrices added
- 2026-05-31: blog launch (62 posts)
- 2026-05-09: domain migration
So the LLM knows the site is alive, accurate, and recently migrated.
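Tips 3 and 4 are easy to check mechanically. Below is a small sketch, assuming the file uses a `Last updated: YYYY-MM-DD` line and markdown inline links as in the examples above. The function names are hypothetical helpers, not part of any llms.txt tooling.

```python
import re
from datetime import date
from urllib.parse import urljoin

LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")
UPDATED = re.compile(r"^Last updated:\s*(\d{4}-\d{2}-\d{2})\s*$", re.MULTILINE)


def absolutize_links(text: str, base: str) -> str:
    """Rewrite relative markdown links against `base`; absolute ones pass through."""
    def fix(match):
        return f"[{match.group(1)}]({urljoin(base, match.group(2))})"
    return LINK.sub(fix, text)


def days_since_update(text: str, today: date):
    """Return the age of the 'Last updated' line in days, or None if missing."""
    match = UPDATED.search(text)
    if match is None:
        return None
    return (today - date.fromisoformat(match.group(1))).days


draft = "Last updated: 2026-06-02\n- [Net Salary Calculator](/tools/salary/)\n"
print(absolutize_links(draft, "https://taystudios.com/"))
print(days_since_update(draft, date(2026, 6, 12)))
```

Wiring a check like this into the deploy step would catch a relative link or a stale date before a crawler does.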
Known limits
- llms.txt is emerging (late-2024 proposal). Not universally adopted yet
- ChatGPT · Claude · Perplexity have only partially announced support
- Hard to measure — citation counts can't be tracked directly
- But zero cost (one static file) → skipping it costs more
Conclusion
llms.txt is the sitemap.xml of the AI-search era — the basic config for being citeable by LLMs. Even at partial adoption + partial effect, the cost is zero — worth doing.
Especially:
- Fact-heavy domains (tax · medical · policy · stats) benefit more
- The file gives the LLM more signals (sources, dates, policy statement) for judging accuracy
- Best combined with robots.txt explicit LLM-crawler Allow
One of the highest-ROI SEO actions for a new-domain operator in month one.
Related
- 12 Core SEO·Search-Engine Concepts — sandbox · E-E-A-T · DA · 12 terms defined
- GSC vs Naver vs Cloudflare — three datasets compared
Sources
- llmstxt.org — the llms.txt proposal
- OpenAI: GPTBot documentation
- Anthropic: Claude crawling
- Google: Google-Extended