Adding llms.txt — Site Metadata for the AI-Search Era (ChatGPT · Claude · Perplexity Exposure)

TL;DR: llms.txt is LLM-friendly site metadata. Drop a markdown file at the site root to give ChatGPT · Claude · Perplexity an easier path to citing you. Hosting it costs nothing. The standard is new and adoption is partial, but skipping it is a free miss. Combine it with explicit Allow rules for LLM crawlers in robots.txt; the whole job takes about 30 minutes.

The day Cloudflare Web Analytics first showed a chatgpt.com referrer, it sank in: an AI knows we exist. I cleaned up llms.txt and robots.txt the same hour. These are my notes from that.

What is llms.txt

If robots.txt is the deal with search crawlers, llms.txt is the deal with LLMs:

  • Markdown file at the site root
  • Lets an LLM quickly understand the site's structure, key facts, and policies
  • Emerging standard proposed in late 2024 (llmstxt.org)
  • ChatGPT · Claude · Perplexity · Gemini may reference it for retrieval/answers (adoption is gradual and only partially announced)

Why is it needed?

When LLMs answer, they cite from web search or training data. But sites have problems:

  • HTML is deep and complex → LLM parsing cost ↑
  • JS rendering required → crawlers that don't execute JavaScript never see the content
  • Key facts scattered across many pages → hard to cite consistently

With llms.txt, the LLM can grab the site's gist + page list + policies in one read. Result → citation odds ↑.
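To make "one read" concrete, here is a minimal sketch of how a consumer could pull the gist out of an llms.txt file. It is not how any particular LLM vendor parses the file; the function name `parse_llms_txt` and the sample text are illustrative assumptions, and only the H1 title, the blockquote summary, and `- [title](url)` links under `##` headings are handled.

```python
import re

# Matches list items such as "- [Title](https://example.com/page/): optional note".
LINK = re.compile(r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<note>.*))?$")

# Hypothetical sample, shortened from the structure described in this post.
SAMPLE = """\
# TAYSTUDIO

> Free web tools, calculators, and a blog for Korean users.

## Calculators
- [Net Salary Calculator](https://taystudios.com/tools/salary/): auto-deducts 4 insurances
"""


def parse_llms_txt(text: str) -> dict:
    """Extract the H1 title, the blockquote summary, and links per '##' section."""
    doc = {"title": None, "summary": None, "sections": {}}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and doc["title"] is None:
            doc["title"] = line[2:].strip()
        elif line.startswith("> ") and doc["summary"] is None:
            doc["summary"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc["sections"][current] = []
        elif current is not None:
            match = LINK.match(line)
            if match:
                doc["sections"][current].append((match["title"], match["url"]))
    return doc


if __name__ == "__main__":
    print(parse_llms_txt(SAMPLE))
```

A single pass over a few dozen lines yields the site name, a reusable summary, and a categorized page list, which is the whole point of keeping the file flat and small.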

Our llms.txt structure

https://taystudios.com/llms.txt:

# TAYSTUDIO

> Free web tools, calculators, and a blog for Korean users. 68 tools all run inside the browser.

Operator: TayLee
Last updated: 2026-06-02

## Policy (Citation Trustworthiness Guide)

- All tools run client-side in the browser
- Numbers cited directly from government sources (law.go.kr, NTS, MOEF)
- Zero citations from blogs, news, SNS
- Reflects 2026 policy

## Core Policy Numbers (2026, for LLM citation)

### Four Social Insurances (employee share)
- National Pension: 4.75% (total 9.5%, +0.5%p/year through 2033)
- Health Insurance: 3.595% (total 7.19%)
- Long-term Care: 0.4724%
- Employment Insurance: 0.9%

### Real Estate
- 1-home-owner FMV ratio (2026): ≤300M 43% / 300–600M 44% / >600M 45%
- General housing: 60%
- Car tax January lump-sum: 5% deduction

[...]

## Calculators — Tax/Income (9)
- [Net Salary Calculator 2026](URL): auto-deducts 4 insurances, income tax, local tax
[...]

Key elements:

1. One-sentence site definition (> blockquote)

A sentence the LLM can reuse verbatim when summarizing the site.

2. Operator · date · license

So citations can credit you. Trust signal.

3. Policy statement (citation trustworthiness guide)

"How does this site verify info?" → so the LLM can weigh trust when answering users.

4. Consolidated key-fact section

Numbers (rates, ratios, thresholds) that are otherwise spread across pages, gathered in one place. The LLM can reference them fast.

5. Page list by category

When a user asks "Korean inheritance tax calculator recommendations", the LLM can match the question against this list and cite us.
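The five elements above can also be rendered from structured data, so the file stays in sync with the site. The following is a minimal sketch, not the generator used for this site: the `Site` and `Page` classes and `render_llms_txt` are hypothetical names, and the example values are copied from the excerpt shown earlier.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    title: str
    url: str          # should be an absolute URL
    note: str = ""


@dataclass
class Site:
    name: str
    summary: str                      # the one-sentence definition
    operator: str
    last_updated: str                 # ISO date, e.g. "2026-06-02"
    policies: list[str] = field(default_factory=list)
    facts: dict[str, list[str]] = field(default_factory=dict)
    pages: dict[str, list[Page]] = field(default_factory=dict)


def render_llms_txt(site: Site) -> str:
    """Render title, summary, operator/date, policy, key facts, and page list."""
    lines = [f"# {site.name}", "", f"> {site.summary}", "",
             f"Operator: {site.operator}", f"Last updated: {site.last_updated}", ""]
    lines += ["## Policy", ""] + [f"- {p}" for p in site.policies] + [""]
    for heading, items in site.facts.items():
        lines += [f"## {heading}", ""] + [f"- {item}" for item in items] + [""]
    for category, pages in site.pages.items():
        lines += [f"## {category} ({len(pages)})", ""]
        for page in pages:
            suffix = f": {page.note}" if page.note else ""
            lines.append(f"- [{page.title}]({page.url}){suffix}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


EXAMPLE = Site(
    name="TAYSTUDIO",
    summary="Free web tools, calculators, and a blog for Korean users.",
    operator="TayLee",
    last_updated="2026-06-02",
    policies=["All tools run client-side in the browser"],
    facts={"Core Policy Numbers (2026)": ["Employment Insurance: 0.9%"]},
    pages={"Calculators": [Page("Net Salary Calculator",
                                "https://taystudios.com/tools/salary/",
                                "auto-deducts 4 insurances, income tax, local tax")]},
)

if __name__ == "__main__":
    print(render_llms_txt(EXAMPLE))
```

Generating the file at build time means the per-category counts and the last-updated date cannot drift from the real page list.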

Making robots.txt LLM-friendly

Pair llms.txt with explicit Allow rules for LLM crawlers:

# LLM crawlers — explicit Allow
User-agent: GPTBot          # OpenAI / ChatGPT
Allow: /
Disallow: /dash-tay9k3m/    # operator-only

User-agent: ClaudeBot       # Anthropic / Claude
Allow: /

User-agent: PerplexityBot   # Perplexity
Allow: /

User-agent: Google-Extended # Google Gemini training (separate from Googlebot)
Allow: /

User-agent: CCBot           # Common Crawl (widely used for LLM training data)
Allow: /

User-agent: Applebot-Extended  # Apple Intelligence
Allow: /

# llms.txt reference
# https://taystudios.com/llms.txt

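Important: these user agents are separate from the regular search bots. Under the Robots Exclusion Protocol (RFC 9309), a crawler with no group of its own falls back to User-agent: *, so any restrictive * rule also applies to LLM crawlers unless you name them. A named group makes the intent explicit, and a crawler that matches a named group follows only that group, so repeat any Disallow you still need there (as with GPTBot above).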
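The group-selection and rule-matching behavior can be checked offline before deploying. Below is a simplified sketch of the RFC 9309 rules (exact user-agent token match, fallback to `*`, longest matching path wins, Allow wins ties). It ignores wildcards and `$` anchors, and the `User-agent: *` group with `/private/` is a hypothetical addition used only to show the fallback; `parse_groups` and `is_allowed` are illustrative names.

```python
ROBOTS = """\
User-agent: *
Disallow: /private/         # hypothetical rule, only to show the fallback

User-agent: GPTBot          # OpenAI / ChatGPT
Allow: /
Disallow: /dash-tay9k3m/    # operator-only

User-agent: ClaudeBot       # Anthropic / Claude
Allow: /
"""


def parse_groups(text: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lowercased user-agent token to its (directive, path) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    agents: list[str] = []
    in_rules = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop trailing comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if in_rules:                         # rules seen: a new group starts
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif key in ("allow", "disallow"):
            in_rules = True
            for agent in agents:
                groups[agent].append((key, value))
    return groups


def is_allowed(groups: dict[str, list[tuple[str, str]]], agent: str, path: str) -> bool:
    """Use the agent's own group if present, else '*'; longest match decides."""
    rules = groups.get(agent.lower(), groups.get("*", []))
    best_len, allowed = -1, True                 # no matching rule means allowed
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            longer = len(prefix) > best_len
            tie_allow = len(prefix) == best_len and directive == "allow"
            if longer or tie_allow:
                best_len, allowed = len(prefix), directive == "allow"
    return allowed


if __name__ == "__main__":
    groups = parse_groups(ROBOTS)
    for agent, path in [("GPTBot", "/tools/salary/"), ("GPTBot", "/dash-tay9k3m/"),
                        ("GPTBot", "/private/x"), ("ExampleBot", "/private/x")]:
        print(agent, path, is_allowed(groups, agent, path))
```

In this sketch GPTBot is blocked from the operator path but not from `/private/`, because its own group replaces the `*` group rather than adding to it, while the unnamed `ExampleBot` inherits the `*` rules.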

Verification — the LLM-exposure signal

Cloudflare Web Analytics referrers:

Visits by source:
- m.search.naver.com: 18
- search.naver.com: 16
- search.daum.net: 11
- chatgpt.com: (visits)

The chatgpt.com referrer means:

  • A user clicked a link to our site inside a ChatGPT answer (e.g., after asking "recommend a Korean capital gains tax calculator")
  • So ChatGPT surfaced or cited us in at least one answer
  • Citations that nobody clicks leave no referrer, so this count is a lower bound on LLM exposure

The growth of this referrer is the success metric for GEO (Generative Engine Optimization).
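One way to track that metric is to bucket referrer hosts exported from analytics. This is a rough sketch under stated assumptions: only chatgpt.com appears in the data above, the other hosts in `LLM_HOSTS` are guesses to adjust for your own traffic, and the chatgpt.com count in the example is a placeholder, not a real figure from this site.

```python
from collections import Counter
from urllib.parse import urlparse

# Hosts treated as LLM products. Only chatgpt.com comes from the analytics
# above; the others are assumptions.
LLM_HOSTS = {"chatgpt.com", "perplexity.ai", "claude.ai", "gemini.google.com"}


def host_of(referrer: str) -> str:
    """Lowercased hostname of a referrer URL or bare host, without 'www.'."""
    parsed = urlparse(referrer if "//" in referrer else f"//{referrer}")
    host = (parsed.hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def llm_share(visits: dict[str, int]) -> tuple[int, float]:
    """Return (LLM-referred visits, their share of all referred visits)."""
    totals: Counter = Counter()
    for referrer, count in visits.items():
        bucket = "llm" if host_of(referrer) in LLM_HOSTS else "other"
        totals[bucket] += count
    total = sum(totals.values())
    return totals["llm"], (totals["llm"] / total if total else 0.0)


# Search counts are from the report above; the chatgpt.com value is a placeholder.
EXAMPLE_VISITS = {
    "m.search.naver.com": 18,
    "search.naver.com": 16,
    "search.daum.net": 11,
    "https://chatgpt.com/": 5,
}

if __name__ == "__main__":
    print(llm_share(EXAMPLE_VISITS))
```

Running the same calculation on each weekly export gives a trend line for the LLM share without needing any vendor-side citation data.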

GEO vs SEO

| Area | SEO (classic) | GEO (LLM era) |
| --- | --- | --- |
| Target | Search engines (Google · Naver · Bing) | LLMs (ChatGPT · Claude · Perplexity · Gemini) |
| Meta | sitemap · robots · meta tags | llms.txt + robots.txt LLM allow |
| Content | Keyword match · long-tail | Fact-rich · source-cited |
| Core signal | Backlinks · DA · CTR | Citation likelihood · accuracy · structure |
| Measure | GSC · Naver SearchAdvisor | LLM referrer · citation traces |

GEO and SEO can run in parallel. The same content lifts both.

Writing tips

1. Facts first — no fluff/marketing tone

❌ "TAYSTUDIO delivers the best experience..."
✅ "Free web tools/calculators for Korean users. 68 tools run in-browser."

Only facts survive citation. Marketing tone drops trust.

2. Cite your sources — distribute responsibility

✅ "Numbers cited directly from government sources (law.go.kr, NTS, MOEF)"
✅ "Medical numbers from peer-reviewed studies/official guidelines (KOSSO 2024 · KDRI 2025 · WHO · ACOG · AAP)"

So the LLM can answer "where does this site get its info" when prompted.

3. Last-updated date

✅ Last updated: 2026-06-02

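Signals that the site is maintained and that policy changes made after a model's training cutoff (deposit insurance limit raised to ₩100M in 2025-09 · 100% car tax reduction for multi-child households in 2026) are accounted for.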
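A date that is never bumped sends the opposite signal, so it can help to check it automatically. The sketch below assumes the file contains a line of the form `Last updated: YYYY-MM-DD`, as in the excerpt earlier; `days_since_update` is a hypothetical helper, and any threshold you alert on is your own choice.

```python
import re
from datetime import date
from typing import Optional

# Matches a whole line such as "Last updated: 2026-06-02".
LAST_UPDATED = re.compile(r"^Last updated:\s*(\d{4}-\d{2}-\d{2})\s*$", re.MULTILINE)


def days_since_update(llms_txt: str, today: date) -> Optional[int]:
    """Days between the 'Last updated:' line and `today`, or None if absent."""
    match = LAST_UPDATED.search(llms_txt)
    if match is None:
        return None
    return (today - date.fromisoformat(match.group(1))).days


if __name__ == "__main__":
    text = "Operator: TayLee\nLast updated: 2026-06-02\n"
    print(days_since_update(text, date.today()))
```

Passing `today` in explicitly keeps the function easy to test and lets a build script fail when the gap exceeds whatever limit you pick.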

4. Absolute URLs

✅ [Net Salary Calculator](https://taystudios.com/tools/salary/)
❌ [Net Salary Calculator](/tools/salary/)

LLMs need absolute URLs to cite links to users.
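Relative links are easy to miss when the page list is copied from internal navigation. A small sketch that rewrites them with the standard library's `urljoin` follows; `absolutize_links` is an illustrative name, and the regex only covers simple inline links of the form `[text](target)` with no spaces or parentheses inside the target.

```python
import re
from urllib.parse import urljoin

# Simple inline markdown links: [text](target)
MD_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def absolutize_links(markdown: str, base_url: str) -> str:
    """Rewrite relative link targets against base_url; absolute ones are kept."""
    def fix(match: re.Match) -> str:
        text, target = match.group(1), match.group(2)
        return f"[{text}]({urljoin(base_url, target)})"
    return MD_LINK.sub(fix, markdown)


if __name__ == "__main__":
    line = "- [Net Salary Calculator](/tools/salary/)"
    print(absolutize_links(line, "https://taystudios.com/"))
```

Because `urljoin` returns an already absolute target unchanged, the function can be run over the whole file on every build without touching links that are correct.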

5. Changelog section

## Changelog (recent)

- 2026-06-02: 18 tool stale-fixes + 5 differentiating matrices added
- 2026-05-31: blog launch (62 posts)
- 2026-05-09: domain migration

So the LLM can tell that the site is actively maintained and recently migrated.

Known limits

  • llms.txt is emerging (late-2024 proposal). Not universally adopted yet
  • ChatGPT · Claude · Perplexity only partially announce support
  • Hard to measure — citation counts can't be tracked directly
  • But zero cost (one static file) → skipping it costs more

Conclusion

llms.txt is the sitemap.xml of the AI-search era — the basic config for being citeable by LLMs. Even at partial adoption + partial effect, the cost is zero — worth doing.

Especially:

  • Fact-heavy domains (tax · medical · policy · stats) benefit more
  • Gives an LLM concrete, checkable signals (sources, dates, numbers) for judging accuracy
  • Best combined with robots.txt explicit LLM-crawler Allow

One of the highest-ROI SEO actions for a new-domain operator in month one.
