How to Make Your Website Readable by ChatGPT (Step-by-Step)

You can't make ChatGPT cite you. You can make sure ChatGPT, Claude, and Perplexity are able to read, understand, and attribute your page — which is the prerequisite for any citation at all. Here's the tactical, step-by-step version, plus how to verify it actually worked.

Published June 10, 2026 · Website Recycling

1. Understand how ChatGPT, Claude, and Perplexity fetch pages

When an AI engine needs live information, it fetches your page over HTTP — often with a named crawler such as GPTBot (OpenAI), ClaudeBot (Anthropic), or PerplexityBot, or via a live-browsing fetch. The critical detail: many of these fetches read the raw HTML and do not execute client-side JavaScript. Whatever your server hands back in that first response is, for practical purposes, the whole page as far as the engine is concerned.

That single fact drives everything below. If your content depends on scripts running after load, the engine reads an empty shell. The goal of every step that follows is to put your real content, and a clear description of what it means, into that first HTML response.
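To make "first HTML response" concrete, here is a minimal sketch of a plain HTTP fetch using only Python's standard library. The user-agent string and function names are hypothetical and chosen for illustration; real crawlers send their own published user-agent strings. Nothing in this fetch runs page scripts, which is the behavior described above.

```python
import urllib.request

# Hypothetical user-agent for illustration only; not any real crawler's string.
EXAMPLE_USER_AGENT = "example-readability-check/0.1"


def build_request(url: str, user_agent: str = EXAMPLE_USER_AGENT) -> urllib.request.Request:
    """Prepare a plain GET request, the kind of fetch that never runs page scripts."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent}, method="GET")


def fetch_raw_html(url: str, timeout: float = 10.0) -> str:
    """Return the body of the first HTTP response as text. No JavaScript is executed."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_raw_html on one of your own URLs and searching the result for a sentence from the page is a quick way to see what a no-JavaScript fetch receives.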

2. Get your content into server-rendered HTML

This comes first among the fixes for a reason. Disable JavaScript in your browser, open your homepage, and reload. The text that survives is roughly what an AI fetch sees. If the page goes blank, that's the problem to solve before anything else: no amount of structured data helps a page with no readable body.

Sites built on static generators or server-side rendering pass this test by default. Heavily client-rendered sites (some app-style builds, certain page-builder themes) often fail it. The fix is to render content on the server or at build time so it ships in the HTML. This is the core of what a Full Recycle rebuild does — it reconstructs your site on a static edge stack so the content is present on first fetch.
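The disable-JavaScript test can be approximated in code. The sketch below, which assumes two made-up HTML samples, extracts the text a no-JavaScript reader would find in raw HTML by ignoring everything inside script and style tags. It is a rough approximation, not a model of how any particular engine parses pages.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of script and style tags."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the readable text present in the HTML without running any scripts."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical samples: one server-rendered page, one client-rendered shell.
SERVER_RENDERED = "<html><body><h1>Acme Plumbing</h1><p>Emergency repairs in Springfield.</p></body></html>"
CLIENT_SHELL = '<html><body><div id="root"></div><script>renderApp()</script></body></html>'

print(repr(visible_text(SERVER_RENDERED)))  # 'Acme Plumbing Emergency repairs in Springfield.'
print(repr(visible_text(CLIENT_SHELL)))     # ''
```

The second sample is the "empty shell" case: the markup is valid, but there is no body text to read until a script runs.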

3. Add structured data (Schema.org) — with examples

Structured data is how you tell a machine what your content is rather than making it infer. Add Schema.org JSON-LD for the entities that describe your business. At minimum:

Then validate it. A surprising amount of structured data in the wild uses deprecated types or has syntax errors that cause it to be silently ignored. Run your markup through a structured-data validator and fix every error before you move on. Valid-and-modest beats ambitious-and-broken.
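As an illustration of what a small JSON-LD block can look like, here is a sketch that builds one for a Schema.org Organization. The business name, URL, and description are placeholders, and the choice of Organization with these three properties is an assumption for the example, not a recommended minimum. Generating the block from a data structure rather than typing it by hand avoids the syntax errors mentioned above.

```python
import json


def organization_json_ld(name: str, url: str, description: str) -> str:
    """Build a <script> tag holding JSON-LD for a Schema.org Organization."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "description": description,
    }
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "<" so a value can never close the surrounding script tag early.
    body = body.replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder values for illustration.
print(organization_json_ld("Example Co", "https://example.com/", "Makes sample widgets."))
```

The output still needs to go through a structured-data validator; this only guarantees the JSON itself is well formed.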

4. Ship an llms.txt (and ai.txt)

Publish a plain-text llms.txt file at the root of your site. Think of it as a curated map for AI engines: a short list of your most important pages, each with a one-line description. It's an emerging convention, it costs nothing, and it gives a model a clean starting point instead of forcing it to crawl blind. If you maintain an ai.txt as well, keep the two consistent.
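Here is a sketch of generating such a file. The layout (a title line, a one-line summary, then a list of links with descriptions) loosely follows the Markdown-style structure described in the public llms.txt proposal; treat the exact layout, the section name, and the sample pages as assumptions.

```python
def render_llms_txt(site_name: str, summary: str, pages: list[tuple[str, str, str]]) -> str:
    """Render a short llms.txt: title, summary, then one line per important page."""
    lines = [f"# {site_name}", "", f"> {summary}", "", "## Key pages", ""]
    for title, url, description in pages:
        lines.append(f"- [{title}]({url}): {description}")
    return "\n".join(lines) + "\n"


# Placeholder pages for illustration.
EXAMPLE_PAGES = [
    ("Services", "https://example.com/services", "What we offer and who it is for."),
    ("Pricing", "https://example.com/pricing", "Plans and costs."),
]

print(render_llms_txt("Example Co", "Sample business used for illustration.", EXAMPLE_PAGES))
```

Save the result as llms.txt at the site root, and regenerate it whenever your key pages change so it stays consistent with any ai.txt you maintain.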

5. Allowlist the AI bots in robots.txt

The major AI crawlers say they honor robots.txt. If your file was written before these bots existed, they fall under whatever your generic wildcard rules (User-agent: *) say, which may block more than you intend and leaves your intent unstated. Add explicit Allow directives for the user-agent tokens you want to admit: GPTBot, ClaudeBot, PerplexityBot, Google-Extended (a robots.txt control token rather than a separate crawler), and the others on your allowlist. Being explicit removes the ambiguity.
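A sketch of what an explicit allowlist can look like, with a check using Python's standard urllib.robotparser. The robots.txt content, the /private/ path, and the example.com URLs are assumptions for illustration; adapt the groups to your own rules. Note that Python's parser is only one interpretation of robots.txt, and individual crawlers may resolve conflicting rules differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: named AI tokens are allowed everywhere,
# everyone else is kept out of /private/.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Report whether the given user agent may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for bot in ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"):
    print(bot, is_allowed(ROBOTS_TXT, bot, "https://example.com/services"))  # True for each
```

Because each named bot has its own group, the wildcard Disallow does not apply to it. If you want the named bots kept out of a path too, repeat that Disallow inside their groups.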

6. Keep the HTML clean and server-rendered

Beyond the content itself, clean markup helps. Use real semantic headings, keep your most important text high in the document, avoid burying content inside deeply nested interactive widgets, and write key passages in a definition-first, quotable style. The easier it is for a model to lift a clean sentence and attribute it to you, the more likely your content is to be useful to it. The deeper reasoning behind this is in our "why it matters" breakdown.
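One way to spot-check the "real semantic headings" advice is to pull the heading outline out of your HTML and look for skipped levels. The sketch below is a simple heuristic under our own assumptions (a page should start at h1 and never jump more than one level deeper at a time); it is not a rule any engine is known to enforce.

```python
from html.parser import HTMLParser


class _HeadingOutline(HTMLParser):
    """Record (level, text) for every h1-h6 element, in document order."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self._current = None
        self._buffer = []
        self.outline = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._current = int(tag[1])
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._current is not None:
            text = " ".join("".join(self._buffer).split())
            self.outline.append((self._current, text))
            self._current = None


def heading_outline(html: str) -> list[tuple[int, str]]:
    parser = _HeadingOutline()
    parser.feed(html)
    parser.close()
    return parser.outline


def skipped_levels(outline: list[tuple[int, str]]) -> list[tuple[int, str]]:
    """Return headings that sit more than one level below the heading before them."""
    problems = []
    previous = 0  # so a page that opens with h2 or deeper is flagged too
    for level, text in outline:
        if level > previous + 1:
            problems.append((level, text))
        previous = level
    return problems


# Hypothetical page fragment: the h4 skips the h3 level.
SAMPLE = "<h1>Acme Plumbing</h1><p>Intro.</p><h2>Services</h2><h4>Drain <em>cleaning</em></h4>"
print(heading_outline(SAMPLE))                  # [(1, 'Acme Plumbing'), (2, 'Services'), (4, 'Drain cleaning')]
print(skipped_levels(heading_outline(SAMPLE)))  # [(4, 'Drain cleaning')]
```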

7. Verify it worked

Don't take it on faith — check:

  1. Fetch like a crawler. View source (or use a command-line fetch) and confirm your real content is in the returned HTML, not injected later.
  2. Validate the schema. Re-run your structured data through a validator; aim for zero errors.
  3. Confirm the allowlist. Read your robots.txt and verify the AI bots are explicitly allowed.
  4. Ask the engines. Ask ChatGPT or Perplexity to summarize your page and see whether it picks up your actual content. If the summary accurately reflects details that appear on the live page, the engine can read you.
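The first two checks in the list above can be partly automated. As a sketch, the code below finds every JSON-LD script block in a page's raw HTML and confirms each one parses as JSON. This is a syntax check only, under the assumption that you already have the HTML as a string; it does not replace a proper structured-data validator, which also checks types and required properties.

```python
import json
from html.parser import HTMLParser


class _JsonLdCollector(HTMLParser):
    """Collect the raw contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append("".join(self._buffer))
            self._in_json_ld = False


def check_json_ld(html: str) -> tuple[list, list[str]]:
    """Return (parsed_blocks, error_messages) for every JSON-LD block in the HTML."""
    collector = _JsonLdCollector()
    collector.feed(html)
    collector.close()
    parsed, errors = [], []
    for raw in collector.blocks:
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError as exc:
            errors.append(str(exc))
    return parsed, errors


# Hypothetical samples: one valid block, one with a trailing comma.
GOOD = '<script type="application/ld+json">{"@type": "Organization", "name": "Example Co"}</script>'
BAD = '<script type="application/ld+json">{"@type": "Organization",}</script>'

print(len(check_json_ld(GOOD)[0]), len(check_json_ld(GOOD)[1]))  # 1 0
print(len(check_json_ld(BAD)[0]), len(check_json_ld(BAD)[1]))    # 0 1
```

A page that reports zero blocks here has no JSON-LD in its served HTML at all, which usually means the markup is being injected by a script after load.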

One honest caveat to close on: being readable is the prerequisite for citation, not a promise of it. We give AI engines every signal they need to read and understand your site — what each engine ultimately chooses to surface is up to the engine, and it changes constantly. If you'd rather have all of this done for you and verified against an auditable scorecard, that's exactly what a Website Recycling rebuild delivers, backed by a 60-day money-back window. Results vary.

Frequently asked questions

How does ChatGPT actually read a web page?

When an AI engine needs current information, it fetches the page's HTML over HTTP, usually with a named crawler like GPTBot or a live-browsing fetch. Many of these fetches read the raw HTML and do not run client-side JavaScript. So whatever is present in the served HTML is what the engine sees — content injected later by scripts is often invisible.

What's the single most important thing to fix first?

Make sure your core content is in the server-rendered HTML, not painted in by JavaScript after load. You can test this by disabling JavaScript in your browser and reloading: if the page goes blank, an AI fetch likely sees the same blank page. Everything else — schema, llms.txt, allowlisting — matters less if the content itself isn't there on first fetch.

Do I need an llms.txt file?

It's a useful, low-cost signal. An llms.txt file is a plain-text map at the root of your site that points AI engines to your most important pages and gives a short description of each. It doesn't guarantee anything, but it's an emerging convention that makes your site easier to navigate for a model, and shipping one costs you nothing.

How do I verify an AI engine can read my page?

Fetch your URL the way a crawler would — view source or use a command-line fetch — and confirm your real content is in the returned HTML. Validate your Schema.org markup. Confirm GPTBot, ClaudeBot, and PerplexityBot are allowed in robots.txt. You can also simply ask ChatGPT or Perplexity to summarize your page and see whether it picks up your actual content. Note that being readable is a prerequisite for citation, not a guarantee of it.
