Why Your Website Doesn't Appear in ChatGPT — 8 Causes

Find out why your website isn't cited by ChatGPT or Perplexity. Eight root causes — AI crawler blocks, thin content, missing schema — with specific fixes for each.

ChatGPT serves 800 million weekly active users. Perplexity processed 780 million queries in a single month. 53% of consumers used AI to support a buying decision in the last 90 days. If your website isn't appearing in AI answers, you are absent from the research conversation that precedes many purchasing decisions.

The frustrating part: the reasons are almost always fixable. This guide covers the eight most common causes — starting with the ones most likely to fully block AI visibility — with a specific diagnostic and fix for each.

Why Doesn't My Website Appear in ChatGPT?

The most common reasons websites don't appear in ChatGPT: AI crawlers are blocked in robots.txt, content was published after the model's training cutoff, pages have thin or non-extractable content, structured data is missing, or domain authority signals are weak.

Cause 1: Your Site Is Blocking AI Crawlers

This is the single most common fixable technical cause. Among the top 100 news publishers, 79% block at least one AI training bot, 62% block GPTBot specifically, and 67% block PerplexityBot. Across the broader web, GPTBot was blocked by 35.7% of the top 1,000 websites as of August 2024 — many of them without knowing they'd set that restriction.

How to diagnose it

Visit yourdomain.com/robots.txt. Look for `Disallow: /` under GPTBot, ClaudeBot, PerplexityBot, CCBot, or a wildcard `User-agent: *` rule without AI-specific exceptions.
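The same check can be scripted with Python's standard-library robots.txt parser. A minimal sketch — the robots.txt contents and URLs below are placeholder examples, not your real file:

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: a wildcard block plus a dedicated GPTBot group.
# Substitute the real contents of yourdomain.com/robots.txt here.
robots_txt = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The dedicated GPTBot group takes precedence over the wildcard block.
print(parser.can_fetch("GPTBot", "https://yourdomain.com/guide"))     # True
print(parser.can_fetch("ClaudeBot", "https://yourdomain.com/guide"))  # False
```

Repeat the `can_fetch` call for each crawler user agent you care about (ClaudeBot, PerplexityBot, CCBot) to see exactly which ones your current rules shut out.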

How to fix it

Add a dedicated group for each AI crawler you want to allow. See the AI Crawler Configuration Guide for exact robots.txt syntax and a full list of AI crawler user agents. If you have a wildcard disallow for scraper protection, a specific `User-agent: GPTBot` group overrides it regardless of where it appears in the file — crawlers follow the most specific matching group, not the file order.
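For example, a robots.txt that keeps a wildcard block for generic scrapers while explicitly allowing the major AI crawlers might look like this (a sketch — include only the crawlers you actually want):

```txt
# Dedicated groups for AI crawlers — each overrides the wildcard group
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Everyone else
User-agent: *
Disallow: /
```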

Cause 2: You Published After the Model's Training Cutoff

ChatGPT and Claude are trained on data up to a cutoff date. Content published after that cutoff is not in the model's base knowledge. If you launched your product, published your key articles, or built your site recently, those pages may simply not exist in the model's training data.

This is the one cause that you cannot directly fix for LLMs with fixed training data. However, for retrieval-augmented AI engines — Perplexity AI, ChatGPT with browsing enabled, and other systems that retrieve from the live web — training cutoff is far less relevant. For those systems, what matters is whether your content is crawlable and citation-worthy at query time.

Cause 3: Your Content Is Too Thin to Be Worth Citing

AI agents have implicit quality standards for citation. A 200-word product page, a blog post that paraphrases common knowledge, or a landing page full of marketing copy without substantive answers won't be cited when more comprehensive resources exist on the same topic.

How to diagnose it

Read your most important pages and ask: does this provide a unique, comprehensive answer that's worth quoting? Does it contain specific data, named examples, or original analysis? A practical test: ask ChatGPT the question your page is supposed to answer. If the answer it gives is better than your page's content, your content is too thin.

How to fix it

Deepen your key pages. Add original analysis, comparison tables, FAQ sections, specific statistics with cited sources, and step-by-step guidance. Aim for pages that are demonstrably the best available answer on the topic — not the 50th variation of the same generic explanation. Long-form, comprehensive content earns significantly more AI citations than shallow pages on the same topic.

Cause 4: You Have No Structured Data

Structured data (JSON-LD using Schema.org) removes all ambiguity about what a page contains. Without it, AI crawlers have to infer your page type, content structure, and authorship from raw HTML — and inference errors mean your content may be miscategorized or underweighted. Yet only 41% of web pages currently implement JSON-LD, leaving the majority of the web without this critical signal.

Fix: add JSON-LD to your key pages. At minimum: Article schema on blog posts and guides (with author, datePublished, dateModified), Organization schema on your homepage, and FAQPage schema on any page with question-and-answer sections.
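As an illustration, a minimal Article schema block sits in the page's head like this — every name, date, and URL here is a placeholder to replace with your own values:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Why Your Website Doesn't Appear in ChatGPT",
  "datePublished": "2025-01-15",
  "dateModified": "2025-03-02",
  "author": {
    "@type": "Person",
    "name": "Jane Doe",
    "url": "https://yourdomain.com/about"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Your Company"
  }
}
</script>
```

Validate the result with Google's Rich Results Test before shipping.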

Cause 5: Your Content Structure Is Hard to Extract

AI systems extract passages from your pages to generate answers. If your most important information is embedded in JavaScript components, rendered dynamically, buried in long unbroken paragraphs, or hidden in accordions and tabs — it may not be extractable at all.

  • Put your core answer in the first paragraph, before any setup or context.
  • Use H2 headings to label every major section — AI agents use headings to decompose pages into named sections.
  • Keep paragraphs to 2–3 sentences. Shorter paragraphs extract as cleaner citations.
  • Use bullet lists and comparison tables for comparative or list-format information.
  • Avoid hiding important content in modals, tabs, or accordions — these are frequently not indexed.
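In markup terms, an extractable section looks something like this — a sketch with invented placeholder content, not a template:

```html
<h2>How much does X cost?</h2>
<!-- Core answer in the first paragraph, before any context -->
<p>X costs $29/month on the standard plan.</p>
<p>Annual billing reduces that to $24/month. Enterprise pricing is custom.</p>
<ul>
  <li>Standard: $29/month</li>
  <li>Annual: $24/month, billed yearly</li>
</ul>
```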

Cause 6: Your Site Has Weak Authority Signals

AI models weight content from authoritative sources more heavily. The signals that matter in the AI context: explicit author credentials (name, role, professional links in schema markup), citations by other authoritative sources on the web, presence in professional directories and databases, and domain longevity.

Fix: add detailed author bios with professional credentials and links (mark these up with Person schema), cite your sources explicitly within your content, and ensure your About page clearly establishes who produces your content and why they're qualified to write it.

Cause 7: You're Missing an llms.txt File

llms.txt is the emerging standard for communicating site structure directly to AI systems — analogous to robots.txt but for guidance rather than access control. Despite its value as a direct communication channel, only 0.3% of the top 1,000 websites have implemented it. A well-crafted llms.txt at yourdomain.com/llms.txt is a quick differentiator that helps AI crawlers understand what your site covers.
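Per the proposed llms.txt format, the file is plain markdown: an H1 with the site name, a short blockquote summary, then sections of annotated links. A sketch with placeholder names and URLs:

```txt
# Your Company
> One-sentence description of what the site covers and who it serves.

## Guides
- [AI Crawler Configuration Guide](https://yourdomain.com/guides/ai-crawlers): robots.txt setup for AI bots
- [Schema Markup Guide](https://yourdomain.com/guides/schema): JSON-LD examples for key page types

## About
- [Methodology](https://yourdomain.com/methodology): how our data is gathered and verified
```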

Cause 8: Your Site Has No Discoverable Sitemap

Without a sitemap.xml, AI crawlers discover your pages through internal links alone. Pages that are lightly linked internally — your most important guide, your methodology documentation, your comparison content — may never be discovered or indexed.

Fix: add sitemap.xml at your root, reference it in robots.txt with `Sitemap: https://yourdomain.com/sitemap.xml`, and ensure it includes all canonical URLs you want indexed.
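A minimal sitemap.xml looks like this (URLs and dates are placeholders), with the `Sitemap:` line above added to robots.txt:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yourdomain.com/</loc>
    <lastmod>2025-03-02</lastmod>
  </url>
  <url>
    <loc>https://yourdomain.com/guides/ai-crawlers</loc>
    <lastmod>2025-02-10</lastmod>
  </url>
</urlset>
```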

Complete Diagnostic Checklist

| Cause | How to Check | Fix |
| --- | --- | --- |
| AI crawlers blocked | Check robots.txt for GPTBot, ClaudeBot, PerplexityBot | Add allow rules for each AI crawler |
| Training cutoff | Note your content's publish dates vs. model cutoffs | Focus on Perplexity / ChatGPT with browsing; keep content current |
| Thin content | Read your pages: would you cite this over alternatives? | Add depth, data, examples, original analysis |
| No structured data | Use Google Rich Results Test | Add JSON-LD for Article, Organization, FAQPage |
| Hard-to-extract structure | Check heading hierarchy, paragraph length, dynamic rendering | Add H2s, shorten paragraphs, surface core answer first |
| Weak authority signals | Check your About/author pages for credentials and schema | Add Person schema, professional bios, cited sources |
| No llms.txt | Visit yourdomain.com/llms.txt | Create and publish a structured llms.txt |
| No sitemap | Visit yourdomain.com/sitemap.xml | Generate sitemap; reference it in robots.txt |

Run a free AEO audit at aeo-check.vercel.app to get a complete diagnosis of which of these causes affect your site — plus a prioritized list of what to fix first.

How do I get my website to appear in ChatGPT?

The most actionable steps: (1) ensure GPTBot is allowed in your robots.txt, (2) add JSON-LD structured data to your key pages, (3) make your content more comprehensive and answer-ready — opening with direct answers to the questions your audience asks AI engines. There's no submission process; AI crawlers discover and index based on these signals.

Does robots.txt apply to AI crawlers?

Yes. Major AI crawlers — OpenAI's GPTBot, Anthropic's ClaudeBot, Perplexity's PerplexityBot — all respect robots.txt directives. If your robots.txt blocks these user agents with `Disallow: /`, that crawler cannot index your content and you cannot appear in that AI system's answers.

How long does it take for ChatGPT to know about my website?

For ChatGPT's base model, it depends on training cycles — new content added after the model's training cutoff won't appear in base-model responses until the next training update, which can take months. For ChatGPT with browsing enabled and Perplexity AI (which retrieves from the live web in real time), discovery can happen within days of publishing, provided your content is crawlable.

Can I check if GPTBot has indexed my site?

You can check your server access logs for GPTBot user agent hits. OpenAI also provides a list of GPTBot IP ranges in their documentation, allowing you to filter logs specifically for OpenAI crawler activity. If you see no GPTBot activity in logs and have not blocked it, check whether your sitemap and robots.txt are accessible.
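The log check is easy to script. A sketch that counts hits per AI crawler user agent — the sample log lines below stand in for your real access log file:

```python
from collections import Counter

AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"]

def count_ai_crawler_hits(log_lines):
    """Count access-log lines whose user-agent string mentions a known AI crawler."""
    hits = Counter()
    for line in log_lines:
        for bot in AI_CRAWLERS:
            if bot in line:
                hits[bot] += 1
    return hits

# Placeholder lines; in practice, iterate over open("/var/log/nginx/access.log")
sample_log = [
    '1.2.3.4 - - [01/Mar/2025] "GET /guide HTTP/1.1" 200 "-" "Mozilla/5.0; GPTBot/1.1"',
    '5.6.7.8 - - [01/Mar/2025] "GET / HTTP/1.1" 200 "-" "Mozilla/5.0 (regular browser)"',
    '9.9.9.9 - - [02/Mar/2025] "GET /blog HTTP/1.1" 200 "-" "PerplexityBot/1.0"',
]
print(count_ai_crawler_hits(sample_log))  # Counter({'GPTBot': 1, 'PerplexityBot': 1})
```

For stronger verification, cross-check the source IPs of matching lines against OpenAI's published GPTBot IP ranges, since user-agent strings alone can be spoofed.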

What is the most common reason websites don't appear in ChatGPT?

The two most common fixable causes are: (1) AI crawlers blocked in robots.txt — affecting up to 35.7% of top sites — and (2) thin content that isn't worth citing even when the crawler can access it. The fastest win is always checking robots.txt first, since a blocking rule prevents any other optimization from working.

Does appearing in ChatGPT help my business?

Research by First Page Sage found that 53% of consumers used AI to support buying decisions in the 90 days before October 2025, with 46% of business decision-makers doing the same. AI citations that happen during this research phase introduce your brand or product before the buyer visits any website — a top-of-funnel influence that grows as AI search adoption increases.