ChatGPT serves 800 million weekly active users. Perplexity processed 780 million queries in a single month. 53% of consumers used AI to support a buying decision in the last 90 days. If your website isn't appearing in AI answers, you are absent from the research conversation that precedes many purchasing decisions.
The frustrating part: the reasons are almost always fixable. This guide covers the eight most common causes — starting with the ones most likely to fully block AI visibility — with a specific diagnostic and fix for each.
Why Doesn't My Website Appear in ChatGPT?
The most common reasons websites don't appear in ChatGPT: AI crawlers are blocked in robots.txt, content was published after the model's training cutoff, pages have thin or non-extractable content, structured data is missing, or domain authority signals are weak.
Cause 1: Your Site Is Blocking AI Crawlers
This is the single most common fixable technical cause. Among the top 100 news publishers, 79% block at least one AI training bot, 62% block GPTBot specifically, and 67% block PerplexityBot. Across the broader web, GPTBot was blocked by 35.7% of the top 1,000 websites as of August 2024 — many of them without knowing they'd set that restriction.
How to diagnose it
Visit yourdomain.com/robots.txt. Look for `Disallow: /` under GPTBot, ClaudeBot, PerplexityBot, CCBot, or a wildcard `User-agent: *` rule without AI-specific exceptions.
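If you'd rather check programmatically, Python's standard-library `urllib.robotparser` applies robots.txt rules the same way compliant crawlers do. This is a minimal sketch — the `robots_txt` string here is a hypothetical example; paste in the actual contents of yourdomain.com/robots.txt:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a wildcard disallow and no AI-specific
# exceptions -- replace with the real contents of yourdomain.com/robots.txt.
robots_txt = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Check each major AI crawler against the parsed rules.
for bot in ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"]:
    allowed = parser.can_fetch(bot, "https://yourdomain.com/")
    print(f"{bot}: {'allowed' if allowed else 'BLOCKED'}")
```

With the example rules above, every bot prints as BLOCKED — the wildcard disallow applies to any user agent without its own group.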
How to fix it
Add an explicit allow group for each AI crawler you want to admit. See the AI Crawler Configuration Guide for exact robots.txt syntax and a full list of AI crawler user agents. If you have a wildcard disallow for scraper protection, add dedicated groups for the AI crawlers alongside it — crawlers obey the most specific `User-agent` group that matches them, so a `GPTBot` group with `Allow: /` overrides a wildcard `Disallow: /`.
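As a sketch, a robots.txt that admits the major AI crawlers while keeping a restrictive wildcard rule for everything else might look like this (the `/private/` path is a placeholder):

```
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Wildcard group for all other crawlers
User-agent: *
Disallow: /private/
```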
Cause 2: You Published After the Model's Training Cutoff
ChatGPT and Claude are trained on data up to a cutoff date. Content published after that cutoff is not in the model's base knowledge. If you launched your product, published your key articles, or built your site recently, those pages may simply not exist in the model's training data.
This is the one cause you cannot directly fix for models with a fixed training cutoff. However, for retrieval-augmented AI engines — Perplexity AI, ChatGPT with browsing enabled, and other systems that retrieve from the live web — the training cutoff is far less relevant. For those systems, what matters is whether your content is crawlable and citation-worthy at query time.
Cause 3: Your Content Is Too Thin to Be Worth Citing
AI agents have implicit quality standards for citation. A 200-word product page, a blog post that paraphrases common knowledge, or a landing page full of marketing copy without substantive answers won't be cited when more comprehensive resources exist on the same topic.
How to diagnose it
Read your most important pages and ask: does this provide a unique, comprehensive answer that's worth quoting? Does it contain specific data, named examples, or original analysis? A practical test: ask ChatGPT the question your page is supposed to answer. If the answer it gives is better than your page's content, your content is too thin.
How to fix it
Deepen your key pages. Add original analysis, comparison tables, FAQ sections, specific statistics with cited sources, and step-by-step guidance. Aim for pages that are demonstrably the best available answer on the topic — not the 50th variation of the same generic explanation. Long-form, comprehensive content earns significantly more AI citations than shallow pages on the same topic.
Cause 4: You Have No Structured Data
Structured data (JSON-LD using Schema.org) removes all ambiguity about what a page contains. Without it, AI crawlers have to infer your page type, content structure, and authorship from raw HTML — and inference errors mean your content may be miscategorized or underweighted. Yet only 41% of web pages currently implement JSON-LD, leaving the majority of the web without this critical signal.
Fix: add JSON-LD to your key pages. At minimum: Article schema on blog posts and guides (with author, datePublished, dateModified), Organization schema on your homepage, and FAQPage schema on any page with question-and-answer sections.
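As a minimal sketch, an Article JSON-LD block might look like the following — the headline, author, and dates are placeholders. Embed it in a `<script type="application/ld+json">` tag in the page's head:

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Why Your Website Doesn't Appear in ChatGPT",
  "author": {
    "@type": "Person",
    "name": "Jane Example",
    "jobTitle": "Technical SEO Lead"
  },
  "datePublished": "2025-01-15",
  "dateModified": "2025-06-01"
}
```

Validate the result with Google's Rich Results Test or the Schema.org validator before shipping.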
Cause 5: Your Content Structure Is Hard to Extract
AI systems extract passages from your pages to generate answers. If your most important information is embedded in JavaScript components, rendered dynamically, buried in long unbroken paragraphs, or hidden in accordions and tabs — it may not be extractable at all.
- Put your core answer in the first paragraph, before any setup or context.
- Use H2 headings to label every major section — AI agents use headings to decompose pages into named sections.
- Keep paragraphs to 2–3 sentences. Shorter paragraphs extract as cleaner citations.
- Use bullet lists and comparison tables for comparative or list-format information.
- Avoid hiding important content in modals, tabs, or accordions — these are frequently not indexed.
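Applied to a page skeleton, the checklist above looks something like this (all headings and copy are placeholders):

```html
<article>
  <h1>What Is Answer Engine Optimization?</h1>

  <!-- Core answer first, before any setup or context -->
  <p>Answer engine optimization (AEO) is the practice of structuring
  content so AI systems can extract and cite it.</p>

  <h2>How AEO differs from SEO</h2>
  <p>Two to three sentences per paragraph. Each heading names the
  section it introduces.</p>

  <h2>Key steps</h2>
  <ul>
    <li>Lead with the direct answer.</li>
    <li>Label every section with an H2.</li>
    <li>Keep comparative information in lists or tables.</li>
  </ul>
</article>
```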
Cause 6: Your Site Has Weak Authority Signals
AI models weight content from authoritative sources more heavily. The signals that matter in the AI context: explicit author credentials (name, role, professional links in schema markup), citations by other authoritative sources on the web, presence in professional directories and databases, and domain longevity.
Fix: add detailed author bios with professional credentials and links (mark these up with Person schema), cite your sources explicitly within your content, and ensure your About page clearly establishes who produces your content and why they're qualified to write it.
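A Person schema block for an author bio might look like this sketch — the name, title, and profile URL are hypothetical:

```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Jane Example",
  "jobTitle": "Head of Research",
  "worksFor": {
    "@type": "Organization",
    "name": "Example Co"
  },
  "sameAs": [
    "https://www.linkedin.com/in/jane-example"
  ]
}
```

Link it from your Article schema's `author` field so the credentials attach to the content itself.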
Cause 7: You're Missing an llms.txt File
llms.txt is the emerging standard for communicating site structure directly to AI systems — analogous to robots.txt but for guidance rather than access control. Despite its value as a direct communication channel, only 0.3% of the top 1,000 websites have implemented it. A well-crafted llms.txt at yourdomain.com/llms.txt is a quick differentiator that helps AI crawlers understand what your site covers.
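Following the llmstxt.org proposal, llms.txt is plain markdown: an H1 with the site name, a blockquote summary, then sections of annotated links. A minimal sketch (site name, URLs, and descriptions are hypothetical):

```markdown
# Example Co

> Example Co publishes guides on making websites visible to AI search engines.

## Guides

- [AI Crawler Configuration Guide](https://yourdomain.com/guides/ai-crawlers): robots.txt rules for AI bots
- [Structured Data Basics](https://yourdomain.com/guides/json-ld): adding JSON-LD to key pages

## About

- [About Us](https://yourdomain.com/about): who writes our content and why they're qualified
```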
Cause 8: Your Site Has No Discoverable Sitemap
Without a sitemap.xml, AI crawlers discover your pages through internal links alone. Pages that are lightly linked internally — your most important guide, your methodology documentation, your comparison content — may never be discovered or indexed.
Fix: add sitemap.xml at your root, reference it in robots.txt with `Sitemap: https://yourdomain.com/sitemap.xml`, and ensure it includes all canonical URLs you want indexed.
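A minimal sitemap.xml looks like this sketch (URLs and dates are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yourdomain.com/</loc>
    <lastmod>2025-06-01</lastmod>
  </url>
  <url>
    <loc>https://yourdomain.com/guides/ai-crawlers</loc>
    <lastmod>2025-05-20</lastmod>
  </url>
</urlset>
```

Most CMS platforms and static site generators can emit this file automatically; the important part is that every canonical URL you want indexed appears in it.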
Complete Diagnostic Checklist
| Cause | How to Check | Fix |
|---|---|---|
| AI crawlers blocked | Check robots.txt for GPTBot, ClaudeBot, PerplexityBot | Add allow rules for each AI crawler |
| Training cutoff | Note your content's publish dates vs. model cutoffs | Focus on Perplexity / ChatGPT with browsing; keep content current |
| Thin content | Read your pages: would you cite this over alternatives? | Add depth, data, examples, original analysis |
| No structured data | Use Google Rich Results Test | Add JSON-LD for Article, Organization, FAQPage |
| Hard-to-extract structure | Check heading hierarchy, paragraph length, dynamic rendering | Add H2s, shorten paragraphs, surface core answer first |
| Weak authority signals | Check your About/author pages for credentials and schema | Add Person schema, professional bios, cited sources |
| No llms.txt | Visit yourdomain.com/llms.txt | Create and publish a structured llms.txt |
| No sitemap | Visit yourdomain.com/sitemap.xml | Generate sitemap; reference it in robots.txt |
Run a free AEO audit at aeo-check.vercel.app to get a complete diagnosis of which of these causes affect your site — plus a prioritized list of what to fix first.
How do I get my website to appear in ChatGPT?
The most actionable steps: (1) ensure GPTBot is allowed in your robots.txt, (2) add JSON-LD structured data to your key pages, (3) make your content more comprehensive and answer-ready — opening with direct answers to the questions your audience asks AI engines. There's no submission process; AI crawlers discover and index based on these signals.
Does robots.txt apply to AI crawlers?
Yes. Major AI crawlers — OpenAI's GPTBot, Anthropic's ClaudeBot, Perplexity's PerplexityBot — all respect robots.txt directives. If your robots.txt blocks these user agents with `Disallow: /`, that crawler cannot index your content and you cannot appear in that AI system's answers.
How long does it take for ChatGPT to know about my website?
For ChatGPT's base model, it depends on training cycles — new content added after the model's training cutoff won't appear in base-model responses until the next training update, which can take months. For ChatGPT with browsing enabled and Perplexity AI (which retrieves from the live web in real time), discovery can happen within days of publishing, provided your content is crawlable.
Can I check if GPTBot has indexed my site?
You can check your server access logs for GPTBot user agent hits. OpenAI also provides a list of GPTBot IP ranges in their documentation, allowing you to filter logs specifically for OpenAI crawler activity. If you see no GPTBot activity in logs and have not blocked it, check whether your sitemap and robots.txt are accessible.
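A minimal sketch of that log check in Python — the log lines below are hypothetical samples; in practice you'd read your real access log file instead:

```python
# Count GPTBot hits in an access log. These sample lines are hypothetical;
# replace log_lines with open("/var/log/nginx/access.log") in practice.
log_lines = [
    '203.0.113.5 - - [01/Jun/2025] "GET /guide HTTP/1.1" 200 "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '198.51.100.7 - - [01/Jun/2025] "GET / HTTP/1.1" 200 "Mozilla/5.0"',
]

# Filter on the user-agent substring reported by OpenAI's crawler.
gptbot_hits = [line for line in log_lines if "GPTBot" in line]
print(f"GPTBot requests: {len(gptbot_hits)}")
```

For stronger confirmation, cross-check the source IPs of matching lines against OpenAI's published GPTBot IP ranges, since user-agent strings can be spoofed.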
What is the most common reason websites don't appear in ChatGPT?
The two most common fixable causes are: (1) AI crawlers blocked in robots.txt — affecting up to 35.7% of top sites — and (2) thin content that isn't worth citing even when the crawler can access it. The fastest win is always checking robots.txt first, since a blocking rule prevents any other optimization from working.
Does appearing in ChatGPT help my business?
Research by First Page Sage found that 53% of consumers used AI to support buying decisions in the 90 days before October 2025, with 46% of business decision-makers doing the same. AI citations that happen during this research phase introduce your brand or product before the buyer visits any website — a top-of-funnel influence that grows as AI search adoption increases.
Sources
- ChatGPT reaches 800M weekly active users — TechCrunch, Oct 2025
- Perplexity: 780M queries in May 2025 — TechCrunch, Jun 2025
- 53% of consumers used AI for buying decisions — First Page Sage, Oct 2025
- 79% of top news sites block at least one AI bot — BuzzStream, Dec 2025
- GPTBot blocked by 35.7% of top 1,000 sites — PPC Land, Aug 2024
- 41% of pages implement JSON-LD — HTTP Archive Web Almanac 2024
- llms.txt adoption: 0.3% of top 1,000 sites — Rankability, Jun 2025