
How to Audit Your Website for AI Search Visibility

Learn how to audit your website for AI search visibility in 8 steps — from checking AI crawler access to evaluating content quality for ChatGPT and Perplexity.

53% of consumers used AI to make a buying-related decision in the 90 days before October 2025. Google AI Overviews now appear in roughly 16% of all queries, and where they appear, organic CTR falls by 61%. Your buyers are using AI to research before they visit your site — the question is whether AI is mentioning you, or your competitor.

The good news: AI search visibility is auditable. There is a defined set of technical and content signals that determine whether ChatGPT, Perplexity, and Google AI Overviews can read and cite your pages. This guide walks through all 8 steps.

What Is an AI Search Visibility Audit?

An AI search visibility audit evaluates whether AI engines like ChatGPT and Perplexity can crawl, parse, and cite your pages. It covers crawler access, structured data, metadata, content quality, and discoverability — across 8 checkpoints.

Step 1: Verify AI Crawler Access

Before anything else, confirm that AI crawlers can access your site. This is the single most common fixable issue found in AEO audits — and often the most damaging. Among the top 100 news sites, 79% block at least one AI training bot, with 62% specifically blocking GPTBot and 67% blocking PerplexityBot. Many block these crawlers unintentionally, through broad scraper-blocking rules or CMS plugin defaults.

Check robots.txt

Visit yourdomain.com/robots.txt and scan for Disallow rules targeting AI-specific user agents. The major AI crawlers: GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity AI), GoogleOther (Google experimental crawlers), CCBot (Common Crawl, which feeds many models).

If you see `Disallow: /` under any of these agents, or a wildcard disallow (`User-agent: *`) with no exceptions, those crawlers cannot index your content. Remove the restriction, or add explicit allow rules for AI crawlers above the wildcard block.
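To verify a configuration like this programmatically, you can feed your robots.txt to Python's standard-library `urllib.robotparser` and check each AI user agent. This is a sketch — the sample robots.txt and URL are illustrative, showing an explicit GPTBot allow rule above a wildcard block:

```python
from urllib import robotparser

# Illustrative robots.txt: GPTBot is explicitly allowed, while the
# wildcard block disallows every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /
"""

# The major AI crawler user agents covered in this step
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "GoogleOther", "CCBot"]

def check_ai_access(robots_txt: str, url: str = "https://example.com/page") -> dict:
    """Return {agent: bool} for whether each AI crawler may fetch `url`."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {agent: rp.can_fetch(agent, url) for agent in AI_AGENTS}

results = check_ai_access(ROBOTS_TXT)
for agent, allowed in results.items():
    print(f"{agent}: {'allowed' if allowed else 'BLOCKED'}")
```

In practice you would fetch `https://yourdomain.com/robots.txt` and pass its contents to `check_ai_access`. With the sample rules above, only GPTBot is allowed; the other four agents fall through to the wildcard block.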

Check for AI-blocking meta tags

Check your page source for `<meta name="robots" content="noai">` or `<meta name="robots" content="noimageai">`. These tags block AI ingestion at the page level. If they appear on public content you want cited, remove them. See the AI Crawler Configuration Guide for a complete reference.

Step 2: Audit Your Metadata

AI agents parse page metadata to classify and summarize content before reading the body. Missing or generic metadata forces the agent to guess — and wrong guesses lead to miscitation or no citation.

  • Title tag: every page needs a descriptive, unique `<title>` that clearly states the topic. Avoid brand-only titles on content pages.
  • Meta description: a 150–160 character summary of what the visitor will find. AI agents use this for pre-classification.
  • Open Graph tags: `og:title`, `og:description`, `og:type`, and `og:url` should be set on all indexable pages. Many AI crawlers consume OG tags when meta tags are thin.
  • Canonical URL: prevents signal dilution when multiple URL variants exist for the same content.
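A head section satisfying all four checks might look like this sketch — the title, descriptions, and example.com URLs are placeholders:

```html
<head>
  <!-- Descriptive, unique title; not brand-only -->
  <title>How to Audit Your Website for AI Search Visibility</title>
  <meta name="description" content="An 8-step audit of AI crawler access, metadata, structured data, and content quality.">
  <!-- Open Graph tags; many AI crawlers consume these when meta tags are thin -->
  <meta property="og:title" content="How to Audit Your Website for AI Search Visibility">
  <meta property="og:description" content="An 8-step AI search visibility audit guide.">
  <meta property="og:type" content="article">
  <meta property="og:url" content="https://example.com/guides/ai-search-audit">
  <!-- Canonical URL prevents signal dilution across URL variants -->
  <link rel="canonical" href="https://example.com/guides/ai-search-audit">
</head>
```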

Step 3: Evaluate Structured Data

Structured data (JSON-LD using Schema.org vocabulary) removes all ambiguity about what a page contains. Despite being the clearest possible AI-readability signal, only 41% of web pages currently implement JSON-LD — meaning the majority of the web is leaving this signal blank.

| Schema Type | Best For | Key Properties |
| --- | --- | --- |
| Article / BlogPosting | Blog posts, guides | headline, author, datePublished, dateModified |
| FAQPage | FAQ sections | mainEntity with Question/Answer pairs |
| HowTo | Step-by-step guides | name, step array with text |
| Product | Product pages | name, description, offers, aggregateRating |
| Organization | Homepage / About pages | name, url, logo, sameAs |
| BreadcrumbList | Hierarchical pages | itemListElement with position and item |
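As a concrete sketch, the Article type from the table could be embedded as a JSON-LD block — the headline, author name, and dates here are placeholders:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How to Audit Your Website for AI Search Visibility",
  "author": { "@type": "Person", "name": "Jane Doe" },
  "datePublished": "2025-10-01",
  "dateModified": "2025-11-15"
}
</script>
```

The block goes in the page `<head>` or `<body>`; validators such as the Schema.org validator can confirm it parses.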

Step 4: Audit Content Structure

AI agents parse HTML heading structure to decompose a page into named sections. Poor heading hierarchy makes this decomposition fail — reducing the chance that any part of your page gets extracted as a citation.

  • One H1 per page — the primary topic of the page.
  • H2s for major sections — they function as a table of contents for AI agents.
  • H3s for subsections — avoid skipping levels (H1 → H3 with no H2).
  • Descriptive headings, not clever ones: 'Why AI Agents Can't Read Your Site' outperforms 'The Problem'.
  • No headings that are pure images or icons with no text content.
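The rules above amount to an outline like the following (indentation added here for readability only — in real HTML the headings are siblings):

```html
<h1>How to Audit Your Website for AI Search Visibility</h1>
  <h2>Step 1: Verify AI Crawler Access</h2>
    <h3>Check robots.txt</h3>
    <h3>Check for AI-blocking meta tags</h3>
  <h2>Step 2: Audit Your Metadata</h2>
```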

Step 5: Evaluate Content Quality for AI Citation

Technical readability is necessary but not sufficient. AI agents also evaluate whether your content is worth citing. The six dimensions that most influence citation decisions are: answer readiness, quotability, evidence density, content depth, freshness, and structural clarity. These are the same dimensions scored in a full AEO audit.

Answer readiness

Start each page with a direct answer to the core question before any setup or context. AI systems systematically favor content that places the key answer in the first paragraph. Don't bury your conclusion.

Quotability

Include clean, self-contained paragraphs of 40–60 words that can stand alone as a citation. Comparison tables and FAQ blocks are particularly citation-friendly formats — they provide structured, extractable answers in a compact format.

Evidence density

Include specific statistics, named sources, dates, and data points. Vague claims ('many companies are adopting AI') are less citeable than specific ones with named sources and dates. Every statistic should link to a primary source.

Content depth

Shallow pages rarely earn AI citations. Comprehensive coverage of a topic — addressing the main question and all common follow-up questions — is a strong signal of authoritative content. Aim for a minimum of 1,500 words on informational topics.

Step 6: Set Up llms.txt

llms.txt is an emerging standard that lets site owners provide AI crawlers with a curated site overview. Despite its value, only 0.3% of the top 1,000 websites have implemented it — making it a quick differentiator. Place an llms.txt file at yourdomain.com/llms.txt with a brief site overview and links to your most important pages.
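A minimal llms.txt might look like the sketch below, following the markdown structure of the proposed standard (site name, blockquote summary, link sections); all names and URLs are placeholders:

```markdown
# Example Company

> A one-paragraph summary of what the site covers and who it serves.

## Key pages

- [Pricing](https://example.com/pricing): plans and billing details
- [Docs](https://example.com/docs): product documentation
- [Blog](https://example.com/blog): guides and announcements
```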

Step 7: Verify Sitemap Availability

A sitemap.xml at /sitemap.xml helps AI crawlers discover all your pages efficiently. Without it, crawlers rely entirely on internal links and may miss important content. Reference your sitemap in robots.txt with `Sitemap: https://yourdomain.com/sitemap.xml`.
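A minimal sitemap.xml sketch (URLs and dates are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2025-11-01</lastmod>
  </url>
  <url>
    <loc>https://example.com/guides/ai-search-audit</loc>
    <lastmod>2025-11-15</lastmod>
  </url>
</urlset>
```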

Step 8: Run an Automated AEO Audit

Manual checks surface the most visible issues but miss edge cases across multi-page sites. An automated audit crawls your site end-to-end, checks all 16 technical signals simultaneously, then sends your best pages to an LLM for the subjective evaluation that manual auditing can't replicate. See the best AEO audit tools comparison to find the right tool for your workflow.

Run a free AEO audit at aeo-check.vercel.app. It crawls up to 10 pages, runs all 16 deterministic checks, evaluates your top content across 6 AI-readiness dimensions, and generates a prioritized fix list.

Who This Audit Helps

Danielle, marketing director at a 40-person professional services firm

Her firm advises clients on financial planning, but ChatGPT answers in their category consistently cite larger publications instead. An AI search audit reveals two issues: their best articles lack published dates in schema markup (freshness score: 1/5) and every page is missing canonical tags (causing duplicate signal dilution). Both are template-level fixes she can spec for development in a single sprint.

Alex, freelance SEO consultant

Alex runs AEO audits as an onboarding service for new clients. The structured format — 16 checks, 6 dimensions, prioritized fix list — gives him a client-ready deliverable with concrete, scoped recommendations. The most common finding he surfaces: clients with strong traditional SEO scores but no JSON-LD on their most important pages. See AEO audits for digital agencies for a full agency workflow.

Engineering team at a growing e-commerce retailer

Their product category pages score 68 on the first audit — mostly due to missing structured data and thin content above the product grid. The audit surfaces a template-level fix: adding Product schema and a 400-word buying guide to every category page template. Since they have 120 category pages, fixing the template once solves the issue site-wide.

Common Issues Found in AI Search Audits

| Issue | How Common | AI Visibility Impact |
| --- | --- | --- |
| Missing or thin meta descriptions | Very common | High — agents can't pre-classify page intent |
| No structured data (JSON-LD) | Common — affects ~59% of pages | High — removes machine-readable page type signals |
| Accidental AI crawler blocking | Less common but critical | Total — blocked pages are never cited |
| No llms.txt file | Extremely common (99.7% of top 1K sites) | Medium — missed opportunity to guide AI crawlers |
| Multiple H1 tags | Common in CMS-built sites | Medium — confuses page topic classification |
| Thin content on key pages | Very common | High — insufficient substance for citation |
| No sitemap | Less common | Medium — crawler discovery depends entirely on links |

What does an AI search visibility audit check?

An AI search audit checks three categories: technical access (can AI crawlers reach your pages?), metadata and structured data (can agents classify and understand your content?), and content quality (is your content worth citing?). A full audit covers 16 deterministic technical checks and an LLM evaluation across 6 content quality dimensions.

How is an AI search audit different from a traditional SEO audit?

A traditional SEO audit focuses on keyword usage, backlinks, and Google's ranking signals. An AI search audit focuses on machine comprehension — whether an LLM can extract, understand, and quote your content accurately. Many technical overlaps exist (metadata, sitemaps, page structure), but AI auditing adds LLM-specific signals like structured data depth, content quotability, and crawler-access configuration for AI bots specifically.

Which AI engines should I optimize for?

ChatGPT, Perplexity AI, Google AI Overviews, and Claude are the most widely used. The same technical and content-quality signals improve visibility across all of them. The main differences are in crawler access (each uses different user agents) and training data (ChatGPT and Claude have training cutoffs; Perplexity retrieves from the live web in real time).

How long does an AI search audit take?

An automated audit takes 2–5 minutes to crawl up to 10 pages and return a full report with scores, check results, and a prioritized fix list. A manual audit of the same scope takes 1–3 hours. Enterprise sites with hundreds of templates benefit from automated audits run page-by-page across each template type.

How often should I re-audit my site?

Re-audit after any significant deployment, content change, or robots.txt update. For active publishing sites, a monthly cadence catches regressions early. Quarterly is a reasonable baseline for most teams. Always re-audit after deploying structured data changes to confirm they're being read correctly.