
AEO Glossary: Key Terms in AI Search Optimization

Definitions for every term you need to understand Answer Engine Optimization — AEO score, AI crawlers, llms.txt, JSON-LD, and the 6 dimensions of AI readiness.

Answer Engine Optimization has introduced a new vocabulary alongside the practice. This glossary defines every term you'll encounter when auditing, optimizing, or discussing AI search visibility — from foundational concepts to technical implementation details.

What Is AEO (Answer Engine Optimization)?

AEO (Answer Engine Optimization) is the practice of optimizing website content and technical structure so that AI-powered answer engines — ChatGPT, Perplexity, Google AI Overviews, Claude — can read, understand, and cite your content when answering user queries.

Core Concepts

AEO Score

A 0–100 measure of a website's AI search readiness. A complete AEO score blends a deterministic foundational score (16 technical checks, 50%) with an agent evaluation score (LLM assessment across 6 content quality dimensions, 50%). See how AEO scoring works for the full methodology.

Answer Engine

An AI system that generates direct answers to user questions rather than returning a list of links. ChatGPT serves 800 million weekly active users, Perplexity processed 780 million queries in a single month, and Google AI Overviews now appear in roughly 16% of all search queries.

Citation Readiness

The degree to which a page's content can be extracted and quoted by an AI engine in a generated answer. Citation readiness depends on content structure (heading hierarchy, paragraph length), content quality (evidence density, specificity), and technical accessibility (crawlability, parseable HTML).

Technical Terms

AI Crawler / AI Bot

An automated agent sent by an AI company to index web content for training data or real-time retrieval. Major AI crawlers have registered user-agent strings that can be referenced in robots.txt. See the AI Crawler Configuration Guide for a complete user-agent list.

GPTBot

OpenAI's web crawler, used to collect training data and power ChatGPT's browsing capabilities. User agent string: `GPTBot`. Blocked by 62% of top news publishers and 35.7% of the top 1,000 websites as of late 2024. Can be allowed or blocked via robots.txt.

ClaudeBot

Anthropic's web crawler. User agent string: `ClaudeBot`. Used for data collection and retrieval augmentation for Claude. Respects robots.txt directives.

PerplexityBot

Perplexity AI's web crawler, used to retrieve live web content for real-time answer generation. User agent: `PerplexityBot`. Perplexity relies heavily on real-time retrieval rather than fixed training data, making PerplexityBot particularly important to allow for current AI visibility.

CCBot

The Common Crawl bot. Common Crawl is a nonprofit that publishes open web crawl data widely used as training data for large language models including GPT-series, LLaMA, and many others. User agent: `CCBot/2.0`. Allowing CCBot increases the likelihood of your content appearing in training data for models using Common Crawl.
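
The four crawlers above can all be addressed by user agent in robots.txt. A minimal sketch that allows each one site-wide (adjust the `Allow`/`Disallow` paths to your own site):

```
# Allow major AI crawlers across the whole site
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: CCBot
Allow: /
```

Replacing `Allow: /` with `Disallow: /` for a given user agent blocks that crawler instead.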

llms.txt

An emerging standard (analogous to robots.txt) for communicating site information to AI language models. Placed at yourdomain.com/llms.txt, it contains a structured overview of site content, links to key pages, and context about the site's scope and authority. Only 0.3% of the top 1,000 websites have implemented it as of mid-2025, making it a quick differentiator.
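
A minimal llms.txt sketch following the proposed convention (an H1 title, a one-line blockquote summary, then sections of annotated links); every name and URL here is a placeholder:

```
# Example Co

> Example Co makes inventory software for small retailers.

## Docs

- [Getting started](https://example.com/docs/start): Installation and setup guide
- [API reference](https://example.com/docs/api): Endpoints and authentication

## Company

- [About](https://example.com/about): Team, history, and contact details
```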

noai / noimageai

Meta tag values that signal to AI systems that the page content (noai) or images (noimageai) should not be used for AI training. Example: `<meta name="robots" content="noai">`. Not all AI crawlers honor these directives. Setting them on public pages you want cited is counterproductive.

Structured Data Terms

JSON-LD

JSON for Linked Data — the recommended format for implementing structured data. Added in a `<script type="application/ld+json">` tag, it describes page content in Schema.org vocabulary. Only 41% of web pages currently implement JSON-LD, leaving most of the web without this critical AI-readability signal.

Schema.org

A collaborative vocabulary for structured data on the web, founded by Google, Microsoft, Yahoo, and Yandex. Schema.org defines the types (Article, Product, Organization, FAQPage, etc.) and properties used in JSON-LD markup. When an AI crawler encounters Schema.org markup, it can reliably classify page type, extract key facts, and understand content relationships.
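
A minimal JSON-LD sketch for an Article page, using standard Schema.org types (the headline, dates, and organization name here are placeholders, not values from a real implementation):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "AEO Glossary: Key Terms in AI Search Optimization",
  "datePublished": "2025-06-01",
  "dateModified": "2025-07-01",
  "author": {
    "@type": "Organization",
    "name": "Example Co"
  }
}
</script>
```

The `datePublished` and `dateModified` properties double as freshness signals, and `@type` lets a crawler classify the page without parsing the body text.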

The 6 AEO Evaluation Dimensions

| Dimension | Definition |
| --- | --- |
| Answer Readiness | Whether a direct answer appears in the opening paragraph, before setup or context. |
| Quotability | Whether the page contains clean, self-contained 40–60 word passages extractable verbatim as citations. |
| Evidence Density | The degree to which claims are backed by specific statistics, named sources, dates, and data points. |
| Content Depth | Whether the page provides comprehensive enough coverage to fully answer a question without other sources. |
| Freshness | Whether the content signals currency: explicit publication/modification dates, recent references. |
| Structural Clarity | Whether the page has a clear heading hierarchy, logical section flow, and parseable text (not hidden in JavaScript). |

Other Key Terms

E-E-A-T

Experience, Expertise, Authoritativeness, Trustworthiness — Google's framework for evaluating content quality. E-E-A-T signals (author credentials, primary research, named sources, organizational transparency) also influence how AI engines weight content for citation. High E-E-A-T signals improve authority across both traditional and AI search.

Training Cutoff

The date after which new web content was not included in an LLM's training data. Content published after a model's training cutoff won't be in that model's base knowledge. For retrieval-augmented AI engines (Perplexity, ChatGPT with browsing), training cutoff is less relevant since they retrieve live content at query time.

What is the difference between AEO and SEO?

SEO (Search Engine Optimization) optimizes for keyword-based search algorithms that rank pages by relevance and authority. AEO (Answer Engine Optimization) optimizes for AI comprehension models that extract and cite content in generated answers. The signals differ: SEO prioritizes backlinks and keyword placement; AEO prioritizes structured data, content extractability, and citation readiness.

What does llms.txt do?

llms.txt is a plain-text file placed at your domain root that gives AI systems a curated overview of your site — what it covers, who operates it, and which pages are most important. It functions like robots.txt (a convention AI crawlers check) but for communication rather than access control.

What is a citation in the context of AI search?

An AI citation occurs when an answer engine extracts a passage from your page and includes it (verbatim or summarized) in a generated answer, with a link or reference to your source. Citations are the AI-search equivalent of appearing on the first page of Google — they drive awareness and traffic at the point of user research.

What is the difference between GPTBot and ChatGPT-User?

GPTBot is OpenAI's automated web crawler that collects data for training and indexing. ChatGPT-User is the user agent string used when a user triggers ChatGPT's real-time browsing feature during a conversation. Both should be allowed in robots.txt if you want OpenAI's systems to access your content.
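
To permit both, a robots.txt sketch would list each user agent separately (paths are placeholders):

```
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /
```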