Empirical Study: Which Schema Markup Types Appear in AI-Cited Websites?

Published December 10, 2025
Updated December 11, 2025

What Are Schema Markup Types?

Schema markup (also called structured data) is code you add to your website that helps search engines and AI models understand what your content is about. It’s like adding labels to your content so machines can read and categorize it more easily.

Schema markup uses a standardized vocabulary from Schema.org. Instead of just having text on a page, you add special code (usually JSON-LD format) that tells AI systems “this is an article,” “this is a product,” “this is the author,” etc.

Example: Article Schema

Here’s a simple example of what Article schema looks like. If you have a blog post, you might add this code to your page:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "10 Best Project Management Tools in 2025",
  "author": {
    "@type": "Person",
    "name": "Jane Smith"
  },
  "datePublished": "2025-01-15",
  "image": "https://example.com/article-image.jpg"
}
</script>

This tells AI models that the page is an article, who wrote it, when it was published, and what the main image is. Without schema, AI models have to guess these details from the HTML, which is less reliable.
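To make "machines can read it" concrete, here is a minimal Python sketch of how a crawler might pull JSON-LD out of a page. It uses only the standard library. The names `JsonLdExtractor`, `extract_jsonld`, and `page` are illustrative assumptions, not part of any real crawler or of this study's tooling.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every JSON-LD <script> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is skipped rather than guessed at


def extract_jsonld(html):
    """Return a list of parsed JSON-LD objects found in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    return parser.blocks


# Hypothetical page reusing the Article example above.
page = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article",
 "headline": "10 Best Project Management Tools in 2025",
 "author": {"@type": "Person", "name": "Jane Smith"},
 "datePublished": "2025-01-15"}
</script>
</head><body><h1>10 Best Project Management Tools in 2025</h1></body></html>
"""

for block in extract_jsonld(page):
    print(block["@type"], "by", block["author"]["name"], "on", block["datePublished"])
```

The type, author, and date come straight from the markup, with no guessing from the surrounding HTML. Whether a given AI system actually does this is a separate question, covered later in this post.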

How We Analyzed AI-Cited Websites

We analyzed 5,499 websites that AI models cited in their responses and counted which schema markup types appear on them. The counts show which kinds of structured data are commonly present in content that gets cited by AI.

Important note: This study examines which schema types are present in AI-cited websites. It’s an empirical observation, not a recommendation. Only add schema types that correspond to content actually present and visible on the page; don’t add schema for elements that don’t exist on your page.
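As a rough illustration of this kind of tally, the Python sketch below counts every `@type` value in a set of already-parsed JSON-LD documents, including nested objects and `@graph` arrays. It is an assumption about how such a count could be done, not the pipeline used for this study. The function names and the sample `documents` are hypothetical.

```python
from collections import Counter


def iter_types(node):
    """Yield every @type value found anywhere inside a parsed JSON-LD node."""
    if isinstance(node, dict):
        declared = node.get("@type")
        if isinstance(declared, str):
            yield declared
        elif isinstance(declared, list):
            # JSON-LD allows a node to declare several types at once.
            yield from (t for t in declared if isinstance(t, str))
        for value in node.values():
            yield from iter_types(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_types(item)


def count_types(documents):
    """Tally schema types across a collection of parsed JSON-LD documents."""
    counts = Counter()
    for doc in documents:
        counts.update(iter_types(doc))
    return counts


# Hypothetical input: two small JSON-LD documents.
documents = [
    {
        "@context": "https://schema.org",
        "@type": "Article",
        "author": {"@type": "Person", "name": "Jane Smith"},
    },
    {
        "@context": "https://schema.org",
        "@graph": [
            {"@type": "WebPage", "name": "Home"},
            {
                "@type": "BreadcrumbList",
                "itemListElement": [
                    {"@type": "ListItem", "position": 1, "name": "Home"},
                    {"@type": "ListItem", "position": 2, "name": "Blog"},
                ],
            },
        ],
    },
]

for schema_type, occurrences in sorted(count_types(documents).items()):
    print(schema_type, occurrences)
```

This sketch counts every node, so one page with ten ListItem entries contributes ten occurrences. Counting each type once per page would give different totals.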

Most Frequent Schema Types by Website Type

Here’s what we found when analyzing schema markup types present in websites that were cited by AI models:

Company Homepage

  • ImageObject (974 occurrences) – Present in nearly every cited company homepage
  • ListItem (672) – Lists help AI models extract structured information
  • Question (574) and Answer (571) – FAQ-style content that directly answers user queries
  • SiteNavigationElement (562) – Helps AI understand site structure
  • WebPage (360) – Basic page structure markup
  • Person (226) – Author and team information
  • BreadcrumbList (211) – Site hierarchy and context
  • Organization (189) – Company information
  • AggregateRating (100) – Review and rating data
  • Product (88) – Product information
  • CreativeWork (76) – Content type identification
  • FAQPage (72) – FAQ page structure
  • Article (54) – Article structure

Blog Post

  • Person (364) – Author information is critical for blog credibility
  • SiteNavigationElement (274) – Navigation structure
  • ListItem (152) – Structured lists and comparisons
  • ImageObject (134) – Visual content markup
  • Comment (118) – User engagement signals
  • CreativeWork (112) – Content type identification
  • WPHeader (105) – WordPress header structure
  • AggregateRating (101) – Review and rating data
  • SoftwareApplication (101) – For tool and app reviews
  • Organization (94) – Publisher information
  • Blog (92) – Blog structure
  • Rating (88) – Product and service ratings
  • WebPage (71) – Page structure
  • BreadcrumbList (49) – Site hierarchy
  • Article (49) – Article structure

News Outlet

  • Rating (234) – Product and service ratings
  • ImageObject (186) – Visual content
  • ListItem (129) – Structured lists
  • Person (96) – Author and journalist information
  • BreadcrumbList (37) – Site hierarchy
  • Organization (35) – Publication information
  • Article (31) – Article structure
  • WebPage (13) – Page structure
  • SiteNavigationElement (10) – Navigation
  • Product (9) – Product information
  • NewsArticle (8) – News article structure
  • Review (8) – Review content
  • AggregateRating (7) – Combined rating data

Forum

  • Person (114) – User profiles and authorship
  • ListItem (81) – Structured discussion threads
  • Comment (32) – Reply and discussion structure
  • InteractionCounter (30) – Engagement metrics
  • BreadcrumbList (12) – Site hierarchy
  • Question (7) and Answer (7) – Q&A structure

Article Format

  • ImageObject (633) – Present in most cited articles
  • ListItem (548) – Structured information extraction
  • SiteNavigationElement (488) – Site structure
  • Person (458) – Author information
  • Question (294) and Answer (292) – FAQ content
  • Organization (228) – Publisher information
  • WebPage (223) – Page structure
  • BreadcrumbList (162) – Site hierarchy
  • CreativeWork (108) – Content type
  • WPHeader (105) – Header structure
  • Article (100) – Article structure
  • Blog (68) – Blog structure

Listicle Format

  • ListItem (373) – Core structure
  • Rating (308) – Product ratings
  • Person (286) – Author credibility
  • SiteNavigationElement (219) – Navigation
  • ImageObject (206) – Visual content
  • AggregateRating (193) – Combined ratings
  • WebPage (153) – Page structure
  • Question (146) and Answer (146) – FAQ sections
  • SoftwareApplication (104) – For tool reviews
  • BreadcrumbList (103) – Site hierarchy
  • Organization (82) – Publisher information
  • Blog (77) – Blog structure
  • WPHeader (76) – Header structure

FAQ Page Format

  • Question (145) and Answer (144) – Essential structure
  • ImageObject (122) – Visual content
  • SiteNavigationElement (37) – Navigation
  • Person (35) – Author/expert information
  • ListItem (30) – Structured lists
  • FAQPage (14) – Container schema
  • BreadcrumbList (10) – Site hierarchy

Product Page Format

  • WebPage (37) – Page structure
  • ListItem (36) – Product features and specifications
  • Question (25) and Answer (25) – Product FAQs
  • BreadcrumbList (12) – Navigation
  • Brand (7) – Brand information
  • Product (6) – Product details
  • Offer (6) – Pricing information
  • Organization (4) – Company information
  • FAQPage (3) – FAQ structure

Most Frequent Schema Types Overall

Across all website types and formats, here are the schema types we observed most frequently in AI-cited websites:

  1. ImageObject – 1,500+ occurrences. Present in nearly every website type we analyzed.
  2. ListItem – 1,200+ occurrences. Critical for structured information.
  3. SiteNavigationElement – 900+ occurrences. Helps AI understand site structure.
  4. Person – 800+ occurrences. Establishes author credibility.
  5. Question/Answer – 700+ occurrences combined. Directly answers user queries.
  6. WebPage – 400+ occurrences. Basic page structure.
  7. BreadcrumbList – 300+ occurrences. Shows site hierarchy.
  8. Organization – 300+ occurrences. Publisher information.
  9. Rating/AggregateRating – 400+ occurrences combined. Essential for reviews.
  10. Product – 100+ occurrences. Critical for e-commerce.

Key Observations

From this empirical study, we observed several patterns:

  • ImageObject is nearly universal – It appears in almost every website type we analyzed, suggesting that cited websites consistently include structured image data.
  • Structured lists are common – ListItem schema appears frequently across all formats, indicating that AI-cited content often uses structured lists.
  • Author information matters – Person schema appears frequently, especially in blog posts and articles, suggesting author credibility may be a factor.
  • FAQ content is prevalent – Question and Answer schema types appear together frequently, indicating that direct Q&A content is common in cited websites.
  • Site structure is marked up – SiteNavigationElement and BreadcrumbList appear consistently, suggesting that navigation and hierarchy information is commonly present.
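Since Question and Answer types show up so often, here is a small Python sketch that builds FAQPage markup from question-and-answer pairs. The nesting (`mainEntity`, `name`, `acceptedAnswer`, `text`) follows the Schema.org vocabulary. The function name `build_faq_jsonld` and the sample pair are assumptions for illustration.

```python
import json


def build_faq_jsonld(pairs):
    """Build a Schema.org FAQPage object from (question, answer) tuples."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


# Hypothetical FAQ entry; use only questions that are visible on the page.
faq = build_faq_jsonld([
    (
        "Should I add all the schema types found in this study?",
        "No. Only add schema types for content that is actually present "
        "and visible on your page.",
    ),
])

print(json.dumps(faq, indent=2))
```

The printed JSON would go inside a `<script type="application/ld+json">` tag, the same way as the Article example earlier.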

Important Considerations

This study examines what schema types are present in websites that were cited by AI models. It’s important to note:

  • Correlation, not causation – The presence of these schema types in cited websites doesn’t necessarily mean they caused the citation. Other factors may be at play.
  • Only mark up what’s present – Schema should only be added for content that is actually present and visible on the page. Don’t add schema for elements that don’t exist.
  • Quality matters – Well-implemented schema that accurately represents page content is likely more valuable than incorrect or incomplete schema.
  • Context is important – The appropriate schema types depend on your content type and what’s actually on your page.
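One way to act on "only mark up what's present" is to compare the markup against the text a visitor can see. The Python sketch below is a simple heuristic under stated assumptions: it checks only a few text fields (`headline`, `name`, `text`) and uses a plain case-insensitive substring match. The function name and sample data are hypothetical, and no validator is known to work this way.

```python
def find_unmatched_values(node, visible_text, fields=("headline", "name", "text")):
    """Return (key, value) pairs from the markup that do not appear in the visible text."""
    # Normalize whitespace and case so trivial formatting differences do not matter.
    haystack = " ".join(visible_text.split()).casefold()
    missing = []

    def walk(item):
        if isinstance(item, dict):
            for key, value in item.items():
                if key in fields and isinstance(value, str):
                    needle = " ".join(value.split()).casefold()
                    if needle not in haystack:
                        missing.append((key, value))
                else:
                    walk(value)
        elif isinstance(item, list):
            for element in item:
                walk(element)

    walk(node)
    return missing


# Hypothetical page whose markup names an author the page never shows.
markup = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "10 Best Project Management Tools in 2025",
    "author": {"@type": "Person", "name": "Jane Smith"},
}
visible = "10 Best Project Management Tools in 2025\nBy John Doe"

print(find_unmatched_values(markup, visible))
```

Here the headline matches but the author name does not, so the check flags `("name", "Jane Smith")` for review.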

Proof That Schema Markup Actually Helps GEO

For Google, there’s clear, official guidance and recent tests showing structured data (JSON-LD/schema) helps pages be used in Google’s AI Overviews. For ChatGPT/OpenAI, the signal is weaker: there’s community and experimental evidence that JSON-LD can be read when the model has access to the page, but no definitive public claim that ChatGPT reliably prefers schema-marked pages over equivalent pages without it.

Google AI Overviews: Official Guidance and Controlled Tests

Google explicitly recommends structured data, says it “is useful for sharing information…that our systems consider,” and advises making sure structured data matches the visible content. This is Google’s guidance for AI features (AI Overviews / AI Mode). (Google for Developers)

Independent experiments published by Search Engine Land and several SEO firms report that pages with well-implemented schema appeared in AI Overviews more often than near-identical pages without schema. These controlled tests show schema quality correlates with being selected as a source. (Search Engine Land)

Several SEO vendors and research writeups (including BrightEdge summaries and agency tests) report higher citation and visibility rates in Google AI features when pages include robust structured data (Organization, Article, FAQ, Product, HowTo, etc.). (Evertune)

ChatGPT and OpenAI: Community Evidence

There are community reports and academic studies suggesting that when ChatGPT’s browsing tool or a crawler accessible to an LLM fetches a page, information present only in JSON-LD can appear in model outputs—indicating the model can use JSON-LD that’s reachable at query time or during browsing. However, OpenAI has not published an explicit policy saying “we always parse JSON-LD and rank those pages higher.” So evidence is suggestive but not conclusive. (OpenAI Developer Community)

Research and practitioner commentary points out that LLMs typically learn from text corpora; structured data can be converted into text (data-to-text) and then included in training or used by retrieval systems. In practice, systems that serve answers (Google SGE/AI Overviews, retrieval-augmented LLMs) use a crawling/indexing layer that can read JSON-LD and feed that into the retrieval pipeline. That explains why structured data helps with retrieval-based AI features even if the raw LLM weights weren’t trained on JSON-LD directly. (Google for Developers)
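The data-to-text step mentioned above can be pictured with a short Python sketch that flattens a JSON-LD object into plain "field: value" lines, which a retrieval index could then store as ordinary text. This is an illustrative assumption about how such a step might look, not a description of how Google, OpenAI, or any specific system processes structured data. The name `jsonld_to_text` is hypothetical.

```python
def jsonld_to_text(node, prefix=""):
    """Flatten a parsed JSON-LD node into a list of 'path: value' text lines."""
    lines = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "@context":
                continue  # the vocabulary URL adds nothing for retrieval
            label = key.lstrip("@")
            path = f"{prefix}.{label}" if prefix else label
            lines.extend(jsonld_to_text(value, path))
    elif isinstance(node, list):
        for item in node:
            lines.extend(jsonld_to_text(item, prefix))
    else:
        lines.append(f"{prefix}: {node}")
    return lines


# The Article example from earlier in this post, as a Python dict.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "10 Best Project Management Tools in 2025",
    "author": {"@type": "Person", "name": "Jane Smith"},
    "datePublished": "2025-01-15",
}

print("\n".join(jsonld_to_text(article)))
```

Once the markup is plain text like this, a retrieval layer can index and match it the same way it handles the rest of the page.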

What This Means

  • For Google AI Overviews: Structured data is explicitly recommended and controlled tests show well-implemented schema correlates with appearing in AI Overviews.
  • For ChatGPT and other LLMs: Schema probably helps when the retrieval/crawl layer that feeds the LLM can access the JSON-LD (e.g., ChatGPT browsing or a custom retrieval pipeline). For LLMs that answer from their training data alone, without live browsing or retrieval, the effect is less direct.
  • Quality matters: Tests show well-implemented and accurate schema performs better than sloppy or incorrect schema. Don’t add markup that doesn’t match the page.
  • Only mark up what’s present: Schema should only be added for content that is actually present and visible on the page.

Conclusion

This empirical study of 5,499 AI-cited websites reveals clear patterns in what schema markup types are present. The most frequently observed schema types include ImageObject (present in nearly every website type), ListItem, SiteNavigationElement, Person, Question/Answer, and format-specific schema like Product, Rating, and FAQPage.

While this study shows what’s present in cited websites, it’s important to remember that correlation doesn’t imply causation. However, evidence from Google’s official guidance and controlled tests suggests that well-implemented schema markup can help pages appear in AI Overviews. For ChatGPT and other LLMs, the evidence is suggestive but not conclusive.

If you’re considering implementing schema markup, remember to only add schema for content that is actually present and visible on your page, and ensure the markup accurately represents your content.


Frequently Asked Questions

What is GEO and AEO?

GEO (Generative Engine Optimization) and AEO (Answer Engine Optimization) optimize content to appear in answers from AI chat products such as ChatGPT, Google AI Overviews, Gemini, and Claude. When someone asks an AI “What are the best project management tools?” the AI needs to find and cite sources. Schema markup helps AI models understand your content structure, which may make it more likely that they cite your pages. Unlike traditional SEO, GEO and AEO focus on helping AI models understand and cite your content.

Is there proof that schema markup actually helps GEO?

Yes. For Google AI Overviews, there’s clear official guidance and controlled tests showing well-implemented schema helps pages appear in AI Overviews. Google explicitly recommends structured data for AI features. Independent experiments show pages with complete schema (Article, FAQ, BreadcrumbList) appeared in AI Overviews and achieved higher rankings, while pages without schema or with incomplete schema showed no advantage.

For ChatGPT, there’s community and experimental evidence that JSON-LD can be read when the model has access to the page (via browsing mode), but OpenAI hasn’t published an explicit policy confirming schema preference. Evidence is suggestive but not conclusive for ChatGPT specifically.

What does this study show?

This study examines what schema markup types are present in 5,499 websites that were actually cited by AI models. It’s an empirical observation of patterns, not a recommendation. The study shows that certain schema types appear frequently in cited websites, but correlation doesn’t imply causation—other factors may be at play.

Should I add all the schema types found in this study?

No. Only add schema types for content that is actually present and visible on your page. Don’t add schema for elements that don’t exist. The appropriate schema types depend on your content type and what’s actually on your page. Well-implemented schema that accurately represents page content is likely more valuable than incorrect or incomplete schema.

Why does schema markup matter for AI visibility?

Schema markup provides structured data that helps AI models understand your content. When AI systems can easily extract information about your products, services, authors, and content structure, they’re more likely to cite your pages in their responses. Systems that serve answers use a crawling/indexing layer that can read JSON-LD and feed that into the retrieval pipeline.

Which schema types are most common in cited websites?

Based on our analysis of AI-cited websites, the most frequently observed schema types are ImageObject (present in nearly every website type), ListItem, SiteNavigationElement, Person, Question/Answer, WebPage, BreadcrumbList, Organization, Rating/AggregateRating, and Product.

Can schema markup help me appear in Google AI Overviews?

Yes. Google explicitly recommends structured data for AI features, and controlled tests show well-implemented schema pages appear more often in AI Overviews and achieve higher rankings. However, quality matters—well-implemented and accurate schema performs better than sloppy or incorrect schema.

Does ChatGPT actually read schema markup?

There’s community and experimental evidence that when ChatGPT’s browsing tool fetches a page, information present only in JSON-LD can appear in model outputs. However, OpenAI hasn’t published an explicit policy confirming ChatGPT reliably prefers schema-marked pages. Evidence is suggestive but not conclusive.

What’s the difference between GEO and traditional SEO?

Traditional SEO optimizes for search engine rankings. GEO optimizes for AI chat conversations. While there’s overlap, GEO focuses specifically on helping AI models understand and cite your content through structured data like schema markup.

How do I know if my schema markup is working?

Test your schema using Google’s Rich Results Test and the Schema Markup Validator on Schema.org. Monitor your brand mentions in AI responses using tools like Spotlight, which monitors and improves brand visibility in AI conversations across ChatGPT, Google AI Overviews, Gemini, Claude, and other AI platforms. Many agencies run A/B tests (same content with/without schema) to measure uplift.
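For the A/B approach, "uplift" is usually the relative change in citation rate between the two versions. The Python sketch below shows the arithmetic. The counts are made-up placeholders for illustration only, not results from this study, and the function names are hypothetical.

```python
def citation_rate(cited, total):
    """Share of test prompts in which the page was cited."""
    if total <= 0:
        raise ValueError("total must be positive")
    return cited / total


def relative_uplift(control_rate, variant_rate):
    """Relative change of the variant's rate compared with the control's rate."""
    if control_rate == 0:
        raise ValueError("control rate is zero; relative uplift is undefined")
    return (variant_rate - control_rate) / control_rate


# Placeholder numbers: citations out of 100 test prompts per version.
without_schema = citation_rate(cited=12, total=100)
with_schema = citation_rate(cited=18, total=100)

uplift = relative_uplift(without_schema, with_schema)
print(f"Without schema: {without_schema:.0%}")
print(f"With schema: {with_schema:.0%}")
print(f"Relative uplift: {uplift:.0%}")
```

With samples this small, a difference can easily be noise, so it is worth repeating the test across many prompts and pages before drawing conclusions.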

This post was written by Spotlight’s content generator.

Michael Hermon

Founder of Spotlight. GEO and AI expert with a lifelong obsession for code and data.
Before Spotlight, Michael led Innovation and AI at monday.com after exiting his previous startup. He learned to code at 13 at MIT and later attended Columbia’s MBA program.

https://linkedin.com/in/michaelhermon