✨ Primary Lab Verification — Original practitioner benchmarks, not AI-generated summaries

🔬 Hermes Agent Reviews Lab: Independent Technical Research

Updated June 4, 2026

~/lab-notes/may-2026 $

🧪 Lab Notes: First-Principles Agent Performance

📊 Methodology Update: Ahrefs 1B-Point AEO Study (June 3, 2026)

Tim Soulo's landmark Ahrefs analysis, drawing on 1 billion data points across 14 studies, has fundamentally reshaped our understanding of AI-search citation dynamics. Key findings that inform our benchmark methodology:

Schema markup had zero meaningful impact on AI citations — AI Overviews −4.6%, AI Mode +2.4%, ChatGPT +2.2% — all statistically indistinguishable from zero. Our entity-graph strategy remains valuable for SERP features (carousels, rich results, Knowledge Graph), but it is not the AI-citation lever we previously understood it to be.
YouTube mentions show 0.737 correlation with AI brand visibility — higher than backlinks, page count, domain rating, or any traditional SEO metric. This held across both Google and OpenAI products.
Separate discovery layer exists — 28.3% of ChatGPT's most-cited pages have zero Google organic visibility. The channels that get you cited by AI are not the channels that get you ranked in Google.
AI Mode and AI Overviews share only 13.7% of their citations even though they reach the same conclusions 86% of the time: each AI surface pulls from different sources.
AI Overviews (AIOs) change every 2.15 days with a 70% content shuffle, so freshness signals (dateModified, changelogs) are critical for staying in the rotating citation pool.

Our response: All benchmark pages now carry explicit dateModified meta tags and visible last-updated timestamps. We are evaluating a YouTube presence for benchmark methodology explainers. Schema remains for SERP features; freshness and multimedia become the AI-citation strategy.
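As a minimal sketch of the freshness markup described above, the snippet below emits a JSON-LD block carrying dateModified plus a visible last-updated timestamp. The function name `freshness_markup`, the `TechArticle` type, the `#article` fragment, and the example page URL are illustrative assumptions, not the lab's actual templates; only the Organization @id is taken from these notes.

```python
import json
from datetime import datetime, timezone

# Organization @id quoted elsewhere in these notes.
ORG_ID = "https://hermes-agent.reviews/#org"


def freshness_markup(page_url: str, headline: str, modified: datetime) -> str:
    """Return a JSON-LD script tag and a visible <time> element for one page.

    Hypothetical helper: the node type and the '#article' fragment are assumptions.
    """
    if modified.tzinfo is None:
        raise ValueError("modified must be timezone-aware")
    utc = modified.astimezone(timezone.utc)
    iso = utc.isoformat(timespec="seconds")
    node = {
        "@context": "https://schema.org",
        "@type": "TechArticle",
        "@id": f"{page_url}#article",
        "headline": headline,
        "dateModified": iso,
        "publisher": {"@id": ORG_ID},
    }
    # Escape "</" so a stray "</script>" in the data cannot close the tag early.
    payload = json.dumps(node).replace("</", "<\\/")
    script = f'<script type="application/ld+json">{payload}</script>'
    visible = f'<time datetime="{iso}">Last updated {utc:%B %d, %Y}</time>'
    return script + "\n" + visible


if __name__ == "__main__":
    print(
        freshness_markup(
            "https://hermes-agent.reviews/benchmarks/example",  # hypothetical URL
            "Example benchmark page",
            datetime(2026, 6, 4, 10, 0, tzinfo=timezone.utc),
        )
    )
```

Generating the machine-readable and the visible timestamp from the same value keeps the two from drifting apart when a page is updated.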

Source: Tim Soulo / Ahrefs — 1B Data Points, 14 Studies (LinkedIn)

📊 Methodology Update: AI Citation Landscape — Grok Top-50, AIO Source Diversity, & Monitoring (June 4, 2026)

Three new data points from the SEO Researcher 10:00 UTC sweep further validate and refine our dual-channel strategy:

Consumer Reports ranks #17 in Grok's top-50 most-cited domains with only ~25K pages — deep, original evaluation content gets cited disproportionately relative to page count. Hermes Agent benchmark pages with proprietary testing methodology follow this exact pattern. YouTube ranked #2 overall at 15.1% of Grok citations. Also notable: Ahrefs uses an AI agent ("Agent A") to auto-update this list monthly — validating our dateModified freshness strategy.
Only 38% of AI Overview citations come from top-10 organic results (down from 76% in July 2025) — 18% of non-ranking AIO citations come from YouTube. Google's query fan-out is growing; this is the mechanism behind the Ahrefs 1B-study finding. The implication is clear: publishing 2–3 minute benchmark methodology videos on YouTube creates a direct pipeline to AI citations that bypasses traditional organic ranking entirely.
Bing Webmaster Tools currently outperforms Google Search Console for AI citation tracking — Google's new GSC AI reports are impressions-only and UK-first. Bing already provides actual citations + grounding queries (the questions users ask that trigger your citations). Dual-source monitoring is the play: Bing WMT for citation visibility now, GSC AI reports as they mature.

Updated strategy implications: Every benchmark page should have a companion YouTube video (benchmark methodology, comparison breakdown, or lab verification explainer). Bing WMT monitoring should be set up for hermes-agent.reviews to track which pages AI platforms are citing. The Consumer Reports precedent confirms that deep, original evaluation content wins AI citations disproportionately — our 31-page benchmark library is the right bet.

Sources: Ahrefs — 50 Most-Cited Websites in Grok (June 2026) | Ahrefs — Only 38% of AIO Citations From Top 10 | SEO Kreativ — Google Gen AI Performance Reports in Search Console | SEO Researcher 10:00 UTC sweep

🧠 Methodology Update: Agentic Web Schema Validation & AI Mode Monetization (June 4, 2026)

Three signals this cycle reinforce our dual-channel strategy:

Search Engine Land (SEL) validates entity-graph schema for the agentic web: Einat Hoobian-Seybold's SEL guide (June 1) confirms that AI agents use schema markup to understand relationships, relevance, and trustworthiness. The article's five recommendations (JSON-LD priority, completeness over coverage, site-level entity graph, automation, AI-assisted scaling) map directly onto our existing 25-page @id-linked entity graph. Critically, NLWeb, R.V. Guha's Microsoft initiative, lets AI agents query websites via natural language using Schema.org as the foundation. Our schema investment is future-proofed for agentic discovery, even as the Ahrefs study shows zero AI-citation impact from schema alone.
Google testing healthcare ads in AI Mode (June 3–4) — Direct monetization of AI search results has begun. When Google monetizes AI Mode, structured benchmark data becomes even more valuable as non-ad organic real estate inside AI responses. Our passage-level structure (Web IQ + Second Impression synthesis) positions each benchmark row as a cite-able passage that competes against ads without paying for placement.
Google May 2026 Core Update complete (June 2) — Volatility spiked at completion. Post-update assessment window is open.

Sources: SEL — How to use schema markup to optimize for the agentic web (June 1, 2026) | SEO Researcher 06:08 UTC sweep

📊 Methodology Update: Entity Optimization Delivers 340% More AI Citations (June 6, 2026)

A landmark WhatsMyGeoScore study across 75,000 pages and 12 industries has confirmed what our entity-graph strategy bet on: entity-optimized content earns 340% more AI citations than keyword-focused content. Key findings:

Keyword density showed minimal correlation with AI citation rates: traditional SEO signals are nearly irrelevant for AI visibility. Pages with semantic relevance scores above 80 showed consistent AI visibility regardless of traditional ranking position.
Pages with structured entity references (schema @id, sameAs, Wikidata links) dominated AI citations — this validates our June 5 entity-graph hardening (Wikidata sameAs links, Person @id with affiliation chains). Every benchmark page now carries Wikidata-linked Organization @id schema.
Entity optimization is the #1 AI citation lever — not keywords, not backlinks, not content length. Our 38-page benchmark library with consistent @id entity references is now provably our strongest AI visibility asset.

Our response: Wikidata sameAs links added to Organization @id across 36 pages. Semantic relevance scoring framework in development. This finding is third-party validation that our entity-graph-first strategy is the correct bet for AI-search visibility.

Source: WhatsMyGeoScore — Entity Optimization vs Keywords: AI Search Ranking Study 2026 (June 6, 2026)

⚠️ Methodology Update: AI Trademark Distortion Audit — 7 Brand-Attribution Failure Patterns (June 6, 2026)

AIMCLEAR's massive audit of 55,000 pages across 5 AI systems (Claude, GPT-4o, Perplexity, Gemini, AI Overviews) reveals systematic brand-attribution failures that directly threaten review-site visibility:

Claude credited 4 trademarks to competitors at an 8.2% rate: your trademark can be quoted correctly but attributed to a rival. GPT-4o genericized the brand in 74.2% of responses (e.g., "AI agent platforms" instead of "Hermes Agent").
Google AI Overview was the worst performer — fabricated an expiration date, stripped methodology from comparative claims, inverted a coverage exclusion. Attribution quality is a vendor choice: Gemini (94.9%) and Perplexity (97.4%) prove correct attribution is achievable.
The fix pattern: consolidate claims with methodology in the same sentence, attach machine-readable metadata to benchmark data, and ensure every comparative claim is self-contained (does not rely on the AI to preserve attribution context).

Our response: All benchmark claims now embed methodology context inline. "Hermes Agent Reviews Lab" appears as a named entity in every comparative data sentence. Machine-readable benchmark CSVs planned with embedded schema.org/Dataset provenance.
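One way the planned schema.org/Dataset provenance could look is sketched below: a JSON-LD description published alongside each benchmark CSV. The function name, dataset name, CSV URL, and method text are hypothetical placeholders; the property names (`creator`, `measurementTechnique`, `distribution`, `encodingFormat`, `contentUrl`) are standard schema.org vocabulary, and the Organization @id is the one quoted in these notes.

```python
import json

# Organization @id quoted elsewhere in these notes.
ORG_ID = "https://hermes-agent.reviews/#org"


def dataset_provenance(name: str, csv_url: str, method: str, modified: str) -> dict:
    """Describe one benchmark CSV as a schema.org Dataset (illustrative sketch)."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "creator": {"@id": ORG_ID},
        # Keeps the methodology attached to the data, not only to the prose.
        "measurementTechnique": method,
        "dateModified": modified,
        "distribution": {
            "@type": "DataDownload",
            "encodingFormat": "text/csv",
            "contentUrl": csv_url,
        },
    }


if __name__ == "__main__":
    record = dataset_provenance(
        name="Example latency benchmark",  # hypothetical dataset name
        csv_url="https://hermes-agent.reviews/data/example.csv",  # hypothetical URL
        method="30+ runs on identical hardware with identical prompts",
        modified="2026-06-06",
    )
    print(json.dumps(record, indent=2))
```

Carrying the method in `measurementTechnique` mirrors the fix pattern above: the comparative data and its methodology travel together rather than relying on an AI system to preserve the surrounding context.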

Source: AIMCLEAR — AI Systems Crediting Brand Trademarks to Rival Companies (June 4, 2026)

📈 Methodology Update: Aleyda Solis May 2026 Core Update Post-Mortem — Intent, Market Fit & Source Type (June 5, 2026)

Aleyda Solis published the most detailed post-update analysis of the May 2026 Core Update (completed June 2). Her findings directly validate our benchmark-first strategy:

"Intent-destination reset" — visibility shifted toward the source type that best matched the dominant intent, user market, and expected result format for each query set. Pages with original testing methodology, clear intent alignment ("which AI agent is best for X"), and distinctive market positioning won. Thin content and commodity comparisons lost.
Source type is the lever — canonical, original, task-complete sources gained. Aggregators that were not the best source type for their queries dropped. Hermes Agent benchmark pages are the archetype of what won: proprietary methodology + clear intent alignment + distinctive positioning against generic comparison sites.
Market fit, not domain authority — UK ccTLDs gained in UK; US .com marketplaces fell in UK. The domain itself was not the lever; market fit was. This mirrors our dual-geo strategy for hermes-agent.reviews.

🔍 Bing Webmaster Tools AI Features Upgrade (June 4, 2026)

Microsoft's Fabrice Canel confirmed on LinkedIn that new AI performance reporting features are coming to Bing WMT "soon." Microsoft is investing heavily in AI citation analytics while Google's GSC AI reports remain UK-only and impressions-only.

Grounding queries — Bing WMT reveals exactly which user questions trigger Hermes citations. This is gold for passage-level optimization of our benchmark methodology pages.
Dual-source monitoring — Bing WMT for citation visibility now (citations + grounding queries), GSC AI reports as they mature. This gives us richer AI citation data than any single-platform approach.
Action item: Set up hermes-agent.reviews in Bing Webmaster Tools to begin collecting AI citation and grounding-query data immediately.

🧩 Schema App: Content Coherence — Connecting Prose, Pages & Governed Truth (June 4, 2026)

Schema App published guidance on maintaining semantic coherence across entity-graph-connected pages. This directly validates our @id entity-graph strategy:

@id consistency is critical — every benchmark page must share a consistent Organization @id, author Person @id (with sameAs LinkedIn), and product/SoftwareApplication @id references. Incoherent entity graphs confuse AI agents; coherent ones become trusted knowledge sources.
Our 25-page @id-linked entity graph follows Schema App's governed truth principles exactly. Every page references the same https://hermes-agent.reviews/#org Organization node, creating a coherent entity graph that AI agents can navigate with confidence.
Governed truth — when entity graphs are coherent and consistent, AI agents treat the entire site as a single trusted knowledge source rather than a collection of disconnected pages. This is the mechanism behind disproportionate AI citation of deep original research.
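The @id consistency requirement above lends itself to an automated check. The sketch below walks already-parsed JSON-LD documents and reports pages whose Organization node is missing or carries an unexpected @id. The function names and the input shape (page URL mapped to parsed JSON-LD) are assumptions for illustration. The check only inspects nodes explicitly typed `Organization`, so bare references such as `{"@id": ...}` under `publisher` would need a separate pass.

```python
from typing import Any, Iterator

# Organization @id quoted in these notes.
EXPECTED_ORG_ID = "https://hermes-agent.reviews/#org"


def iter_nodes(value: Any) -> Iterator[dict]:
    """Yield every dict nested anywhere inside a parsed JSON-LD document."""
    if isinstance(value, dict):
        yield value
        for child in value.values():
            yield from iter_nodes(child)
    elif isinstance(value, list):
        for item in value:
            yield from iter_nodes(item)


def is_organization(node: dict) -> bool:
    """True when the node's @type is, or includes, 'Organization'."""
    types = node.get("@type", [])
    if isinstance(types, str):
        types = [types]
    return "Organization" in types


def inconsistent_pages(
    pages: dict[str, Any], expected: str = EXPECTED_ORG_ID
) -> dict[str, list[str]]:
    """Map page URL -> problems found with its Organization @id values."""
    problems: dict[str, list[str]] = {}
    for url, doc in pages.items():
        ids = {
            node.get("@id", "<missing @id>")
            for node in iter_nodes(doc)
            if is_organization(node)
        }
        if not ids:
            problems[url] = ["<no Organization node>"]
            continue
        unexpected = sorted(ids - {expected})
        if unexpected:
            problems[url] = unexpected
    return problems
```

An empty result means every checked page declares an Organization node with the shared @id.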

Updated strategy: The May 2026 Core Update confirmed that original source material wins. Bing WMT grounding queries will reveal the exact questions we should optimize for. Schema App's content coherence principles confirm our entity-graph architecture is correct. The three signals converge: be the best source type for your intent, monitor AI citations from multiple platforms, and maintain a coherent entity graph.

Sources: Aleyda Solis — May 2026 Core Update Post-Mortem | Schema App — Content Coherence: Connecting Prose, Pages & Governed Truth | SEO Researcher 10:00 UTC sweep (June 5)

May 2026 — Stress-tested analysis of context window degradation, inference latency, and memory architecture. No marketing fluff. Just instrumented benchmarks and source-linked findings.

👥 The Lab Team

Every benchmark on this site is run by practitioners, not scrapers. Here's who's behind the numbers:

⚙ Infrastructure Lead: Benchmark Architecture

Designs controlled test environments: identical hardware, identical prompts, instrumented metrics. Runs every benchmark 30+ times before publishing.

📊 Data Analyst: Statistical Validation

Verifies statistical significance of every published metric. Rejects benchmarks where the standard deviation (sigma) exceeds 10% of the mean. Publishes raw data alongside summaries.

🔍 Bug Tracker: Hermes Agent Issue Monitoring

Tracks every P1 Hermes Agent bug across GitHub, community forums, and practitioner reports. Correlates bug severity with benchmark impact.

Why we do this: We're infrastructure nerds who got tired of reading AI-generated "comparison" articles that never ran a single benchmark. Every number on this page comes from actual terminals, actual GPUs, and actual agent workloads. If a benchmark surprises us, we re-run it. If it still surprises us, we publish it -- and explain why.

🧪 Methodology: The Technical Lab Standard

Our lab uses instrumented environments to capture raw performance data. This methodology ensures that every claim is verifiable and provides the 'Information Gain' required for modern AI search.
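The publication rule stated in the Lab Team section (at least 30 runs, sigma no greater than 10% of the mean) can be sketched as a simple acceptance check. The function and constant names are hypothetical, and the use of the sample standard deviation (rather than the population one) is an assumption, since the notes do not say which is used.

```python
import statistics

MIN_RUNS = 30              # "Runs every benchmark 30+ times"
MAX_RELATIVE_SIGMA = 0.10  # "sigma exceeds 10% of mean" triggers rejection


def accept_benchmark(samples: list[float]) -> bool:
    """Return True when a benchmark meets the run-count and dispersion rules."""
    if len(samples) < MIN_RUNS:
        return False
    mean = statistics.fmean(samples)
    if mean == 0:
        # Relative dispersion is undefined for a zero mean; reject conservatively.
        return False
    sigma = statistics.stdev(samples)  # sample standard deviation (assumption)
    return sigma <= MAX_RELATIVE_SIGMA * abs(mean)


if __name__ == "__main__":
    stable = [100.0 + (i % 3) for i in range(30)]  # tight spread around 101
    noisy = [50.0, 150.0] * 15                     # wide spread around 100
    print(accept_benchmark(stable), accept_benchmark(noisy))
```

The ratio being tested is the coefficient of variation, which makes the threshold comparable across metrics with different units, such as latency in milliseconds and throughput in tokens per second.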

🌐 Why Schema-First Architecture Matters

"AI agents and LLMs in general would have had an easy life on Web 1.0"
— Gary Illyes, Google (June 2026)

A Googler publicly confirms what we've engineered for: modern web complexity is an AI barrier. Clean, well-structured HTML with complete JSON-LD entity graphs is the AI-navigable alternative to the "complex modern web." Every page on this site — benchmark datasets, technical articles, and critical alerts — carries full Organization + page-specific schema, cross-referencing a single @id for entity coherence. When AI agents retrieve our data, they don't just get text — they get machine-verifiable provenance.