The Quality Paradox: Why Search Improved for Google While It Degraded for Publishers
Three layers of evidence (publisher traffic decay, Alphabet's revenue mix, the leaked Google ranking schema) showing why no algorithmic recovery is coming.
From inside a digital agency I see dozens of Google Analytics dashboards. Clients across niches, partners I've worked with for years, friends in publishing who let me peek at their numbers. The decline pattern in 2024 through 2026 is specific in a way that doesn't match Google's stated guidance about what helpful content gets rewarded. Some sites lost ninety percent of their organic traffic. Others held flat or grew. The split tracks a structural feature of the content, not the quality of the writing, and a structural feature of Google's revenue model that nobody seems to want to name out loud.
So the goal here isn't to argue that SEO is dead. That phrase doesn't survive contact with the data, and it bores me anyway. What I want to argue is something more specific: between 2023 and 2026 Google's Search business improved (revenue up, AI features named by management as a driver of usage growth, share of Alphabet only marginally compressed) while the publisher tier that historically depended on Search degraded along measurable axes. The two trends are tightly correlated. The product Google credits for accelerating Search is the same product whose CTR data shows publishers losing their clicks. That part is the strange one. Google said it on the earnings call.
Three layers, in order. First the empirical pattern: a decay measurement across 27 mid-tier publisher properties in six niches, 2022 through 2026, using Ahrefs organic-traffic estimates. Second the financial structure: a 12-quarter decomposition of Alphabet's reported revenue mix from primary earnings-release sources, looking at what changed for Google as the publisher tier shrank. Third the algorithmic mechanism: the March 2024 Google Content Warehouse leak, analyzed by Mike King at iPullRank and Rand Fishkin at SparkToro, which surfaces the site-level proxy signals that bias ranking against single-author and small-operation publishers by construction.
The reason nobody in the industry wants to make these three claims in the same place is that together they imply a conclusion uncomfortable for both halves of the SEO discourse. The "Google is hostile to creators" half is wrong: Google didn't kill publishers in any active sense. The "Google rewards quality" half is also wrong: the proxies that drive ranking don't measure quality. Both halves orbit a fact neither says aloud, which is that Google no longer needs the publisher tier for Search to work. The data points there from three independent angles.
What 27 publisher sites tell you
I started with a question and a list. The question: did the publisher collapse since 2022 happen uniformly across content types, or is there structure to it? The list: 27 mid-tier publisher properties across six niches, the kind that grew up in the 2013–2018 window when the SEO/AdSense/evergreen formula still worked. Travel: Nomadic Matt, The Blonde Abroad, Expert Vagabond, Never Ending Footsteps, Adventurous Kate, The World Pursuit. Recipe: Smitten Kitchen, Pinch of Yum, Minimalist Baker, Half-Baked Harvest, Sally's Baking Addiction. Tech how-to: How-To Geek, MakeUseOf, Instructables, Beebom, AddictiveTips. Hobby and craft: Allfreesewing, Crochetkim. Personal finance: Mr. Money Mustache, Wise Bread, Get Rich Slowly, Frugalwoods, Budgets Are Sexy. Lifestyle: Apartment Therapy, Cup of Jo, Design Mom, Alpha Mom.
Some of these are single-author operations; others are part of digital media portfolios. How-To Geek and MakeUseOf are owned by Valnet (acquired 2023 and 2020 respectively), Instructables by Autodesk (since 2011), Allfreesewing is part of Prime Publishing. I deliberately excluded large media conglomerates (Condé Nast, Hearst, Recurrent Ventures). The sample is mid-tier publisher properties: bigger than personal blogs, smaller than enterprise media. That tier is where Google's stated helpful-content criteria should most directly apply.
For each site I pulled monthly organic-traffic estimates from Ahrefs, January 2022 through April 2026. 1,404 monthly observations across 27 domains. Peak month is defined as the month with the highest twelve-month trailing average traffic, restricted to before August 2024 so that post-AI-Overviews recovery couldn't be confused with the actual peak. Current is the trailing three months (February through April 2026) to smooth out single-month index artifacts. Inflection month is the first time after peak where the rolling three-month mean drops fifteen percent or more and the decline holds for the next three months. That catches the cliff trigger, not the floor of the slide.
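These three definitions leave one detail open: what the fifteen percent drop is measured against. The Python sketch below assumes the drop is measured against the peak month's twelve-month trailing average. It also reads "holds for the next three months" as: the rolling three-month mean stays below that threshold for the inflection month and the three months after it. The function name and the synthetic series are illustrative only. They are not the pipeline used for this analysis.

```python
from statistics import mean


def trailing_mean(series, end, window):
    """Mean of series[end-window+1 .. end] inclusive; None if history is too short."""
    if end + 1 < window:
        return None
    return mean(series[end - window + 1 : end + 1])


def peak_current_inflection(traffic, months, cutoff="2024-08", drop=0.15, hold=3):
    """Hypothetical helper: return (peak month, inflection month, % change peak->current).

    traffic: monthly visit estimates aligned with months ("YYYY-MM" strings).
    """
    # Peak: highest 12-month trailing average among months before the cutoff.
    candidates = [
        (trailing_mean(traffic, i, 12), i)
        for i, m in enumerate(months)
        if m < cutoff and trailing_mean(traffic, i, 12) is not None
    ]
    peak_avg, peak_i = max(candidates)

    # Current: mean of the last three months, to smooth single-month artifacts.
    current = mean(traffic[-3:])

    # Inflection (assumed reading): first month after the peak whose rolling
    # 3-month mean is at least `drop` below the peak average, and stays there
    # for the following `hold` months as well.
    threshold = peak_avg * (1 - drop)
    inflection = None
    for i in range(peak_i + 1, len(traffic) - hold):
        rolling = [trailing_mean(traffic, j, 3) for j in range(i, i + hold + 1)]
        if all(r is not None and r <= threshold for r in rolling):
            inflection = months[i]
            break

    change_pct = (current / peak_avg - 1) * 100
    return months[peak_i], inflection, change_pct


# Synthetic example: flat at 100 visits, then a cliff to 40 in October 2023.
months = [f"{2022 + k // 12}-{k % 12 + 1:02d}" for k in range(52)]  # 2022-01 .. 2026-04
traffic = [100] * 21 + [40] * 31
print(peak_current_inflection(traffic, months))
```

On this toy series the function returns a September 2023 peak, an October 2023 inflection, and a change of about −60 percent.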
Ahrefs is an estimate. It is not ground truth. The estimates are derived from a keyword corpus and a click model, and when Ahrefs adds new keywords or recalibrates the model, all historical estimates can shift. I'll flag specific data-quality concerns where they matter. The relative comparison across sites and the relative position of inflection months are more reliable than any single absolute number.
The headline finding is that the collapse is bimodal, not uniform. Of the 27 sites:
- 5 are catastrophic: lost 90 percent or more of peak traffic (How-To Geek, AddictiveTips, MakeUseOf, Expert Vagabond, The World Pursuit)
- 7 lost 50 to 89 percent (Mr. Money Mustache, Frugalwoods, Wise Bread, Nomadic Matt, Minimalist Baker, Crochetkim, Allfreesewing)
- 8 lost 1 to 49 percent (the moderate middle)
- 7 grew or held flat (Never Ending Footsteps +188%, Beebom +168%, Adventurous Kate +72%, Budgets Are Sexy +45%, Instructables +23%, Half-Baked Harvest +19%, The Blonde Abroad +7%)
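The four bands leave their edges implicit. It isn't clear, for example, where −49.5 percent or −0.5 percent would go. The sketch below assumes each band closes on its upper end and that anything better than −1 percent counts as "held flat". Those cut-offs are my reading, not a stated rule. The sample values are changes quoted elsewhere in this piece.

```python
def bucket(change_pct):
    """Map a peak-to-current traffic change (percent) to one of the four bands.

    Assumed boundaries: <= -90, (-90, -50], (-50, -1], and > -1 ("grew or held flat").
    """
    if change_pct <= -90:
        return "lost 90%+"
    if change_pct <= -50:
        return "lost 50-89%"
    if change_pct <= -1:
        return "lost 1-49%"
    return "grew or held flat"


# Changes quoted in this piece for four of the 27 sites.
for site, change in [("The World Pursuit", -99.8), ("Frugalwoods", -60),
                     ("Smitten Kitchen", -3.5), ("Never Ending Footsteps", 188)]:
    print(f"{site}: {change:+}% -> {bucket(change)}")
```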
Median decline across the whole sample is −31 percent. Mean is −20 percent. These are not "publisher apocalypse" numbers; they are bifurcation numbers. Some sites died completely. Many sites declined modestly. A meaningful minority grew through the same window.
The niche breakdown sharpens the bifurcation:
| Niche | Sites | Median decline |
|---|---|---|
| Recipe | 5 | −10% |
| Travel | 6 | −27% (wide range) |
| Lifestyle | 4 | −30% |
| Personal finance | 5 | −53% |
| Hobby and craft | 2 | −79% |
| Tech how-to | 5 | −93% |
Recipe sites largely survived. Tech how-to sites were gutted. That's an 83-point spread between the niche that did best and the niche that did worst, on the same algorithm, in the same window. Lifestyle held; how-to and craft collapsed; travel and personal finance split.
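Now look at when the cliffs happened. Across 23 detected inflection months (four sites declined gradually with no clean inflection trigger), the distribution by update window is shown below. The windows don't cover every month, so the counts below sum to 21; the remaining two inflections fall outside them: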
- August through December 2022 (HCU first launch period): 2 inflections
- September through December 2023 (HCU September 2023 + October and November core updates): 6 inflections
- January through April 2024 (March 2024 core update): 1 inflection
- May through August 2024 (AI Overviews US launch May 14): 1 inflection
- Q4 2024 onward (post-AIO drift): 11 inflections
The single most destructive event in this dataset was the September 2023 HCU plus the back-to-back October and November core updates. Six of twenty-three detected inflections cluster in that September–December window. The World Pursuit (October 2023), MakeUseOf (October 2023), AddictiveTips (November 2023), Crochetkim (November 2023), Allfreesewing (December 2023), and The Blonde Abroad (December 2023) all began their declines during this period.
The August 2022 HCU, the one with the brand-name reputation, matters less than its reputation suggests. Only two inflections fall in the August–December 2022 window. The original HCU was real but mild compared to what came thirteen months later.
The AI Overviews US launch (May 14, 2024) is barely visible as a discrete trigger: one inflection in the immediate post-launch window. AIO matters intensely for click-through rates on the queries it covers. Ahrefs measured a 34.5 percent drop in CTR for the top-ranking page when an AIO is present (Ahrefs, April 2025). An updated Ahrefs analysis from February 2026, using newer Google Search Console (GSC) data, put the figure at 58 percent at position 1. But AIO does not show up as a sharp inflection in traffic estimates the way HCU does. The 11 inflections from Q4 2024 onward are mostly slow drift, not cliffs.
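A simple linear model shows why a large per-query CTR loss can appear as drift rather than a cliff. A page's retained clicks depend on the CTR drop, but also on what share of its clicks come from queries that now show an AI Overview. The sketch below assumes AIO-free queries are unaffected and that the share is known. The function is hypothetical and ignores ranking changes such as HCU demotions.

```python
def click_retention(aio_share, ctr_drop):
    """Fraction of a page's organic clicks retained under a simple linear model.

    aio_share: share of the page's clicks from queries that now show an AI Overview.
    ctr_drop:  relative CTR loss on those queries (e.g. 0.345 or 0.58, the Ahrefs figures).
    Assumes queries without an AIO keep their clicks and nothing else changes.
    """
    if not (0 <= aio_share <= 1 and 0 <= ctr_drop <= 1):
        raise ValueError("aio_share and ctr_drop must be in [0, 1]")
    return 1 - aio_share * ctr_drop


for drop in (0.345, 0.58):
    print(drop, [round(click_retention(s, drop), 3) for s in (0.25, 0.5, 1.0)])
```

Under these assumptions, even full AIO coverage at the 58 percent figure leaves 42 percent of clicks. That is well short of the −93 percent tech how-to median, which is consistent with the HCU cliff doing much of the damage in that niche. If the share of a site's queries showing an AIO grows gradually, the modeled loss grows gradually too.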
So the data says the 2023 HCU was the cliff. AIO is the drift. The named 2022 HCU and the named May 2024 AIO are both real but neither is the largest single trigger in this dataset.
Three caveats worth honest disclosure. The World Pursuit lost 99.8 percent of its traffic, but a live check on the site shows the owners stopped publishing in February 2024. The October 2023 cliff was real algorithm impact, but the subsequent slide into single-digit visit territory was abandonment, not just penalty. Mr. Money Mustache shows the cleanest example of "post-AIO drift turning into a discrete cliff": a gradual downward drift through early 2025 followed by a sharp drop starting August 2025. The timing is suspicious enough to warrant a second-source check before treating it as pure algorithm impact. Beebom's growth (+168 percent) is the largest outlier in the catastrophic niche. Its jump from roughly 3M to 10M visits in three months reads more like an Ahrefs model recalibration than a real tripling of traffic in a single quarter.
Removing all three suspect data points doesn't change the bimodal pattern. The relative-decline order across niches is robust, and the cliff timing in late 2023 holds.
Here is where the data becomes mechanism-revealing. What got destroyed is content where the query has a summarizable answer. How-to articles ("how to convert a video file", "how to enable dark mode"), tech tips, definitions, factual lookups. The answer fits in three sentences. AI Overviews can deliver it directly in the SERP, and the user doesn't need to click through.
What survived is content where the page itself is the destination. A recipe is the page. You go to Smitten Kitchen because you're going to cook the cookies. AI Overviews can summarize the recipe in the SERP, but the user still needs the full ingredient list and the timing and the picture of the dough and the comments below from people who substituted brown butter. Lifestyle articles are read for the writer's voice and detail; they don't compress without losing the thing being consumed. Personal essays. Long-form analysis. Photographic content. The page is what you came for.
This is not a retrofit explanation. It's a falsifiable claim: AI Overviews should disproportionately damage content whose typical query has a summarizable answer of three sentences or less. Recipe pages should survive because the recipe is not the answer to the query; the recipe is what you do with the answer. How-to pages should die because the how-to is the answer. The data, niche by niche, matches the prediction. Recipe (−10%): survived. Lifestyle (−30%): survived. Tech how-to (−93%): destroyed. Hobby and craft (−79%): destroyed. Travel (mixed, −27%): split. Informational travel queries are summarizable; experiential travel content is not; the niche shows both extremes. Personal finance (mixed, −53%): split for the same reason. Calculator-type queries are summarizable; personal-philosophy essays are not.
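Because the claim is falsifiable, it can be written as a check that the niche table could have failed. The sketch below uses the niche medians from that table. The answer/destination/mixed labels are my encoding of this section's argument. The mixed niches are left out because the prediction makes no ordering claim about them.

```python
# Niche medians from the table above (percent change from peak).
NICHE_MEDIANS = {
    "Recipe": -10, "Travel": -27, "Lifestyle": -30,
    "Personal finance": -53, "Hobby and craft": -79, "Tech how-to": -93,
}

# "answer": typical query has a summarizable answer.
# "destination": the page itself is what the reader came for.
# "mixed": the niche contains both kinds of content.
CATEGORY = {
    "Recipe": "destination", "Lifestyle": "destination",
    "Tech how-to": "answer", "Hobby and craft": "answer",
    "Travel": "mixed", "Personal finance": "mixed",
}


def prediction_holds(medians, category):
    """True if every 'answer' niche declined more than every 'destination' niche."""
    answer = [v for k, v in medians.items() if category[k] == "answer"]
    destination = [v for k, v in medians.items() if category[k] == "destination"]
    return max(answer) < min(destination)


print(prediction_holds(NICHE_MEDIANS, CATEGORY))
```

With these medians, the worst "destination" niche (lifestyle, −30) still beats the best "answer" niche (hobby and craft, −79). A recipe median of, say, −95 would have flipped the result.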
The mechanism that makes this prediction work shows up in two places. The algorithm itself, which we'll get to in Section 4. And Google's own revenue model, which is next.
Why Google's incentive to fix this is gone
The standard narrative around publisher decline contains an implicit hope: at some point, complaints will accumulate, the algorithm will be corrected, traffic will route back to good sites. Google has revised named systems before. The August 2022 HCU was retired as a standalone named system in March 2024 and incorporated into the core ranking system. The mental model assumes Google has an incentive to keep the publisher ecosystem alive because Search needs publisher pages to be the destination users click toward.
That mental model worked when Search was the dominant share of Alphabet's revenue and growth. It doesn't anymore. To see why, the cleanest place to look is Alphabet's quarterly reported revenue mix from primary sources, the earnings releases that 10-Q filings rest on.
I pulled the last 12 reported quarters, Q2 2023 through Q1 2026 (the most recent quarter as of this writing, released April 29, 2026). Every number below comes from the Alphabet investor-relations CDN where they post each quarter's release as a PDF, cross-checked against the comparative column in the following quarter's release. No analyst estimates, no third-party rollups.
Three things stand out.
First, Cloud's share of Alphabet grew by roughly 70 percent. Google Cloud went from 10.8 percent of consolidated revenue in Q2 2023 to 18.2 percent in Q1 2026, while Google Search & other slipped from 57.1 percent to 55.0 percent over the same window. Cloud absorbed about 7.4 percentage points of revenue mix in under three years. The other segments held their relative shares within a couple of points; Cloud is the segment that grew its slice of the pie.
Second, Cloud is the growth-rate engine even though Search is still the larger absolute contributor. Over the window Cloud added $12.0 billion in quarterly revenue (from $8.03B to $20.03B, +149 percent). Search added $17.8 billion in quarterly revenue (from $42.63B to $60.40B, +42 percent). Search added more absolute dollars off a much larger base, a nuance worth stating plainly. But Cloud's YoY growth rate accelerated through the window (+28.9% YoY in Q2 2024, climbing to +63.4% YoY in Q1 2026), while Search's growth rate hovered between +9.8% (trough) and +19.1% (most recent quarter). In the most recent quarter, Cloud grew more than three times as fast as Search.
Third, and this is the part that matters for publishers, Search-line revenue keeps growing while publisher organic clicks decline. The CTR studies say that for queries with an AI Overview, the top-ranking page gets 34.5 percent fewer clicks than equivalent queries without one (Ahrefs again; the February 2026 update puts the figure higher, at 58 percent). The Authoritas study of UK news publishers found per-query CTR loss of almost 50 percent (Press Gazette, 2025). Stack Overflow's daily traffic fell about 12 percent post-ChatGPT (Burtch, Lee & Chen, 2024, Scientific Reports), and new-question volume fell about 75 percent from its 2017 peak based on the public Stack Exchange data dump (Holscher, 2025). The trend lines point the same direction: clicks are leaving the publisher tier.
But Google's Search-line revenue grew through the same period: +19.1 percent year-over-year in Q1 2026. What is driving that growth is not broken out in the press release's segment table. Pichai's exact press-release line is that "Search had a strong quarter with AI experiences driving usage, queries at an all-time high, and 19% revenue growth." On the accompanying earnings call, leadership extended that, naming AI Overviews and AI Mode specifically as usage drivers, alongside vertical strength in retail and finance as separate revenue contributors. Nobody at Alphabet has quantified how much of the reacceleration is attributable to AI features versus the other drivers. But they have named those features as material, and that is enough for the argument: the product that takes click-through rate from the publisher tier is the same product Google credits with growing Search.
Read for the publisher this is a clean causal arrow. AI Overviews extracts value from publisher content (it is trained on, indexes, and surfaces what publishers wrote) without routing the user to the publisher. The user gets the answer, stays on Google's surface, sees Google's ads, and never visits the site that originated the information. Alphabet doesn't report per-query monetization in its segment table, so "per-query monetization rose for Google" is inference from the combination of growing Search revenue and the CTR studies above. The inference is straightforward: same product mechanism, opposite consequences for publisher and platform.
Read for Google this is rational allocation. When Search required publisher pages to be the destination users clicked toward, publisher health was Google's problem. The flywheel only spun if there was good content for the link to point at. Now the destination is the SERP itself, populated with AIO summaries, the Discussions box pulling from Reddit (the $60M-per-year Reddit-Google data licensing deal reported by Reuters in February 2024 was not coincidental), the Knowledge Panel, the People Also Ask, the embedded video clips. Publishers became inputs to a destination Google now owns, rather than destinations Google referred users to. Search didn't degrade for Google. Search got upgraded to extract value at the SERP layer rather than route it through.
This is what changed. Not "Google killed publishers"; that frame is too active. The frame is that Google rebuilt the search product so that the publisher tier became an upstream supplier of content rather than the downstream destination of clicks. Suppliers can be replaced, recombined, or removed when their substitutes are good enough. Destinations could not be. The shift from destination to supplier is what makes the publisher position structurally weaker, and it's what makes the algorithm changes self-reinforcing rather than self-correcting.
The standard response to this argument is "Search is still the bigger absolute business, look at the dollars". True. The data does not say Search is collapsing or dying. The data says Search is healthy for Google. It also says three more things. Search is growing slower than Cloud. Cloud's latest growth rate is roughly three times Search's. And, importantly for the publisher question, Search is healthy despite sending fewer clicks to its underlying content tier. Health for Google and health for publishers are no longer the same thing. They were aligned when the flywheel needed publisher pages to spin. They are decoupled now that the flywheel spins inside the SERP.
One useful asymmetry to notice. Cloud's reported backlog "nearly doubled quarter on quarter to over $460 billion" per Alphabet's Q1 2026 earnings release. Annualizing the Q1 2026 Cloud revenue ($20.03B × 4 = ~$80B), $460B of backlog is roughly 5.7x that figure. ("Backlog" here is remaining performance obligations, or RPO: contracted revenue not yet recognized. That is a different measure from the sales-pipeline figures a startup might quote.) The strategic attention inside Alphabet (capex allocation, hiring priority, executive promotions, board-level discussion) follows backlog and growth rate, not absolute revenue level. Search is being run for cash. Cloud is being run for growth. The internal posture toward publisher relationships necessarily reflects where the company sees its forward trajectory, not where it sees its current revenue concentration.
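The revenue-mix and backlog arithmetic in this section can be rechecked from the quoted figures alone. In the sketch below, every input is a number quoted above. The shares are the reported percentages, not values recomputed from consolidated revenue, which isn't quoted here. The annualization is the same naive "latest quarter × 4" run-rate used in the text.

```python
# Figures quoted in this section, in billions of USD per quarter.
CLOUD_Q2_2023, CLOUD_Q1_2026 = 8.03, 20.03
SEARCH_Q2_2023, SEARCH_Q1_2026 = 42.63, 60.40
# Reported shares of consolidated revenue (percent), Q2 2023 -> Q1 2026.
CLOUD_SHARE = (10.8, 18.2)
SEARCH_SHARE = (57.1, 55.0)
# Year-over-year growth rates reported for Q1 2026 (percent).
CLOUD_YOY, SEARCH_YOY = 63.4, 19.1
# Cloud backlog quoted from the Q1 2026 release (billions of USD).
CLOUD_BACKLOG = 460


def growth_pct(start, end):
    """Percent change from start to end."""
    return (end / start - 1) * 100


annualized_cloud = CLOUD_Q1_2026 * 4  # naive run-rate: latest quarter x 4

summary = {
    "cloud_added_bn": round(CLOUD_Q1_2026 - CLOUD_Q2_2023, 2),
    "search_added_bn": round(SEARCH_Q1_2026 - SEARCH_Q2_2023, 2),
    "cloud_growth_pct": round(growth_pct(CLOUD_Q2_2023, CLOUD_Q1_2026), 1),
    "search_growth_pct": round(growth_pct(SEARCH_Q2_2023, SEARCH_Q1_2026), 1),
    "cloud_share_change_pp": round(CLOUD_SHARE[1] - CLOUD_SHARE[0], 1),
    "search_share_change_pp": round(SEARCH_SHARE[1] - SEARCH_SHARE[0], 1),
    "cloud_share_ratio": round(CLOUD_SHARE[1] / CLOUD_SHARE[0], 2),
    "latest_yoy_ratio": round(CLOUD_YOY / SEARCH_YOY, 2),
    "annualized_cloud_bn": round(annualized_cloud, 2),
    "backlog_multiple": round(CLOUD_BACKLOG / annualized_cloud, 1),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The share ratio comes out at about 1.7x. The roughly 3.3x growth-rate ratio compares only the latest quarter's YoY rates, not the whole window.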
This explains, mechanically, why no algorithm rollback is coming. The publisher-tier degradation isn't an unfortunate side effect of an updated ranking system Google would prefer not to ship. It's the predictable outcome of moving the destination layer onto Google's surface, which is exactly what AIO and AI Mode are. To roll the algorithm back to the previous equilibrium would require unshipping the product line that explains Search's revenue reacceleration. That doesn't happen.
Why the algorithm has the bias it has, by construction
Section 3 showed Google's incentive structure has shifted: Search can grow while publisher referrals decline because per-query monetization moved onto the SERP. This section shows the algorithmic mechanism that makes the shift permanent.
There are two mechanisms operating in parallel, and the data in Section 2 separated them for us without our knowing it.
Mechanism A is content summarizability. The how-to queries die because the answer is the entire content, three sentences and the user is done. The recipe queries survive because the recipe is what you do with the answer, not the answer itself. Mechanism A explains which content TYPES survive at all.
Mechanism A operates through two surfaces with different timing. The 2023 cliffs in the data above predate AI Overviews by five to eight months. The September 2023 HCU and the back-to-back October/November core updates directly demoted content the algorithm classified as unhelpful. Empirically, the content the helpful-content classifier downranked was the same kind of content that's summarizable in three sentences: templated how-tos, thin reference posts, evergreen one-paragraph definitions. Then in May 2024 AI Overviews launched and started suppressing clicks on those same query types from a different surface. Instead of demoting the result, it surfaces the answer in the SERP so users don't click through. Same content category targeted, two different mechanisms, two different timing windows. The 2024–2025 drift in the data is the AIO half; the 2023 cliffs are the HCU half.
Mechanism B is site-level proxy signals. Google's ranking system appears to compute site-wide trust scores from observable correlates of operational infrastructure: backlinks, Chrome clickstream volume, brand-search velocity, sustained mention density in news corpora, schema markup completeness. These signals scale with enterprise content operations, not with first-hand experience. Mechanism B explains who WINS among the content types that survived.
Together they answer the puzzle that single-mechanism explanations cannot. A signal-catalog-only explanation cannot account for Smitten Kitchen (recipe, single-voice independent, −3.5%) surviving alongside How-To Geek (tech how-to, larger operational footprint, −93%). A summarizability-only explanation cannot account for Apartment Therapy (lifestyle, larger brand operation, −39%) outperforming Frugalwoods (personal finance, single-author, −60%) within the niches that didn't get categorically destroyed. Mechanism A determines which content TYPES the algorithm and AIO together remove from the click economy; Mechanism B determines which sites win among the survivors. The bimodal data is what you'd expect if both mechanisms are operating.
Mechanism A is observable from outside Google. AI Overviews are public; the queries they appear on are public; the citations they include are public; the click-through rates have been measured by Ahrefs, Authoritas, and others. No insider information needed.
Mechanism B is exactly where the Google leak matters.
The leaked signals
In March 2024, Google's internal Content Warehouse API documentation, describing 14,014 attributes, surfaced publicly; in May, Erfan Azimi brought it to Rand Fishkin. Fishkin at SparkToro and Mike King at iPullRank both published their analyses on May 27, 2024. Google confirmed authenticity through a statement to The Verge (also reported by Search Engine Land) that didn't deny the schema but qualified that some attributes might be "out-of-context, outdated, or incomplete."
The subset of attributes most relevant to site-level quality and trust, the layer that appears to shape ranking before any page-specific relevance work, is small and consistent. The leak shows attribute names and types in API documentation, not active weights or current deployment status, so treat the catalog below as the framework the system is built on, not as proof of any specific current weighting. Mike King's analysis is the most rigorous walk-through. Here are the load-bearing signals.
| Signal | What it appears to measure | Why it biases against independents | Status in leak |
|---|---|---|---|
| siteAuthority | Site-wide authority score | Applies across the whole site, so no individual post can outrun a low site score | CONFIRMED. Contradicts Google's repeated public denial of a domain authority metric. |
| chromeInTotal | Site-level Chrome clickstream volume | Independent sites without distribution have near-zero baseline; signal compounds with audience, not content quality | CONFIRMED. Contradicts John Mueller's repeated public position that the only Chrome data Google uses for ranking is CrUX page-experience aggregates, not site-level clickstream volume. |
| hostNsr | Host-level normalized site rank | A site whose chunks read as a single hobbyist voice gets one score for the whole host | CONFIRMED |
| siteFocusScore | How topically focused the site is | Personal sites mixing travel + code + essays read as unfocused by construction | CONFIRMED |
| siteRadius | How far page embeddings deviate from the site embedding | A first-hand essay outside the site's topical centroid is structurally penalized regardless of quality | CONFIRMED |
| smallPersonalSite | Flag for small personal sites | Direction unspecified in leak; existence proves Google maintains a separate code path keyed on "is this a hobby site" | CONFIRMED (existence); DISPUTED (direction) |
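The leak gives siteFocusScore and siteRadius as names and one-line descriptions, not formulas. The hypothetical sketch below makes the "structurally penalized regardless of quality" point concrete. It treats the site embedding as the mean of page embeddings and a page's radius as its cosine distance from that centroid. These definitions are assumptions for illustration, not Google's computation. Real embeddings have hundreds of dimensions rather than two.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def site_embedding(pages):
    """Hypothetical site vector: element-wise mean of the page embeddings."""
    n = len(pages)
    return [sum(dim) / n for dim in zip(*pages)]


def page_radii(pages):
    """Hypothetical radius: cosine distance of each page from the site centroid."""
    centroid = site_embedding(pages)
    return [1 - cosine(p, centroid) for p in pages]


def focus_score(pages):
    """Hypothetical focus score: 1 minus the mean page radius (1.0 = perfectly focused)."""
    radii = page_radii(pages)
    return 1 - sum(radii) / len(radii)


# Toy 2-D "embeddings": axis 0 = travel, axis 1 = code.
focused = [[1, 0], [1, 0], [1, 0]]
mixed = [[1, 0], [1, 0], [0, 1]]  # one off-topic first-hand essay
print(focus_score(focused), focus_score(mixed), page_radii(mixed))
```

Nothing in these functions looks at whether a page is accurate or first-hand. The off-topic page scores as an outlier purely because of where its vector sits.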
Two omissions worth flagging, with the standing caveat that the leak is incomplete by Google's own admission, so absence of a literal field name doesn't prove absence of a corresponding mechanism. No firstHandExperience attribute appears in the leaked surface. No authorWasActuallyThere attribute either. The site-level signals above are what does appear. "Experience" as a concept lives prominently in Google's public-facing E-E-A-T guidelines and Search Liaison messaging, but it doesn't surface as a queryable feature in the leaked attribute schema with anything like the prominence the public messaging would suggest.
Why the schema looks this way
Google cannot verify factual quality at web scale. No service inside Search calls an LLM to ask whether the author of "What it's like to spend a month in Pattaya" actually spent a month in Pattaya. Even at current inference prices that check is uneconomic across the indexed web, and it would still often be wrong because the model has no ground truth either. The verification problem is structural, not a tooling gap that better models close.
So Google does what any large-scale ranker does when the thing it wants to measure is unmeasurable: it picks proxies. Observable correlates of the thing. Signals that move together with quality, on average, across a large corpus, even when any single instance is noisy. This is not a scandal. It is the only design that exists at this scale.
The question is which proxies. And here the leak is precise where Google's public messaging is vague. The proxies that drive site-level ranking are operational signatures: siteAuthority derived from Qstar, chromeInTotal derived from browser clickstream volume, hostNsr derived from sitechunk aggregates, siteFocusScore and siteRadius derived from embedding distances. Layered on top of that is the well-documented entity infrastructure: Knowledge Graph linkage, branded-search volume for navigational queries, schema markup, sustained mention velocity in news corpora. The full stack's required inputs all scale with operational infrastructure, not with whether the author was in the room.
Read the bullish way: these proxies correlate with trustworthy publishers because trustworthy publishers tend to build operational infrastructure. A site that has run for a decade, gets cited in news, holds Knowledge Graph linkage, and shows steady branded search has, on average, earned that footprint by being reliable. The proxy is doing its job.
Read the bearish way: the proxies don't measure trust. They measure the operational signature of an entity that has the staff, time, and budget to look like a brand. Picture a small consultancy with a PR retainer, a domain registered as a business, consistent NAP (name, address, phone number) across directories, a paid Knowledge Graph push, and a content team producing tightly clustered topical material. It gets the same site-level score as a publication that actually fact-checks. Both readings are correct, and which one matters depends on whether you're asking about average quality or marginal quality.
The marginal case is where the contradiction lives. A solo author with first-hand domain experience (a working trader writing about a market they live in, a developer writing about a tool they built, a parent writing about a school system their kids attend) produces, by definition, the kind of content E-E-A-T was supposed to reward. The same author cannot economically generate the operational signature the ranker actually consumes. They have no PR budget to seed news mentions. Their NAP is one name across one personal domain. Branded search for their name returns near-zero navigational queries because nobody has heard of them yet. siteFocusScore reads as low because they write across the topics their life actually intersects. chromeInTotal is small because they have no distribution. hostNsr averages low because the sitechunks span first-hand essays, side-project documentation, and an old talk. Every one of these signals codes "small personal site" as "weak site" through the same proxy mechanism.

The smallPersonalSite flag itself is a separate question. Its existence proves Google maintains a code path keyed on hobby-site classification, but the leak doesn't tell us whether that path boosts or demotes; some published readings actually interpret it as a small-site promotion signal rather than a penalty.
Building the operational signature isn't a content problem. It's an enterprise-procurement problem. Knowledge Graph entries, sustained PR coverage, schema TravelAgency or schema NewsArticle, branded navigational query volume: these are line items in a marketing budget, not byproducts of writing well. The author who knows the most about Pattaya cannot rank against the affiliate site that has never been there, because the affiliate site has the procurement budget and the author doesn't.
The two mechanisms together
Now go back to the bimodal data with both mechanisms in hand.
Recipe sites survived (Mechanism A) AND tend to have strong operational signatures (Mechanism B). Smitten Kitchen has two decades of compounding distribution (it launched in 2006), sustained mention velocity in food media, recurring direct traffic, and Pinterest distribution. The combination of "AIO can't summarize the cooking experience" and "Smitten Kitchen has the infrastructure to win among survivors" yields the −3.5 percent decline we observed.
How-To Geek lost to Mechanism A first (the queries it served are exactly the queries the September 2023 HCU downranked and AIO later started answering in three sentences), but its operational signature wasn't enough to win the much smaller market that survived. Hence the −93 percent.
Mr. Money Mustache survived Mechanism A for years, since personal-finance philosophy isn't summarizable, but lost slowly to Mechanism B (one author, narrowing focus, no enterprise content operation). His decay shows up as gradual drift through 2024 and early 2025, followed by a discrete cliff in August 2025. That is the shape you'd expect from a core update that reweights site-level signals and lands asymmetrically on lone-author sites. The caveat from Section 2 still applies: the August 2025 timing warrants a second-source check before it is treated as pure algorithm impact.
The World Pursuit lost to both mechanisms in October 2023 and stopped publishing in February 2024, which exposes a third dynamic. When the algorithm crosses the breakeven threshold for a small site's economics, the site stops producing content, which then makes the algorithm's verdict self-confirming. If a classifier like smallPersonalSite flagged this site (the leak can't tell us whether it did), the flag was in effect a leading indicator of a site that would stop existing.
This is the architecture that makes the publisher position structurally weaker than it was. Mechanism A removed an entire content category from the click economy. Mechanism B ensured that the remaining content category went to operational entities rather than individual experts. Section 3 showed Google has no financial incentive to reverse either mechanism. The next section shows what's forming in the space left over.
The licensing market forming in the rubble
Sections 2 through 4 are about what's broken. This section is about what's being built.
The thesis: while the publisher ecosystem decays on the ad-supported click economy, a new market is forming on the AI-content-licensing model. The components are visible. They have not been assembled into one story.
In December 2023 Axel Springer signed a multi-year deal with OpenAI covering POLITICO, Business Insider, BILD, and WELT. Secondary reporting from Axios put the terms at three years and tens of millions of euros; the official announcement didn't disclose figures. The same month The New York Times sued OpenAI and Microsoft, seeking billions in damages and the destruction of training datasets containing NYT content. Two months later, in February 2024, Reuters reported that Reddit had signed an AI-training data-licensing arrangement with Google worth approximately $60 million per year. In May 2024 News Corp signed its own deal with OpenAI, which WSJ valued at more than $250 million over five years. It covers WSJ, Barron's, MarketWatch, the New York Post, The Times (UK), The Sun, and The Australian. Two months after that, on July 30, 2024, Perplexity launched its Publishers Program with six founding partners: TIME, Der Spiegel, Fortune, Entrepreneur, The Texas Tribune, and WordPress.com. The program is structured as a revenue share that pays out whenever Perplexity earns money from interactions referencing partner content.
Then on July 1, 2025, Cloudflare made the most structural move of any of them. It flipped the default for new domains to block AI crawlers and launched pay-per-crawl, a market priced through HTTP 402 that opened in private beta. The exchange works like this. A crawler requests a page on a publisher's site and receives 402 Payment Required along with the publisher's stated price. It then retries with a crawler-exact-price header accepting that price. Throughout, the request is authenticated cryptographically via HTTP Message Signatures (RFC 9421). As Matthew Prince put it in the launch post, "Cloudflare, along with a majority of the world's leading publishers and AI companies, is changing the default to block AI crawlers unless they pay creators for their content."
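The following Python simulation makes the handshake concrete. It assumes a hypothetical flat price of USD 0.01 per request. The header names crawler-price (on the 402 response) and crawler-exact-price (on the retry) follow Cloudflare's public description of pay-per-crawl. Everything else is an illustrative assumption:

- the "USD 0.01" value format;
- the origin_handler and crawl functions;
- the in-memory CHARGES ledger.

The RFC 9421 signature check is assumed to have already passed and is not implemented. No network calls are made.

```python
from dataclasses import dataclass, field

# Hypothetical per-request price quoted by the publisher (value format assumed).
QUOTED_PRICE = "USD 0.01"

# In-memory record of charges, standing in for real billing (hypothetical).
CHARGES: list[str] = []


@dataclass
class Response:
    status: int
    headers: dict[str, str] = field(default_factory=dict)
    body: str = ""


def origin_handler(request_headers: dict[str, str]) -> Response:
    """Simulated publisher edge deciding whether to serve a verified AI crawler.

    Assumes RFC 9421 signature verification already succeeded upstream;
    only the price negotiation is modeled. Header names are lowercase here.
    """
    offered = request_headers.get("crawler-exact-price")
    if offered != QUOTED_PRICE:
        # No offer, or an offer that doesn't match the quote: refuse with 402.
        return Response(402, {"crawler-price": QUOTED_PRICE})
    # Offer matches the quote: record the charge and serve the page.
    CHARGES.append(offered)
    return Response(200, body="<html>article</html>")


def crawl(max_price_usd: float) -> Response:
    """Simulated crawler: probe, read the quote, retry only if affordable."""
    first = origin_handler({})
    if first.status != 402:
        return first
    quoted = first.headers["crawler-price"]  # e.g. "USD 0.01"
    currency, amount = quoted.split()
    if currency != "USD" or float(amount) > max_price_usd:
        return first  # too expensive: give up with the 402
    return origin_handler({"crawler-exact-price": quoted})
```

Calling crawl(max_price_usd=0.05) ends in a 200 and adds one ledger entry. Calling crawl(max_price_usd=0.001) walks away after the 402. Because the price travels in headers, each individual request becomes a priced transaction without a separate contract.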
Stand back from the individual deals and the shape is clear. Three structural mechanisms are forming simultaneously:
- Enterprise licensing at the top of the market: News Corp, Axel Springer and AP-class deals where AI companies pay seven- and eight-figure annual sums for blanket training and serving rights.
- Revenue share in the middle: Perplexity's program, where publishers receive a cut of revenue from interactions citing their content.
- Infrastructure metering at the long tail: Cloudflare's HTTP 402 layer, turning every individual crawler request into a priced transaction.
These are not competing models. They are three pricing tiers for the same underlying market: human-written signal sold to LLM providers as training data and as RAG retrieval source. The enterprise tier serves the publishers who can negotiate. The revenue-share tier serves the middle. The metering tier handles everyone else.
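A few lines of arithmetic can check the "seven- and eight-figure annual sums" of the enterprise tier against the reported terms in the chronology above. This is a sketch, not a valuation, and it rests on three assumptions:

- the figures come from secondary reporting;
- News Corp's "more than $250 million" is treated as a floor;
- Axel Springer's "tens of millions of euros" is read as a total of €10M to €99M.

Amounts are compared in their own currencies, without conversion.

```python
# Reported deal terms: (low_total, high_total, currency, years).
# Secondary reporting only; ranges encode an assumed reading of vague phrasing.
REPORTED_DEALS = {
    # "more than $250 million over five years": treated as a floor
    "News Corp / OpenAI": (250e6, 250e6, "USD", 5),
    # "approximately $60 million per year"
    "Reddit / Google": (60e6, 60e6, "USD", 1),
    # "three years and tens of millions of euros": assumed 10M to 99M total
    "Axel Springer / OpenAI": (10e6, 99e6, "EUR", 3),
}


def figure_count(amount: float) -> int:
    """Digits in the integer part: 50_000_000 is an eight-figure sum."""
    return len(str(int(amount)))


def annual_figures(deals: dict) -> dict[str, tuple[int, int]]:
    """Map each deal to the figure counts of its low and high annualized value."""
    result = {}
    for name, (low, high, _currency, years) in deals.items():
        result[name] = (figure_count(low / years), figure_count(high / years))
    return result


for name, (lo, hi) in annual_figures(REPORTED_DEALS).items():
    label = f"{lo}-figure" if lo == hi else f"{lo}- to {hi}-figure"
    print(f"{name}: {label} annual sum")
```

Run as-is, it reports eight-figure annual sums for News Corp and Reddit and a seven- to eight-figure range for Axel Springer. That is consistent with the enterprise-tier description.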
Why this market must exist
The case for inevitability rests on a single piece of empirical research that engineering readers should know in detail. In May 2023 Shumailov, Shumaylov, Zhao, Gal, Papernot and Anderson posted The Curse of Recursion: Training on Generated Data Makes Models Forget to arXiv. In July 2024 the same team published the peer-reviewed extension in Nature under the title AI models collapse when trained on recursively generated data. The mechanism they describe is model collapse: successive generations of models trained on data produced by earlier generations lose statistical fidelity. The tails of the distribution vanish. Outputs converge toward the mode of an increasingly impoverished generative distribution.
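The collapse mechanism can be reproduced in miniature. The sketch below is a toy in the spirit of the Gaussian case the paper analyzes, not a reproduction of its experiments. It fits a normal distribution to samples, draws the next generation's samples from that fit, and repeats. The sample size, generation count and seed are arbitrary choices. Each refit estimates the spread from a finite sample, so estimation error compounds. With no fresh data entering the loop, the fitted standard deviation drifts toward zero and the distribution narrows onto a point.

```python
import random
import statistics


def recursive_gaussian_fit(generations: int, n: int, seed: int = 0) -> list[float]:
    """Return the fitted standard deviation after each generation.

    Generation 0 is fitted to n samples from the true N(0, 1); every later
    generation is fitted only to n samples drawn from the previous fit.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # start from the "human" distribution
    history = []
    for _ in range(generations):
        samples = [rng.gauss(mu, sigma) for _ in range(n)]
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)  # maximum-likelihood estimate
        history.append(sigma)
    return history


history = recursive_gaussian_fit(generations=300, n=10)
for gen in (0, 10, 50, 100, 299):
    print(f"generation {gen:>3}: sigma = {history[gen]:.3g}")
```

With generations=300 and n=10, the final standard deviation ends up many orders of magnitude below the starting value of 1.0. Every generation is "trained" only on the previous generation's output.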
The Shumailov result establishes that LLM providers face permanent demand for high-quality human or human-curated signal, because purely synthetic training loops degrade. It does not prove the demand can only be met through licensing. Curated mixtures of synthetic and human data, distillation from larger models, and filtered web crawls all mitigate the collapse problem in practice. What it does mean is that ongoing access to fresh human signal is a structural input rather than an optional one. Three other pressures compound that:

- Cloudflare-style metering now puts a price on previously free crawler access.
- The open web is filling with AI-generated content that fails the freshness test.
- The NYT lawsuit is establishing legal exposure on the unilateral-scraping side.

Together they push up the cost of meeting that demand through unmonetized scraping, relative to the cost of licensing it.
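A toy Gaussian refit loop shows why fresh human signal behaves like a structural input. The sketch fits a normal distribution to a sample, draws the next sample from that fit, and repeats. In the mixed run, a fixed fraction of each generation is drawn fresh from the original N(0, 1), standing in for newly written human data. The 20 percent fraction, sample size, generation count and seed are illustrative assumptions. A real training pipeline is far more complicated than refitting two parameters.

```python
import random
import statistics


def refit_loop(generations: int, n: int, fresh_fraction: float, seed: int = 0) -> float:
    """Fit N(mu, sigma), resample from the fit, refit; return the final sigma.

    A fraction of each generation's n samples is replaced by fresh draws
    from the true N(0, 1), standing in for newly written human data.
    """
    rng = random.Random(seed)
    n_fresh = round(n * fresh_fraction)
    mu, sigma = 0.0, 1.0  # generation 0 samples from the true distribution
    for _ in range(generations):
        synthetic = [rng.gauss(mu, sigma) for _ in range(n - n_fresh)]
        fresh = [rng.gauss(0.0, 1.0) for _ in range(n_fresh)]
        samples = synthetic + fresh
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)  # maximum-likelihood spread estimate
    return sigma


pure = refit_loop(generations=2000, n=50, fresh_fraction=0.0)
mixed = refit_loop(generations=2000, n=50, fresh_fraction=0.2)
print(f"no fresh data:  sigma = {pure:.3g}")
print(f"20% fresh data: sigma = {mixed:.3g}")
```

With fresh_fraction=0.0, the fitted spread collapses toward zero over 2,000 generations. With 0.2, it stays bounded away from zero, because the fresh samples alone put a floor under the estimated variance. That floor is the toy version of "structural input".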
So a market is forming. The question has never been "will publishers get paid for content licensing?" The question is what shape the payments take and at what scale.
Engaging the strongest counter
The most rigorous published counter to the licensing-market thesis is Nieman Lab's December 2025 piece, which argues that there is no meaningful licensing revenue for most publishers. The argument is both empirical (the 2025 numbers are small) and structural (AI firms have stronger bargaining power than publishers and may never need to pay most of them, even if a market exists). The empirical side is correct as of late 2025. License revenue for the median publisher is small, and Cloudflare's pay-per-crawl was six months old when Nieman Lab published.
The structural side is the load-bearing one. Nieman Lab's reading is that bargaining power favors AI providers permanently. They can substitute across publishers, train on what's already scraped, operate with thinner data over time, and pay only the few largest publishers who can credibly withhold access. My reading is that bargaining power is exactly the relevant question. Infrastructure like Cloudflare's per-domain metering changes it by removing the substitute-by-scraping option for any publisher behind the network. Whether that shift in the bargaining environment translates into revenue flowing past the top tier is an empirical question for the 2026–2028 window. For the "is it forming?" half of the question, the relevant analogy is how infrastructure markets scale up. Stripe took several years from founding to meaningful payments volume; Visa took two decades. A pay-per-crawl market roughly six months old that had not scaled by late 2025 is consistent with how new infrastructure markets develop. It is not evidence that the structure won't form.
The substantive risk Nieman flags is real but different from the existential one: the licensing market may form in a shape that concentrates revenue at the top tier without flowing down to mid-tier or independent publishers. That's a distribution problem, not an existence problem. Cloudflare's per-domain pricing tries to address it on the infrastructure side. Revenue-share programs like Perplexity's try to address it in the middle. Whether these mechanisms scale enough to meaningfully compensate independent publishers is the actual open question, and it is one the 2026–2028 window will answer empirically.
What can be said with the data we have: the destination layer Google built on the SERP is now structurally extracting value from publisher content without compensation, and the AI-licensing infrastructure being built in parallel is the first mechanism that monetizes content back to the source even partially. Publishers who survive 2026–2028 will be the ones who end up on the supply side of this market with the operational capacity to negotiate or invoice. Publishers who don't survive will exit through the same door (Mechanism A and Mechanism B from Section 4) that closed off the click economy.
What this means structurally
Three concurrent shifts are documented in the data above. They are not the same shift. They reinforce each other but they have different mechanisms and different time horizons.
Algorithmically (Section 4): the site-level proxy signals that drive ranking systematically advantage operational infrastructure over first-hand content. The smallPersonalSite flag exists in the leaked schema (direction disputed). No firstHandExperience attribute appears in the leaked surface, though the leak is incomplete by Google's own admission. By construction, single-author and small-operation publishers compete against operational entities on the metrics the algorithm appears to measure.
Economically (Section 3): Google's Search-line revenue grew through the same window in which publisher click-through rates collapsed. Revenue rose while clicks routed to publishers fell. So a growing share of the value each query generates is captured on Google's side of the SERP instead of being passed through to the publisher's page. Google once had a financial incentive to maintain the publisher ecosystem, because publisher pages were the destination users clicked toward. That incentive was engineered away when AIO and AI Mode moved the destination onto the SERP. Search is healthy for Google. For publishers, dependence on Search has become a liability.
Infrastructurally (Section 5): the AI-content-licensing market is forming in three tiers: enterprise deals, revenue share, infrastructure metering. The Shumailov et al. result establishes that LLM providers have a permanent demand for fresh human-written signal. The 2026–2028 window will determine the shape and scale of the market that emerges, with the distribution problem (does revenue flow past the top tier) as the substantive open question.
What's permanently lost is the 2013-era equilibrium where an anonymous domain could publish evergreen reference content, accumulate links over time, monetize via AdSense, and operate as a stable business at modest scale. AI Overviews removed the queries the content category served. Site-level proxy signals removed the ranking advantage anonymous independent sites might have had against operational competitors. The financial incentive that would have led Google to protect that equilibrium has been rebuilt around a destination layer Google now owns. None of these reverse.
What's still functional is content where engagement cannot be summarized, where the page is what you came for. Recipes you cook. Lifestyle you read. Long-form analysis. Photographic content. Personal philosophy. The categories that survived in the Section 2 data are the categories whose underlying queries are not three-sentence-summarizable, and whose readers want the destination and not just the answer.
What's emerging is content licensing to AI providers as a primary revenue substrate, with the per-page web visit becoming residual. The publishers who navigate the 2026–2028 window will be the ones on the supply side of that market with the operational capacity to invoice. The rest exit through Mechanism A and Mechanism B.
The blog as a product survives only where engagement cannot be summarized. The blog as a business survives only where it has the operational infrastructure of an enterprise (or the licensing relationship of one). The middle is empty, and it is not coming back.