What 950 Topics Across 80 Domains Taught Me About Content Architecture

I’ve written before about helping a B2B media company set up its cloud analytics infrastructure. One angle of that work I haven’t covered yet is treating content as product. If you treat content as product, you are treating content as data. Email marketing is the one area where content teams already think this way, measuring opens, clicks, and engagement at the campaign level. But the broader content library (the articles, reports, courses, and everything else published across a company’s web properties) rarely gets the same treatment.

At this company, we had 80+ editorial domains and a content library spanning three acquired organizations. Over time it had grown to 950+ topics with no structural logic connecting them across domains, no performance data at the topic level, and no way for the editorial team to understand what was resonating with their audience versus what was sitting untouched.

Editors and content producers have to keep the publishing process turning. They need to produce volume. But there’s real benefit in building organization and measurement behind that volume, so the effort going into new content is informed by what’s already working.

Why content architecture is a data architecture problem

Content teams are usually good at creating. What they tend to lack is the infrastructure to tell whether their existing library is compounding in value, because no one focuses on the details of the measurement layer.

At this company, the CMS had every topic in a flat list. No hierarchy, no relationship to the domains that published under those topics, no performance data attached. When the marketing team needed to run campaigns targeting specific content affinities, they were working from general assumptions and a spreadsheet.

I think of this the same way I think about data warehouses. When you walk into a messy data environment, you don’t start by building new pipelines. You start by understanding what’s already there, how it’s structured (or not), and where the gaps are. Content libraries are the same problem. My own progression from data warehouse architecture to content architecture felt natural for that reason. The skills transfer directly: schema design, quality management, performance measurement, serving the right information to the right people.

What the restructuring looked like

I restructured the topic taxonomy into collections organized by domain and content category, so that each domain’s topics were grouped to match the content architecture readers and search engines actually saw. Then I built automated reporting on topic-level impressions and click-through rates (CTR). This was a starting point. The longer-term goal was connecting content performance to personalized user journeys and audience behavior across the full funnel.
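To make the restructuring concrete, here is a minimal Python sketch of the two steps. First it groups a flat topic list into domain and category collections. Then it rolls impressions and clicks up to each collection to get CTR. The `TopicStats` record, the function names, and the sample domain and topic names are hypothetical. The sketch also assumes topic-level impressions and clicks have already been exported from an analytics source, which the setup above doesn’t specify.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicStats:
    # Hypothetical record shape: one row per topic, as exported from analytics.
    domain: str       # editorial domain that publishes under the topic
    category: str     # content category within that domain
    topic: str
    impressions: int
    clicks: int


def build_collections(rows: list[TopicStats]) -> dict[str, dict[str, list[str]]]:
    """Turn a flat topic list into a domain -> category -> [topics] hierarchy."""
    tree: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for row in rows:
        tree[row.domain][row.category].append(row.topic)
    return {domain: dict(categories) for domain, categories in tree.items()}


def ctr_by_collection(rows: list[TopicStats]) -> dict[tuple[str, str], float]:
    """Sum impressions and clicks per (domain, category), then compute CTR."""
    totals: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])
    for row in rows:
        bucket = totals[(row.domain, row.category)]
        bucket[0] += row.impressions
        bucket[1] += row.clicks
    return {
        key: (clicks / impressions if impressions else 0.0)
        for key, (impressions, clicks) in totals.items()
    }
```

Rolling up raw clicks and impressions, rather than averaging per-topic CTRs, keeps a low-volume topic from skewing its collection’s rate.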

The content and marketing teams could now see which topic clusters had strong engagement and which needed attention. Monthly pruning reviews became possible. We found topics with high impressions but low CTR that needed title and metadata work, topics with low traffic that could be consolidated or retired, and content gaps where audience demand existed but no content had been built.
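The three review outcomes described above map naturally onto simple rules. This sketch assumes monthly impressions and clicks per topic, plus a separate set of topics with audience demand (for example, from search or audience-platform data). The threshold values and function names are hypothetical placeholders, not the numbers used in the actual reviews.

```python
# Hypothetical thresholds; in practice they would be tuned to the library's own
# distribution of impressions and CTR.
HIGH_IMPRESSIONS = 10_000
LOW_CTR = 0.01
LOW_TRAFFIC = 100


def pruning_action(impressions: int, clicks: int) -> str:
    """Suggest a review action for one existing topic from its monthly metrics."""
    if impressions < LOW_TRAFFIC:
        # Too little traffic to justify a standalone topic.
        return "consolidate or retire"
    ctr = clicks / impressions
    if impressions >= HIGH_IMPRESSIONS and ctr < LOW_CTR:
        # Readers see the topic but rarely click: titles and metadata need work.
        return "rework title and metadata"
    return "keep"


def content_gaps(demand_topics: set[str], covered_topics: set[str]) -> list[str]:
    """Topics with audience demand (an assumed input) but no published content."""
    return sorted(demand_topics - covered_topics)
```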

The reporting also connected to the audience data platform. I bridged the editorial CMS taxonomy with the marketing audience taxonomy, so the organization could see which content topics were driving buyer signals for its 800+ advertisers. That helped editors make content decisions, and it also gave sales reps a better understanding of the products they were selling, because the content inventory was now measurable and tied to audience behavior.
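One way to sketch the taxonomy bridge is as a crosswalk table from CMS topics to audience segments, which is then used to attribute buyer-signal events. The segment names, the `CROSSWALK` mapping, and the shape of the event data (pairs of a topic and a buyer-signal flag) are all hypothetical. The source doesn’t describe the audience platform’s actual schema.

```python
from collections import Counter
from collections.abc import Iterable

# Hypothetical crosswalk: editorial CMS topic -> marketing audience segment.
CROSSWALK = {
    "zero trust": "Security Buyers",
    "SASE": "Security Buyers",
    "GDPR": "Compliance Buyers",
}


def buyer_signals_by_topic(events: Iterable[tuple[str, bool]], crosswalk: dict[str, str]) -> Counter:
    """Count buyer-signal events per CMS topic, keeping only topics in the crosswalk."""
    counts: Counter = Counter()
    for topic, is_buyer_signal in events:
        if is_buyer_signal and topic in crosswalk:
            counts[topic] += 1
    return counts


def signals_by_segment(topic_counts: Counter, crosswalk: dict[str, str]) -> Counter:
    """Roll topic-level counts up to the audience segments advertisers buy."""
    segments: Counter = Counter()
    for topic, n in topic_counts.items():
        segments[crosswalk[topic]] += n
    return segments
```

Topics missing from the crosswalk are silently dropped from the counts, which is exactly the kind of gap a recurring check between the two systems is meant to surface.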

Fine-tuning at the content level

One example of finer-grained tuning was the company’s modal campaign program. The team ran about 20 affinity-based engagement campaigns per month on published editorial content, generating $50K+ in monthly revenue. Before there was a measurement layer, nobody could connect content topics to campaign performance.

I built automated reporting that compared campaign performance across topics using conversion and reach metrics, replacing a manual Google Sheets process. Then I created a monthly CMS-to-audience-platform overlap analysis that identified new editorial topics being published but not yet added to the catalogue of content topics sold to advertisers. That gap meant the editorial team was creating content the marketing team couldn’t sell against, because the systems weren’t connected.
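At its core, a monthly overlap analysis like this is a set difference between the topics published in the CMS and the topics in the advertiser-facing catalogue. Here is a minimal sketch, assuming both can be exported as lists of topic names. The lowercase-and-trim normalization is an assumption meant to absorb formatting differences between the two systems, and the function and key names are hypothetical.

```python
def overlap_report(cms_topics: list[str], catalogue_topics: list[str]) -> dict[str, list[str]]:
    """Compare published CMS topics against the catalogue of topics sold to advertisers."""
    # Assumed normalization: case and surrounding whitespace differ between systems.
    cms = {t.strip().lower() for t in cms_topics}
    catalogue = {t.strip().lower() for t in catalogue_topics}
    return {
        # Published editorially, but not yet sellable to advertisers.
        "missing_from_catalogue": sorted(cms - catalogue),
        # Present in both systems.
        "matched": sorted(cms & catalogue),
    }
```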

Content libraries are data products

Content libraries have schemas (taxonomy), quality issues (duplicate and stale entries), performance metrics (impressions, CTR, engagement trends), and users (both the audience and the internal teams that depend on the content). Treating them with the same rigor you’d bring to a data warehouse makes the difference between a library that compounds in value and one that stagnates.
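The data-quality side of the analogy can be sketched the same way you’d check a warehouse table: flag near-duplicate topic names and topics that haven’t had anything published recently. The normalization rule, the 365-day staleness window, and the input shapes (a list of names, and a mapping from topic to last publish date) are illustrative assumptions.

```python
from datetime import date, timedelta


def normalize(name: str) -> str:
    """Collapse case, hyphens, and extra whitespace so near-identical names compare equal."""
    return " ".join(name.lower().replace("-", " ").split())


def find_duplicates(topics: list[str]) -> list[list[str]]:
    """Group topic names that normalize to the same key; return only groups of two or more."""
    groups: dict[str, list[str]] = {}
    for topic in topics:
        groups.setdefault(normalize(topic), []).append(topic)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def find_stale(last_published: dict[str, date], today: date, max_age_days: int = 365) -> list[str]:
    """Topics whose most recent article is older than the (assumed) staleness window."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(topic for topic, published in last_published.items() if published < cutoff)
```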

Building the measurement infrastructure is what makes it possible to move beyond volume creation toward optimization. Most content teams I’ve worked with want to do this work. What they’re missing is the system.