B2B Media Group — Three Years Building a Data Foundation
Analytics engineering across a multi-property B2B publishing company: audience data, advertiser reporting, semantic layer, and two M&A integrations.
- Unified audience data across multiple editorial properties into a single, tested analytics layer
- Migrated from fragmented legacy reporting to a dbt + BigQuery stack with documented lineage
- Unblocked 10+ marketing team members from a single-analyst CSV request queue through self-service analytics
- Identified $932/day in warehouse waste from dashboards aggregating raw data at query time, and pre-aggregated the underlying models
- Reduced sales data entry error rates (25-30% SKU errors) through validated input pipelines
- Supported data due diligence and post-close integration for two sub-brand acquisitions
A B2B publishing company operating across multiple editorial properties in industrial and technology markets. When the engagement began, they had strong editorial instincts and real commercial momentum, but their analytics infrastructure reflected a pattern common to media companies that grew through content before growing through data: audience and advertiser information lived in platforms that had never been connected.
Starting point
The business ran on advertising revenue. Understanding audience engagement, advertiser performance, and content attribution was foundational to every commercial decision, but those answers were scattered across Google Analytics, a CRM, multiple ad serving platforms, and editorial tools with no common data layer underneath them.
Sales teams built their own Excel models for advertiser reporting, with SKU entry errors running at 25-30% because reps were keying in product codes that didn't exist. Editorial teams had no reliable way to understand which content drove audience acquisition vs. retention. Finance worked off platform-reported numbers that didn't reconcile with each other. More than ten people on the marketing team queued ad hoc CSV requests through a single analyst. Everyone was making decisions from partial data, and everyone knew it.
How we built it
The first phase was foundational: a single data warehouse in BigQuery, ETL pipelines pulling from core source systems, and dbt as the transformation layer. The emphasis was on documentation and trust as much as technical output. Every model was tested, every column was described, and every metric definition was agreed on before it appeared in a dashboard. Data that doesn’t get trusted doesn’t get used.
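In the project itself those checks lived in dbt's schema tests; as a rough illustration of the contract, the sketch below runs the same kind of not-null and uniqueness checks directly against BigQuery from Python. The dataset, table, and column names are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Check name -> SQL that should return zero rows when the model is healthy.
# `analytics.dim_audience` and `audience_id` are illustrative names only.
CHECKS = {
    "audience_id_not_null": """
        SELECT audience_id
        FROM `analytics.dim_audience`
        WHERE audience_id IS NULL
    """,
    "audience_id_unique": """
        SELECT audience_id, COUNT(*) AS n
        FROM `analytics.dim_audience`
        GROUP BY audience_id
        HAVING n > 1
    """,
}

def run_checks() -> list[str]:
    """Run every check and return the names of the ones that fail."""
    failures = []
    for name, sql in CHECKS.items():
        if list(client.query(sql).result()):  # any returned row is a violation
            failures.append(name)
    return failures

if __name__ == "__main__":
    failed = run_checks()
    if failed:
        raise SystemExit(f"Data checks failed: {', '.join(failed)}")
    print("All checks passed.")
```

The mechanism matters less than the rule it enforces: a model only ships once checks like these pass and its columns are described.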
From that foundation, the work expanded in two directions. On the technical side: semantic layer design and evaluation (comparing multiple tools for fit against the team's actual maturity and budget), cost optimization that traced $932 per day in warehouse waste to dashboards aggregating raw tables at query time (addressed by pre-aggregating the underlying models), and advertiser-facing dashboards surfacing campaign delivery and intent data. On the organizational side: working with editorial, sales, and ad ops leads to translate the warehouse into the questions they actually needed answered, in language that fit their workflow rather than the data model.
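The waste figure came from attributing query spend back to whatever was generating it. A simplified sketch of that kind of analysis, assuming on-demand pricing and grouping by querying principal rather than by dashboard label, looks roughly like this; the region, rate, and grouping are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client()

# On-demand rate per TiB scanned; substitute the project's actual pricing.
ON_DEMAND_USD_PER_TIB = 6.25

# Sum bytes billed per querying principal over the last 30 days. Swap
# `region-us` for the dataset's region, and group by job labels instead
# if the BI tool tags its queries.
SQL = """
    SELECT
      user_email,
      SUM(total_bytes_billed) / POW(1024, 4) AS tib_billed
    FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
    WHERE job_type = 'QUERY'
      AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
    GROUP BY user_email
    ORDER BY tib_billed DESC
"""

for row in client.query(SQL).result():
    who = row.user_email or "(service account)"
    approx_daily_usd = row.tib_billed * ON_DEMAND_USD_PER_TIB / 30
    print(f"{who:<45} ~${approx_daily_usd:,.2f}/day")
```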
M&A integration work
During the engagement, the parent company acquired two specialty B2B media brands, each with its own source systems, audience definitions, and reporting history. Both required pre-close data due diligence (what are we inheriting, and how healthy is it?) and post-close integration work: mapping source systems, reconciling audience definitions across properties, and extending the analytics infrastructure to cover the acquired brands without breaking the reporting the core business relied on.
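Reconciling audience definitions mostly came down to deciding when a subscriber on an acquired property was the same person already in the core audience. A hypothetical sketch of one such step, matching on a normalized email key with pandas (file and column names are illustrative):

```python
import pandas as pd

def normalize_email(s: pd.Series) -> pd.Series:
    """Lowercase, trim, and strip a leading 'mailto:' so keys compare cleanly."""
    return (
        s.astype("string")
         .str.strip()
         .str.lower()
         .str.removeprefix("mailto:")
    )

# Hypothetical extracts: the core audience dimension and the acquired
# brand's subscriber export.
core = pd.read_parquet("dim_audience_core.parquet")
acquired = pd.read_parquet("subscribers_acquired_brand.parquet")

core["email_key"] = normalize_email(core["email"])
acquired["email_key"] = normalize_email(acquired["email"])

merged = acquired.merge(
    core[["email_key", "audience_id"]],
    on="email_key",
    how="left",
    indicator=True,  # _merge == 'both' means the subscriber already exists
)

overlap = (merged["_merge"] == "both").mean()
print(f"{overlap:.1%} of acquired subscribers already exist in the core audience")
```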
M&A data work is a specific discipline. The timelines are compressed, the stakes around what you find (or miss) are real, and the integration work that follows has to happen without disrupting a live business. Running that process twice, in parallel with ongoing analytics engineering work, was among the more demanding parts of the engagement.
What changed
By the end of the engagement, the analytics function looked fundamentally different. Editorial teams could answer audience questions without submitting a request to the data team. The marketing team went from queuing through a single analyst to pulling their own segments, and throughput went up across the board. Sales could produce accurate advertiser performance reports without manual reconciliation, and SKU entry errors dropped significantly once the validated input pipelines were in place. Finance worked from a single source of numbers. The stack was documented, tested, and understood by the people who used it, not dependent on a single outside consultant to interpret it.
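The validation behind the SKU improvement is simple in principle: entered codes are checked against the product catalog before they reach the warehouse, and anything unknown is bounced back to the rep. A minimal sketch, with hypothetical table and field names:

```python
from google.cloud import bigquery

client = bigquery.Client()

def load_valid_skus() -> set[str]:
    """Pull the canonical SKU list from the (hypothetical) product catalog table."""
    rows = client.query("SELECT sku FROM `reference.product_catalog`").result()
    return {row.sku for row in rows}

def validate_order_lines(lines: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split incoming order lines into accepted and rejected based on the SKU field."""
    valid = load_valid_skus()
    accepted = [line for line in lines if line["sku"] in valid]
    rejected = [line for line in lines if line["sku"] not in valid]
    return accepted, rejected
```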
I think the thing that made the biggest difference over three years was the steady, unglamorous effort to make the data trustworthy enough that people would actually change how they made decisions. The technology enabled that, but the trust is what made it stick.