[Figure: the DIKW pyramid, after Ackoff (1989). What changes when a model sits at the knowledge layer.]

Why Your AI Pilot Stalls at the Data Foundation

A client project reminded me that the hardest part of shipping AI isn't the model. It's the foundational data work that has to happen first.

Last year on a client project I was helping build a semantic search tool on top of their audience database. Marketing wanted to pull targeted segments in plain language instead of waiting for an analyst to write a query. We were generating vector embeddings for the job title and company fields and wiring the search to run on top of them. The embedding model and the prompt engineering felt like the obvious focus.

What we discovered pretty quickly was that the underlying data had more to say about our results than the model did. The retrieval pipeline was doing its job, but the inputs it was working with had foundational gaps that were holding the outcome back.

In the process of doing exploratory work on the data sources before embedding them, we uncovered patterns that had been compounding for a while. The web forms feeding first-party data into the system had been set up without consistent administration across the fields that mattered most. Critical fields like job title and company name were open free-text entries, so visitors could enter whatever they wanted. Over time, that produced blank fields, garbled text, free-text responses stored in the wrong columns, and stale values from abandoned contact records re-ingested under new IDs. The inconsistency traced back to multiple upstream sources, each with a slightly different idea of which field was canonical when the same person filled out a form twice. Without that kind of foundational discipline across the company, data quality stays a moving target for every team building on top of it.

The model was doing exactly what it should. Good inputs, good outputs. Inconsistent inputs, inconsistent outputs. The data just hadn’t been governed.

We spent weeks on what I’d have called boring work a few years ago. We mapped each source to an authoritative list of values for each field and worked on auto-filling upstream forms with known first-party data for users we recognized across sessions. We built views on top of the raw layer so the semantic search was working with records we actually trusted. After that, we collapsed the time from segment request to client or prospect reply from four to five business days down to one. That’s the kind of outcome I trust on a tool like this.
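The value-mapping step can be sketched in a few lines. This is illustrative, not the client's actual code: the canonical title list and field names are hypothetical, and a real version would cover company names and the other governed fields too.

```python
# Hypothetical sketch: mapping free-text job titles to an authoritative
# value list, so only trusted values reach the curated views.
from difflib import get_close_matches

CANONICAL_TITLES = ["Data Engineer", "Marketing Manager", "VP Of Sales"]

def normalize_title(raw):
    """Map a free-text job title to a canonical value, or None if untrusted."""
    if not raw or not raw.strip():
        return None  # blank stays blank rather than guessing
    cleaned = raw.strip().title()
    if cleaned in CANONICAL_TITLES:
        return cleaned
    # Fuzzy-match obvious variants ("data engneer" -> "Data Engineer")
    match = get_close_matches(cleaned, CANONICAL_TITLES, n=1, cutoff=0.8)
    return match[0] if match else None

clean = normalize_title("  data engneer ")  # close variant resolves
junk = normalize_title("asdf1234")          # garbled input stays out of the trusted view
```

The point is the shape of the work: an authoritative list per field, a deterministic mapping, and an explicit None for anything that can't be trusted.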

When I think about where AI is landing inside companies right now, that project is the one I keep coming back to.

Where the foundation conversation starts

As a consultant you get a distant window into patterns across companies. You're not really part of any one team, but you see the same things play out again and again. Across projects and conversations, some version of this question keeps coming up: “We’ve been trying to get AI into our analytics workflow for the last year and a half, and we’re still not there. What are we missing?”

What tends to be missing is the foundation. The team I opened with had that exact experience. The search tool was waiting on a cleanup job in the raw layer that no one had scoped, because no one had seen the need. That gap, between what the data is supposed to mean and what the downstream tools can safely assume about it, is the foundation gap. It is wider than teams expect, and closing it is the work that often gets deferred on the way to the AI conversation.

Most of the time the symptom is less obvious than corrupted fields: a pilot that works in the demo environment and quietly underperforms in production, a dashboard everyone references that nobody trusts when the numbers are close to a decision threshold, a model that keeps flagging the wrong customers as at-risk and nobody can explain why.

Each of those leads to the same place. Someone has to sit down and do the foundational work that connects what the data says to what the tools need it to mean.

The time I called it wrong

I don’t always walk into a project and see the foundation gap on day one. A couple of years ago I was running a Shopify data integration project for a client with multiple brands and distributed stakeholders. I scoped it as an execution problem. I thought the hard part was going to be the pipeline and the modeling, and I built a delivery plan that assumed the team was already aligned on what “done” meant.

We weren’t. Midway through the build I started hitting requirements I didn’t know about, from stakeholders I hadn’t included in the original discovery, and the work stopped being “finish the pipeline” and started being “rescope the project from the middle.” My accountability partner on the engagement pulled me aside and said we needed upfront planning sessions, design reviews, and documentation handoffs, not more developer hours. He was right. The foundation I was missing lived in the definition of the project, not the data. I’d diagnosed it as bandwidth when it was alignment.

Slow down at the start, map the people and the definitions before you commit to a build plan, and assume the hardest part of the work is agreement, not engineering.

That discipline is the same one that helps me catch the foundation gap in client data now.

The pyramid is old, the stakes are new

The hierarchy I’m describing is not new. What’s changed is who sits at the top.

A human analyst can compensate for a messy foundation. They interpret around duplicate records, know the controller’s spreadsheet exists, and ask a follow-up question when the data looks off. For decades, the analyst quietly solved data quality on behalf of the organization, and the cost of the mess got absorbed at the last mile.

An AI system does not compensate that way. It takes what it is given. It produces confident outputs that are wrong in ways you can’t detect, because the wrongness is baked into the inputs. When you replace the analyst with a model, you lose the human judgment that quietly absorbed the mess, and every foundation crack shows up as a model failure.

The pyramid is old. The thing that changed is that AI at the top surfaces every problem at the bottom, whether you were tracking them or not.

What “good enough” means when a model is the consumer

There are three specific conditions that separate data good enough for a human-read dashboard from data good enough for an AI system.

Identity consistency. The model needs to track an entity over time and across sources. That means one canonical representation of a customer, consistently applied, so the pipeline isn’t stitching the same person together over and over and getting it slightly wrong each time. Without that, the model reasons about fragments, and the downstream outputs are only as coherent as the worst join upstream.

Semantic consistency. The terms the model reasons about need to mean one thing. The form-field governance issue I described earlier is one version of this problem. Another is when the same metric gets pulled from two different sources and nobody has documented which one is canonical. When no one picks a source of truth, every downstream consumer inherits the disagreement. The fix is always the same: document field ownership, set a canonical source per metric, and build fallback logic for the gaps.
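That fix, one canonical source per metric with explicit fallback, is a precedence rule you can write down. A hedged sketch with hypothetical metric and source names:

```python
# Sketch of "canonical source per metric, with fallback logic for the gaps".
# Metric and source names are hypothetical; the precedence logic is the point.

METRIC_SOURCES = {
    # metric -> ordered list of sources, canonical first
    "monthly_revenue": ["finance_ledger", "crm_rollup"],
}

def read_metric(metric, snapshots):
    """Return the metric from the canonical source, falling back in order."""
    for source in METRIC_SOURCES.get(metric, []):
        value = snapshots.get(source, {}).get(metric)
        if value is not None:
            return value
    return None  # surface the gap instead of silently mixing sources

snapshots = {
    "finance_ledger": {"monthly_revenue": None},  # canonical source has a gap
    "crm_rollup": {"monthly_revenue": 118_500.0},
}
value = read_metric("monthly_revenue", snapshots)  # falls back to crm_rollup
```

Once the precedence is in code, the disagreement between sources stops being something every downstream consumer re-decides for itself.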

Temporal reliability. AI systems making predictions need data captured reliably over time. Gaps, retroactive edits to historical records, inconsistent event timing. All of it gets amplified when the model is asked to detect trends or predict behavior. Humans eyeball a gap and work around it. Models do not.
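The gap-detection part of that check is mechanical enough to automate. A minimal sketch; the two-day threshold is an illustrative choice, not a recommendation:

```python
# Minimal temporal-reliability check: flag gaps in an event stream before a
# model is asked to learn trends from it.
from datetime import datetime, timedelta

def find_gaps(timestamps, max_gap):
    """Return (start, end) pairs where consecutive events are too far apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]

events = [datetime(2024, 1, d) for d in (1, 2, 3, 9, 10)]  # six-day hole
gaps = find_gaps(events, max_gap=timedelta(days=2))
# A human eyeballs that hole and works around it; a model just ingests it.
```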

These are the same data quality fundamentals good analytics infrastructure has always required. A human analyst could work around a lot of it. A model cannot.

Why smart teams keep deferring this

What I find worth thinking about is why teams who understand the problem still defer the foundation work. The data leaders I’ve worked alongside often understood the gap in the abstract. They still shipped AI pilots ahead of it. A few patterns worth naming.

Boards fund what they can see. An AI agent you can demo in twenty seconds gets budgeted. Three months of entity resolution work does not. One VP of Data I worked with put it plainly: “If I go to the board and ask for six figures to clean up Salesforce, I get cut. If I go and ask for the same budget for an AI agent, I get a round of applause.” The incentive structure rewards the top of the stack and taxes the bottom.

Demos hide what the foundation is not doing. A pilot running against a cherry-picked dataset will always look good. The failure happens in production, after the budget is committed and the narrative is locked in. By the time the gap shows up, the political cost of saying “we need to pause and clean up” is higher than the cost of limping forward.

Data engineers are not good at selling foundational work. I say this with full complicity. The instinct is to describe the work in terms the business doesn’t connect to outcomes. The same body of work can sound like overhead or like triage depending on how you frame it, and most of us default to the framing that makes it sound optional.

Vendors make it easy to believe the tool will save you. Every CDP deck promises unified customer records. Every LLM vendor promises zero-shot magic. The vendor narrative is that the tool closes the foundation gap for you. In my experience, the tool almost always widens it, because it creates the illusion of a foundation without the underlying work.

And the one I find most interesting: the foundation work does not have a clean owner. It sits between the data engineer who owns the pipeline, the analyst who owns the metric definition, the product manager who owns the source system, and the executive who owns the budget. When no one owns the gap, no one fills it.

So who should? The answer depends on the shape of the company. At a Series A or B with a small data function and no formal platform team, my bet is the head of analytics engineering, because they’re the only person in the room whose job already spans the pipeline and the modeling layer, and they have enough proximity to the business to make the definition calls. At a later-stage company with a platform team, I’d push it toward a head of data platform, because by that point the ownership conversation is about infrastructure, governance, and the contracts between upstream source teams and downstream consumers. The part I feel strongest about: I don’t think it belongs with a data governance committee that meets monthly. Foundations degrade continuously and the owner has to be close enough to the work to notice when they do.

The foundation gap is really an ownership problem. It looks like a data problem because the tooling is where the pain surfaces first, but the fix lives with whoever is accountable for holding the definitions together.

Where to start

The discipline transfers even when the specifics don’t. Audit what the AI system will touch before you build it. Identify the entities that need resolution and resolve them. Document the metric definitions before a model reasons about them. Stand up data quality monitoring that catches degradation before the model does.
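The monitoring piece doesn't need to start as a platform. A sketch of the lightweight version, with hypothetical field names and thresholds:

```python
# Sketch of a lightweight data quality monitor: per-field null-rate checks
# that fire before the model's outputs degrade. Thresholds are illustrative.

THRESHOLDS = {"job_title": 0.10, "company": 0.05}  # max tolerated blank rate

def null_rate_alerts(rows):
    """Return fields whose null/blank rate exceeds its threshold."""
    alerts = {}
    for field, limit in THRESHOLDS.items():
        blanks = sum(1 for r in rows if not r.get(field))
        rate = blanks / len(rows) if rows else 0.0
        if rate > limit:
            alerts[field] = rate
    return alerts

rows = [
    {"job_title": "Analyst", "company": "Acme"},
    {"job_title": "", "company": "Acme"},
]
alerts = null_rate_alerts(rows)  # job_title blank in half the rows: flagged
```

A check like this catches degradation at the field level, where it starts, instead of at the model output, where it's hardest to diagnose.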

If you’ve got a pilot stuck in proof of concept, here’s the exercise I’d start with. Pick the single output your stakeholders care about most. Trace it back through your sources to where the data originates. Run good exploratory data analysis so you understand where things sit and where you’re starting from. You’ll usually find the gap is further upstream than you expected, and the real work is a conversation with the person who owns that source.

Then have that conversation. That’s almost always where AI actually starts to ship.