From warehouse to lakehouse to CDP, each generation promised to deliver a single source of truth.
Each arrived to find the truth had moved.
Open any large enterprise's customer data platform. It contains tens of millions of profiles, hundreds of segments, years of identity work. Then ask a quieter question than how complete the record is. How much is current? How much reflects the customer who logged in this morning, the support call placed an hour ago, the channel that did not exist when the project began? The CDP became a respectable place to keep the data. It almost never became the single source of truth its category promised. The truth kept moving.
The Customer Data Platform was named in April 2013, in a blog post by the analyst David Raab. By 2018 the category had its own Magic Quadrant, its own Forrester Wave, and a pitch that landed cleanly in every boardroom. Customer data was scattered across the warehouse, the email tool, the analytics suite and the ad platform. Stitching it together was painful and slow, and could consume entire teams of analysts for years without ever delivering a record the business trusted. A single platform, marketer-controlled, would become the golden record at last: the unified, trusted view every other system could draw on. The category answered a real problem. That is why it sold.
A decade later the category is worth somewhere between $2.4 billion (the CDP Institute's own estimate) and roughly $7 billion (most research firms), depending on who is counting and what they choose to count. Adoption is no longer the story. According to Gartner's 2023 marketing technology survey, around two-thirds of marketing organisations have a CDP in place. Of those that do, the average buyer uses only 47 per cent of the capabilities available, down from 55 per cent a year earlier. In 2024 Gartner placed the category in what its hype-cycle vocabulary calls the Trough of Disillusionment. Forrester's lead analyst described the same year as the category's "make-or-break" moment.
The shortfall is, by now, remarkably consistent. Even where the CDP is well-implemented and well-funded, the record is always slightly behind: new channels appear faster than they can be modelled, new sources faster than they can be reconciled, and the data itself has shifted from transactional and at rest to streaming and live. The record is built. It is just rarely current enough to be the truth at the customer's next move. The platform sold as the source of truth became, in production, a more polished warehouse.
This is not the first time the customer was promised a unified record by a layer that turned out to be one more place where the data sits. Before the CDP came the database, the warehouse, the mart, the data lake and the lakehouse. Each was sold, in its decade, as the architecture that would finally unify customer data. Each became another carefully designed store. The truth kept moving. The channels kept multiplying. The record kept arriving slightly out of date. The CDP did not invent this pattern. It is the latest expression of it. The next one is already being pitched.
It is tempting to blame implementation. The consistency of the gap across vendors, industries and geographies points to something deeper. The CDP fell short of the golden-record promise because of three architectural choices, all rational at the time, that made the record always slightly out of date by the moment it mattered.
The first was batch ingestion. The category grew up as a marketer-friendly evolution of the data warehouse, and warehouse thinking is batch thinking. Streaming was bolted on later, never end to end. The second was segment-first design. Everything in a CDP is organised around the segment, a slow object built from a slowly refreshed record. By the time a customer enters one, the moment that put them there has often moved on. The third was the record as the unit of output. The CDP's natural verb is "hand the record to the next tool." None of these choices were foolish in 2013. The shape of the problem changed underneath them.
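The cost of batch ingestion can be put in numbers. The following is a minimal sketch of how long a fresh event waits before a segment built on a batch-fed record can reflect it. The function names, the daily batch schedule and the two-hour segment refresh are illustrative assumptions, not figures from any vendor.

```python
from datetime import datetime, timedelta


def next_batch_run(event_time: datetime, epoch: datetime, interval: timedelta) -> datetime:
    """Return the first scheduled batch run at or after event_time.

    Assumes batches run at epoch, epoch + interval, epoch + 2 * interval, ...
    and that event_time is not earlier than epoch.
    """
    elapsed = event_time - epoch
    runs_needed = -((-elapsed) // interval)  # ceiling division on timedeltas
    return epoch + runs_needed * interval


def wait_until_segment_reflects(
    event_time: datetime,
    epoch: datetime,
    interval: timedelta,
    segment_refresh: timedelta,
) -> timedelta:
    """Time between an event and the moment a refreshed segment can include it."""
    ingested_at = next_batch_run(event_time, epoch, interval)
    return ingested_at + segment_refresh - event_time


# Hypothetical schedule: a nightly batch at midnight, then a two-hour segment refresh.
epoch = datetime(2024, 1, 1, 0, 0)
login = datetime(2024, 1, 1, 9, 30)  # the customer who logged in this morning
lag = wait_until_segment_reflects(login, epoch, timedelta(hours=24), timedelta(hours=2))
print(lag)  # 16:30:00
```

Under these assumed settings, a 9:30 login is not visible to any segment until 2:00 the next morning. Shortening the interval shrinks the lag but does not change its shape: the event still waits for the record.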
The record is built. By the time it settles, the customer has moved on.
The vendors know this. The category's response has been a procession of reframings, each addressing one of the problems the moving target creates. The real-time CDP, to address currency. The composable CDP, to absorb new sources faster. The activation CDP, to close the gap between the record and the moment. Most recently, the agentic CDP, to put an agent in front of the record. Each label leaves the underlying architecture almost unchanged. Reframing does not change what a thing is. A platform built around batch ingestion, segment-first design and the record as the unit of output is a storage layer, regardless of the label. The record is still being built. The customer's moment is still passing through somewhere else.
The alternative is not the same architecture at higher speed. A real-time CDP is still a record-first architecture, only faster. The architecture this piece points at is signal-first: the action happens where the signal lands, the record updates after the act, and the audit trail is produced in the same motion. Two architectures, not two speeds of the same one.
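The difference in ordering can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any product: the class names `RecordFirst` and `SignalFirst`, the signal kinds, the `decide` rule and its reason codes are all hypothetical, and the code assumes Python 3.9 or later.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Signal:
    customer_id: str
    kind: str  # hypothetical kinds, e.g. "purchase", "support_call"


@dataclass(frozen=True)
class AuditEntry:
    customer_id: str
    signal_kind: str
    action: str
    reason_code: str


def decide(signal: Signal, history: list[str]) -> tuple[str, str]:
    """Hypothetical rule: a support call from a known purchaser gets priority."""
    if signal.kind == "support_call" and "purchase" in history:
        return "priority_callback", "SUPPORT_CALL_FROM_PURCHASER"
    return "no_action", "NO_RULE_MATCHED"


@dataclass
class RecordFirst:
    """Signals queue on arrival; nothing is decided until the batch runs."""
    record: dict[str, list[str]] = field(default_factory=dict)
    pending: list[Signal] = field(default_factory=list)

    def on_signal(self, signal: Signal) -> Optional[str]:
        self.pending.append(signal)  # ingest only, no action at arrival
        return None

    def run_batch(self) -> list[tuple[str, str]]:
        exported = []
        for signal in self.pending:
            history = self.record.setdefault(signal.customer_id, [])
            prior = list(history)
            history.append(signal.kind)  # the record is reconciled first
            action, _ = decide(signal, prior)  # the action is exported afterwards
            exported.append((signal.customer_id, action))
        self.pending.clear()
        return exported


@dataclass
class SignalFirst:
    """The decision is made on arrival, against whatever state exists."""
    state: dict[str, list[str]] = field(default_factory=dict)
    audit_log: list[AuditEntry] = field(default_factory=list)

    def on_signal(self, signal: Signal) -> str:
        history = self.state.get(signal.customer_id, [])
        action, reason_code = decide(signal, history)  # act where the signal lands
        self.audit_log.append(  # audit produced in the same motion as the act
            AuditEntry(signal.customer_id, signal.kind, action, reason_code)
        )
        self.state.setdefault(signal.customer_id, []).append(signal.kind)  # record after the act
        return action


signals = [Signal("c1", "purchase"), Signal("c1", "support_call")]
record_first, signal_first = RecordFirst(), SignalFirst()
at_arrival_rf = [record_first.on_signal(s) for s in signals]
at_arrival_sf = [signal_first.on_signal(s) for s in signals]
after_batch_rf = record_first.run_batch()
```

Both classes reach the same decision for the support call. The record-first one reaches it only when `run_batch` executes, while the signal-first one returns it from `on_signal` and has already written the audit entry.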
The interesting question is no longer whether to have both storage and decisioning, but whether they should sit on separate platforms with an export between them, or on the same substrate with no boundary to cross.
| Sold as | Became |
|---|---|
| The golden record. A single, unified, trusted view of the customer. | One more record, alongside the warehouse, the data lake and the operational stores, none quite reconciled with the others. |
| A view of the customer that would be durable across channels. | A view that was durable until the next channel appeared, the next source was added, the next interaction shifted from rest to live. |
| The end of the analytics-to-activation gap. | A better way to keep and segment the data. The gap to the customer's moment moved one step deeper into the architecture. |
One question cuts through every relabelling the category will produce in the next five years. Ask it of any platform that calls itself a customer data architecture. What does your platform do when a fresh signal arrives that is not yet in the record? If the answer involves ingesting, reconciling and exporting before anything else can happen, you are looking at a sophisticated version of every previous generation. If the answer is that the platform acts on the signal as it arrives, against the state it has, with the audit produced in the same motion as the act, you are looking at something different. The category label has stopped being a useful guide. The architecture has not. Three decades into the chase, the buyer who keeps asking the question is the one who stops paying for the same architecture under a new name.
The architecture that survives this redrafting has four properties the relabelling does not. An open architecture, so the operator can read and write on terms they publish rather than terms a vendor locks in. Reason-coded decision logs as a by-product of every decision, exportable on the regulator's day in the regulator's format. Outcomes-aligned pricing, so the commercials match the value created rather than the seats sold. Operator-controlled infrastructure, so neither the customer's data nor the decision's audit trail crosses a vendor's boundary.
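A reason-coded decision log that can be exported on demand might look like the following minimal sketch. The field names, the reason code and the choice of JSON Lines and CSV as export formats are assumptions made for illustration. An actual regulator's format would be whatever that regulator specifies.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass

FIELDS = ["decided_at", "customer_id", "action", "reason_code"]


@dataclass(frozen=True)
class DecisionLogEntry:
    decided_at: str  # ISO 8601 timestamp, stored as text for simple export
    customer_id: str
    action: str
    reason_code: str  # hypothetical machine-readable reason for the decision


def export_jsonl(entries: list[DecisionLogEntry]) -> str:
    """Serialise the log as JSON Lines, one decision per line."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in entries)


def export_csv(entries: list[DecisionLogEntry]) -> str:
    """Serialise the same log as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


log = [
    DecisionLogEntry(
        decided_at="2024-01-01T09:30:00+00:00",
        customer_id="c1",
        action="priority_callback",
        reason_code="SUPPORT_CALL_FROM_PURCHASER",
    )
]
jsonl_export = export_jsonl(log)
csv_export = export_csv(log)
```

The log is written once, at decision time, and each export is only a different serialisation of the same entries. Nothing has to be reconstructed after the fact.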
Whether the architecture needs to centralise the data at all is the next question worth asking. Every generation in the lineage above took centralisation as the price of usefulness. In an era of agents that navigate distributed sources, that price may no longer be one worth paying.