
What is changing.

Agents are starting to produce dimensional values at machine speed. The operating model that has barely coped with human-produced variance does not work when machines are the dominant producers.

Page 6 of 7 · Module: Foundations · Reading time: ~5 min

The previous pages described the dimensional data problem as it has existed for thirty years. This page explains why the problem is more consequential now than at any previous point. The short version: agents are starting to produce dimensional values at machine speed, and the operating model that has barely coped with human-produced variance does not work when machines are the dominant producers.

How dimensional values have been produced, until now.

For most of the history of business data systems, dimensional values were produced by humans. The sales rep entered the customer's industry. The procurement analyst entered the vendor's category. The support agent classified the ticket. The volume of new variants per dimension was bounded by how many humans were involved and how fast they could type.

This bound is what made reactive management workable. The cleanup that an analyst did this quarter could keep pace with the variance the operational systems produced last quarter, because both were happening at human speed. The dashboards were wrong, but the wrongness was bounded, and a quarterly remediation project could absorb it.

The feedback loop also depended on humans. When a dashboard did not make business sense, a human analyst noticed and investigated. The investigation traced the wrong number back to a misclassified record, and the record got corrected. This was not formal governance, but it was a real form of error correction, and it was sufficient because the volume of records to correct was bounded by human-speed production.

How dimensional values are produced now.

A growing share of dimensional values is now produced by software acting on behalf of users. Sales agents categorise leads. Support agents classify tickets. Finance agents code expenses. Product agents tag features. Marketing agents assign campaigns. The same categorical fields that humans previously populated are now populated by a mix of humans and machines, with the machine share growing.

This shift changes the dimensional problem in three specific ways.

Volume. Agents produce dimensional values at machine speed. A field that previously accumulated a few dozen new values per week now accumulates thousands. Inconsistencies compound proportionally faster.

The character of the variance. Agents do not typo. They do not improvise abbreviations. They do not get tired on Friday afternoon. The surface variance that dominated human-entered data largely disappears in agent-entered data. But agents reproduce, faithfully and at scale, whatever convention they were configured with. If two agents are configured by two different teams with two different conventions, the variance between them becomes the new dominant variance. The error profile shifts from messy and visible to clean and structurally wrong.

The feedback loop. Human analysts noticed when something seemed off. Agents that consume the data do not, at least not in the same way. When agents produce most of the records and other agents consume them, the human who used to sit in the middle of the loop is no longer present. Errors propagate without the friction that used to slow them.
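The second change, clean but structurally wrong variance, can be made concrete with a small sketch. Everything in it is hypothetical: the two label mappings, the `classify` helper, and the sample customers are invented for illustration and stand in for whatever conventions two teams might configure.

```python
from collections import Counter

# Hypothetical conventions: two teams configured their agents with
# different label sets for the same "industry" dimension.
SALES_AGENT_LABELS = {"bank": "Financial Services", "hospital": "Healthcare"}
SUPPORT_AGENT_LABELS = {"bank": "FinServ", "hospital": "Health & Life Sciences"}


def classify(kind: str, labels: dict[str, str]) -> str:
    """Return the label an agent emits for a given kind of customer."""
    return labels[kind]


customers = ["bank", "hospital", "bank", "bank"]

# Each agent is perfectly consistent with itself: no typos, no drift.
sales_values = [classify(c, SALES_AGENT_LABELS) for c in customers]
support_values = [classify(c, SUPPORT_AGENT_LABELS) for c in customers]

# A report that groups by the raw value sees four industries, not two.
counts = Counter(sales_values + support_values)
for label, n in sorted(counts.items()):
    print(f"{label}: {n}")
```

No single value here looks wrong on inspection, which is the point: the inconsistency only exists between the two conventions, so a reviewer scanning for messy entries would not find it.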

Why this matters.

The reactive approach was a workaround, and it worked only because three conditions held: human-bounded volumes, visible errors, and slow accumulation. None of those conditions holds once agents become the dominant producers.
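A rough way to see the volume condition fail is to model the cleanup backlog as simple arithmetic. The sketch below is a deliberately simplified assumption, not a measurement: the function `backlog_after` and all three rates are made up, chosen only to echo the "few dozen" versus "thousands" contrast described earlier.

```python
def backlog_after(weeks: int, new_per_week: int, cleaned_per_week: int) -> int:
    """Unresolved variants after `weeks`, assuming constant weekly rates.

    The backlog never drops below zero: spare cleanup capacity is not banked.
    """
    backlog = 0
    for _ in range(weeks):
        backlog = max(0, backlog + new_per_week - cleaned_per_week)
    return backlog


# Illustrative, invented rates over one quarter (13 weeks).
CLEANUP_CAPACITY = 40  # variants an analyst team can resolve per week

human_speed = backlog_after(weeks=13, new_per_week=30, cleaned_per_week=CLEANUP_CAPACITY)
machine_speed = backlog_after(weeks=13, new_per_week=3000, cleaned_per_week=CLEANUP_CAPACITY)

print(human_speed)    # cleanup keeps pace, so nothing accumulates
print(machine_speed)  # the same capacity falls further behind every week
```

With cleanup capacity fixed at human speed, the backlog stays at zero in the first case and grows by the full weekly shortfall in the second. Real cleanup is lumpier than this, but the direction of the result does not depend on the exact numbers.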

The implication is not that the dimensional data problem is new. The problem has existed for thirty years. What is new is that the operating model that has barely coped with the problem is reaching its limit. The same approach that has worked, in some sense, for thirty years will not work for the next ten.

This is the argument the manifesto develops in full, and the urgency that the target state document responds to. The target state describes what well-managed dimensional data looks like in a world where agents are significant producers: a canonical reference that is operationally load-bearing, agents reading from synchronised caches, confidence signals that reflect cache freshness, and consumer-side policy that determines what to do with low-confidence output. The architecture is designed to work at machine speed, because the variance it has to manage is being produced at machine speed.
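The four pieces named above can be sketched in a few lines to show how they might fit together. This is a minimal illustration under stated assumptions, not the design in the target state document: the names `CachedDimension`, `confidence`, `agent_emit`, and `consumer_accept`, the linear decay, the one-hour maximum age, and the 0.5 threshold are all invented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedDimension:
    """An agent's local copy of the canonical values for one dimension."""
    values: frozenset[str]
    synced_at: float  # time of the last sync with the canonical reference, in seconds


def confidence(cache: CachedDimension, now: float, max_age_s: float = 3600.0) -> float:
    """Assumed signal: linear decay from 1.0 (just synced) to 0.0 (stale)."""
    age = max(0.0, now - cache.synced_at)
    return max(0.0, 1.0 - age / max_age_s)


def agent_emit(value: str, cache: CachedDimension, now: float) -> tuple[str, float]:
    """Producer side: emit a value together with a confidence signal."""
    if value not in cache.values:
        return value, 0.0  # not a known canonical value in this cache
    return value, confidence(cache, now)


def consumer_accept(score: float, threshold: float = 0.5) -> bool:
    """Consumer-side policy: accept only output above the threshold."""
    return score >= threshold


cache = CachedDimension(values=frozenset({"Healthcare", "Financial Services"}), synced_at=0.0)

fresh = agent_emit("Healthcare", cache, now=900.0)    # 15 minutes after sync
stale = agent_emit("Healthcare", cache, now=3000.0)   # 50 minutes after sync
unknown = agent_emit("FinServ", cache, now=900.0)     # value not in the reference

for value, score in (fresh, stale, unknown):
    print(value, round(score, 2), consumer_accept(score))
```

The design choice worth noticing is the split of responsibility: the producer reports how much its output should be trusted, and each consumer decides what to do about it. A consumer could equally route low-confidence values to a review queue instead of rejecting them.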

A practical implication.

For a learner working through this curriculum, the practical implication is that the right time to start thinking about dimension management is before agents are a significant producer in your organisation, not after. The cost of leaving dimensions unmanaged is currently increasing for most organisations, and the increase will continue as agents take on more of the work that humans currently do. The investment that would have been optional five years ago is becoming necessary now.

GOING DEEPER