Why dimensions specifically need humans

Of the five forms variance takes, only one is safely automatable. The other four require organisational decisions, not algorithmic ones.

Page 6 of 8 | Module: Origins | Reading time: ~5 min

Semantic, not statistical

Dimensional problems are semantic, not statistical. The Foundations module introduced the five forms variance takes. The argument here is that four of the five cannot be resolved without a human decision, and that this is the structural reason dimension management is a discipline rather than a tool.

Walking the five forms

  • Surface variance (automatable). The same concept recorded with different formatting: AWS versus aws versus A.W.S. An algorithm can propose a canonical form and a steward can approve the proposal once. The resolution propagates. This is the only form of variance that is safely automatable, and it is the smallest part of the actual problem.
  • Semantic variance (human required). Vendor, Supplier, Partner. Laptop, Notebook. The choice of which string becomes canonical is a naming decision. The team that uses Vendor and the team that uses Supplier both have reasons. Reconciling them ends in a meeting, not an algorithm.
  • Definitional variance (human required). Mid-Market means companies between $1M and $10M in annual revenue to the finance team, companies with 100 to 999 employees to the sales team, and whatever a prospect self-reports on a marketing form to the demand-generation team. No algorithm can resolve this. The resolution is a choice about what the word means.
  • Granularity variance (human required). Financial Services in the CRM, Retail Banking in the warehouse, BFSI in an external report. Each value is correct at its level. The choice of which level is canonical, and how the others roll into it, is a hierarchy decision the organisation has to make explicitly.
  • Temporal variance (human required). A product is renamed. The decision of whether to remap history, cut over at a date, or maintain both labels indefinitely is a policy decision with downstream consequences. No algorithm has the authority to make it.
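
To make the contrast concrete, here is a minimal Python sketch of how surface variance might be detected and a canonical form proposed. The function names (surface_key, propose_canonical) and the "most frequent spelling wins" rule are illustrative assumptions, not part of any specified tool. Note that the output is only a proposal: a steward still approves it once before it propagates.

```python
import re
from collections import Counter, defaultdict


def surface_key(value: str) -> str:
    """Collapse case, punctuation and whitespace so formatting variants collide."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


def propose_canonical(values):
    """Group values that differ only in formatting and propose one spelling.

    Assumption: the most frequent spelling is proposed as canonical,
    with ties broken alphabetically. A steward approves or overrides it.
    """
    groups = defaultdict(list)
    for value in values:
        groups[surface_key(value)].append(value)

    proposals = {}
    for key, variants in groups.items():
        distinct = sorted(set(variants))
        if len(distinct) < 2:
            continue  # only one spelling observed, nothing to reconcile
        counts = Counter(variants)
        canonical = min(distinct, key=lambda v: (-counts[v], v))
        proposals[key] = {"canonical": canonical, "variants": distinct}
    return proposals


# Hypothetical sample values for illustration.
observed = ["AWS", "aws", "A.W.S.", "AWS", "Vendor", "Supplier"]
proposals = propose_canonical(observed)
```

With this sample, the three AWS spellings fall into one group, while Vendor and Supplier are left untouched. Nothing in the strings themselves says they mean the same thing, which is why semantic variance cannot be handled the same way.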

Detect at machine speed, decide at human speed

The pattern across the four human-required forms is the same. An agent can surface discrepancies at machine speed. It cannot decide what Mid-Market means. That is an organisational decision, not a data decision.
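
A short Python sketch of what "surfacing a discrepancy" could look like for the Mid-Market example. The two rules come from the finance and sales definitions described earlier. The inclusive boundaries, the field names (annual_revenue, employees) and the sample accounts are assumptions made for illustration.

```python
def finance_mid_market(account: dict) -> bool:
    # Finance definition: $1M to $10M annual revenue (inclusive bounds assumed).
    return 1_000_000 <= account["annual_revenue"] <= 10_000_000


def sales_mid_market(account: dict) -> bool:
    # Sales definition: 100 to 999 employees.
    return 100 <= account["employees"] <= 999


def find_disagreements(accounts):
    """Return the names of accounts the two definitions classify differently."""
    return [
        account["name"]
        for account in accounts
        if finance_mid_market(account) != sales_mid_market(account)
    ]


# Hypothetical accounts for illustration.
accounts = [
    {"name": "Acme", "annual_revenue": 5_000_000, "employees": 40},
    {"name": "Birch", "annual_revenue": 8_000_000, "employees": 250},
    {"name": "Cedar", "annual_revenue": 50_000_000, "employees": 600},
]
disagreements = find_disagreements(accounts)
```

The code can list every account where the two teams would disagree. It has no basis for saying which team is right, so the list goes to a person.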

The architecture that follows from this constraint is human-in-the-loop by design. Agents detect, flag, and queue. Humans decide, approve, and certify. The decisions propagate back through the same infrastructure that surfaced them. The full specification is in the target state.
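
The detect, queue, decide, propagate loop can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the target-state specification: the class names (Flag, ReviewQueue), the method names and the sample dimension partner_type are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Flag:
    """One discrepancy surfaced by an agent, awaiting a human decision."""
    dimension: str
    variants: list
    form: str  # e.g. "semantic", "definitional", "granularity", "temporal"
    decision: Optional[str] = None
    decided_by: Optional[str] = None


class ReviewQueue:
    def __init__(self):
        self.pending = []   # flags waiting for a steward
        self.mapping = {}   # (dimension, variant) -> approved canonical value

    def flag(self, dimension, variants, form):
        """Agent side: record the discrepancy and queue it. No decision is made."""
        item = Flag(dimension, list(variants), form)
        self.pending.append(item)
        return item

    def decide(self, item, canonical, steward):
        """Human side: a named steward chooses the canonical value."""
        if not steward:
            raise ValueError("a named human steward is required")
        item.decision = canonical
        item.decided_by = steward
        self.pending.remove(item)
        # Propagate the decision to every variant that was flagged.
        for variant in item.variants:
            self.mapping[(item.dimension, variant)] = canonical

    def resolve(self, dimension, value):
        """Apply approved decisions; undecided values pass through unchanged."""
        return self.mapping.get((dimension, value), value)


queue = ReviewQueue()
item = queue.flag("partner_type", ["Vendor", "Supplier"], form="semantic")
before = queue.resolve("partner_type", "Supplier")  # still undecided
queue.decide(item, canonical="Vendor", steward="data.steward@example.com")
after = queue.resolve("partner_type", "Supplier")
```

The design choice worth noting is that resolve never guesses. Until a steward has decided, a flagged value passes through unchanged, so the only way a mapping enters the system is through a recorded human decision.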

GOING DEEPER