Why existing tools fall short

Four adjacent tool categories touch the categorical surface. None treats it as the primary object.

Page 7 of 8 · Module: Origins · Reading time: ~5 min

The gap each tool leaves

Each of these tools was built for a related problem and is now being asked, by organisations noticing the dimensional gap for the first time, to extend into one it was not designed to solve. The extension is possible in principle but structurally constrained in practice.

Data observability

Monte Carlo, Soda, Bigeye, Anomalo. Detect volume, freshness, and distribution drift. Cover structural and statistical quality control well.

They do not govern categorical semantics or canonical values, because the questions they answer are statistical and the questions dimensions ask are semantic. A monitor can tell you that a new value appeared in the vendor field. It cannot tell you what that value should resolve to.
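The distinction can be sketched in a few lines of Python. This is an illustration only, not how any of the named products work: the `CANONICAL_VENDORS` mapping, the function names, and the sample values are all hypothetical. The first function answers the statistical question (has this value been seen before?); the second answers the semantic one (what should it resolve to?).

```python
# Hypothetical canonical reference for the vendor field: alias -> canonical value.
CANONICAL_VENDORS = {
    "acme corp": "Acme Corporation",
    "acme corporation": "Acme Corporation",
    "globex": "Globex Inc.",
}


def detect_new_values(seen, batch):
    """The monitor's question: which values in this batch are previously unseen?"""
    return sorted(set(batch) - set(seen))


def resolve(value):
    """The governance question: which canonical value does this map to?

    Returns None when the reference has no entry, meaning someone
    still has to decide what the value means.
    """
    return CANONICAL_VENDORS.get(value.strip().lower())


seen = {"Acme Corporation", "Globex Inc."}
batch = ["Acme Corporation", "ACME Corp", "Initech"]

new_values = detect_new_values(seen, batch)          # flagged as new
resolutions = {v: resolve(v) for v in new_values}    # resolved, or None
```

Both "ACME Corp" and "Initech" are flagged as new, but only the first has a canonical answer. Detection alone cannot tell the two cases apart.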

Master data management

Informatica, Reltio, Profisee. Resolve whether two records refer to the same entity. This is genuinely valuable work. It is also a different problem from governing the categorical attributes on those entities once resolved.

MDM answers "are these the same customer?"; dimension management answers "what segment is this customer in, and what does that segment mean?" The attributes on master records get populated by survivorship rules, which have no view on whether the surviving value is canonical.
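A minimal sketch of the survivorship point, assuming a simple "most recent update wins" rule. The rule, the record layout, and the `CANONICAL_SEGMENTS` set are hypothetical and chosen only for illustration; real MDM platforms offer many survivorship strategies.

```python
from datetime import date

# Hypothetical set of canonical segment values.
CANONICAL_SEGMENTS = {"Enterprise", "Mid-Market", "SMB"}


def survive_most_recent(records, field):
    """Pick the field value from the most recently updated source record."""
    latest = max(records, key=lambda r: r["updated"])
    return latest[field]


# Two source records already matched to the same customer.
records = [
    {"source": "crm", "segment": "Enterprise", "updated": date(2024, 1, 10)},
    {"source": "billing", "segment": "ENT - Tier 1", "updated": date(2024, 3, 2)},
]

survivor = survive_most_recent(records, "segment")
is_canonical = survivor in CANONICAL_SEGMENTS
```

The rule does its job: it picks a winner. The winner here is the billing system's local label, because recency says nothing about whether a value is canonical.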

Data catalogues

Atlan, Collibra, Alation. Document what data assets exist, what they mean, how they connect. The catalogue is the reference for what is in the warehouse. It documents the policy that governs categorical values; it does not enforce that policy at runtime.

A catalogue entry saying the vendor field uses a specific reference is decorative if the operational systems and agents are not actually reading from that reference at the moment they produce or consume values.
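The gap between documenting a policy and enforcing it can be sketched as two write paths. Everything here is an assumption made for illustration: the `CATALOGUE_ENTRY` record, the `REFERENCES` lookup, and both write functions are hypothetical and do not correspond to any catalogue product's API.

```python
# Hypothetical catalogue entry: it documents which reference the field should use.
CATALOGUE_ENTRY = {"field": "vendor", "reference": "vendor_reference_v1"}

# Hypothetical reference data the entry points at.
REFERENCES = {"vendor_reference_v1": {"Acme Corporation", "Globex Inc."}}


class NonCanonicalValue(ValueError):
    """Raised when a value is not in the governing reference."""


def write_unchecked(store, value):
    """A producer that never reads the reference: the policy is documentation only."""
    store.append(value)


def write_enforced(store, value, entry=CATALOGUE_ENTRY):
    """A producer that reads the reference at the moment it writes."""
    allowed = REFERENCES[entry["reference"]]
    if value not in allowed:
        raise NonCanonicalValue(f"{value!r} is not in {entry['reference']}")
    store.append(value)
```

With `write_unchecked`, a value such as "ACME Corp" lands in the store regardless of what the catalogue says. Only the second path makes the documented policy operative.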

Semantic layers

Cube, MetricFlow, the dbt Semantic Layer. Define metrics and joins above the warehouse so every consumer reads the same definition of revenue. They consume dimension values; they do not curate them.

The semantic layer is downstream of the dimensional layer. It depends on the dimensions being correct; it does not make them correct.
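To see the dependency concretely, consider a single metric definition applied to uncurated and curated dimension values. This is a plain-Python sketch, not the syntax of Cube, MetricFlow, or dbt; the `revenue_by_segment` function, the `CANONICAL` mapping, and the rows are hypothetical.

```python
from collections import defaultdict


def revenue_by_segment(rows):
    """One shared metric definition: total revenue grouped by segment."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["segment"]] += row["revenue"]
    return dict(totals)


rows = [
    {"segment": "Enterprise", "revenue": 100.0},
    {"segment": "enterprise", "revenue": 50.0},
    {"segment": "SMB", "revenue": 20.0},
]

# The metric is defined once, yet the result is split across variant spellings.
fragmented = revenue_by_segment(rows)

# Hypothetical upstream curation step that maps variants to canonical values.
CANONICAL = {"enterprise": "Enterprise", "smb": "SMB"}
curated = [{**r, "segment": CANONICAL[r["segment"].lower()]} for r in rows]

consistent = revenue_by_segment(curated)
```

The metric definition is identical in both calls. Only the upstream curation changes whether "Enterprise" revenue comes out as one number or two.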

Two pressures at once

Each of these tools touches the categorical surface in the course of doing its primary job. None of them was designed to own it. And each is, at the same moment, undergoing its own transition toward the agent era: observability tools rebuilding for autonomous remediation, MDM platforms figuring out agent-driven matching, catalogues becoming semantic surfaces for LLMs to read.

Asking any one of them to also solve dimensional governance is asking it to solve a problem it was not built for, at a scale it was not designed for, while undergoing its own architectural transformation. The next page explains why a horizontal solution, sitting above all four categories, is the only model that works.

GOING DEEPER