Quality control moves upstream

The check that used to happen at the visualisation layer does not consolidate into one new place. It distributes.

Page 5 of 8 · Module: Origins · Reading time: ~5 min

Distribution, not consolidation

The distribution runs across the pipeline: each class of problem is handled at the layer where it can be handled most cleanly. Two of the resulting four layers can be automated. The other two require a human in the loop, because the decisions they carry are organisational rather than technical.

The four layers

  • Source contracts (automated). Schema, type, nullability, freshness. Enforced automatically at ingestion against a declared contract, using tools such as Great Expectations, Soda, or dbt tests. The questions have one right answer: a null in a non-nullable column is a null in a non-nullable column.
  • Ingestion observability (automated). Outliers, distribution drift, volume and freshness anomalies. Caught by monitors such as Monte Carlo, Bigeye, or Anomalo. The detection is automated. Judging whether a change is a problem is still human work, but surfacing the change does not require human attention.
  • Dimension management (human in the loop). Variant resolution, canonical definitions, naming and hierarchy decisions on categorical values. The detection of inconsistency can be automated. The resolution cannot, because the resolution is a decision about how the business wants to describe itself. This is the layer this module is about.
  • Metric governance (human in the loop). KPI definitions, slowly-changing-dimension policy, hierarchy versioning. Authored by humans, enforced by the agents and pipelines that read from the semantic layer, for example Cube, MetricFlow, or the dbt Semantic Layer. The semantic layer sits between the warehouse and the consumers and answers the question "what does revenue mean here?"
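To make the first layer concrete, here is a minimal sketch of a source contract check in plain Python. It is not how Great Expectations, Soda, or dbt tests are implemented; the `ColumnRule` class, the `check_contract` function, and the example columns are all hypothetical names chosen for illustration. It shows why this layer automates well: every rule has one right answer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ColumnRule:
    """One column in a hypothetical source contract."""
    name: str
    dtype: type
    nullable: bool


def check_contract(rows, rules, loaded_at, max_age, now=None):
    """Return a list of violation messages; an empty list means the batch passes."""
    now = now or datetime.now(timezone.utc)
    violations = []

    # Freshness: the batch must be newer than the contract's maximum age.
    if now - loaded_at > max_age:
        violations.append("stale batch")

    for i, row in enumerate(rows):
        for rule in rules:
            # Schema: every declared column must be present.
            if rule.name not in row:
                violations.append(f"row {i}: missing column {rule.name}")
                continue
            value = row[rule.name]
            if value is None:
                # Nullability: a null is only allowed where the contract says so.
                if not rule.nullable:
                    violations.append(
                        f"row {i}: null in non-nullable column {rule.name}"
                    )
            elif not isinstance(value, rule.dtype):
                # Type: non-null values must match the declared type.
                violations.append(
                    f"row {i}: {rule.name} is not {rule.dtype.__name__}"
                )
    return violations


# Hypothetical contract and batch, for illustration only.
RULES = [ColumnRule("order_id", int, False), ColumnRule("region", str, True)]
NOW = datetime(2024, 1, 2, tzinfo=timezone.utc)
rows = [
    {"order_id": 1, "region": "EMEA"},
    {"order_id": None, "region": None},
]
violations = check_contract(
    rows, RULES, loaded_at=NOW - timedelta(hours=1),
    max_age=timedelta(hours=24), now=NOW,
)
print(violations)
```

Nothing in the check asks what a value means to the business. That is what separates this layer from the two human-in-the-loop layers.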

Why this split matters

Layers one and two catch problems that have one right answer. Layers three and four carry decisions that no algorithm can make alone. They depend on a named owner with authority over the dimension or the metric.
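The boundary between detection and decision at layer three can be sketched as follows. This is an illustrative assumption, not a description of any particular tool: `normalise`, `detect_variants`, `pending_decisions`, and the `approved` mapping are hypothetical names, and the grouping rule is deliberately crude. The automated part groups spellings that look alike. The mapping to a canonical value is supplied by a named owner and is never inferred.

```python
import re
from collections import defaultdict


def normalise(value):
    """Crude grouping key: lowercase, letters and digits only."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


def detect_variants(values):
    """Automated step: group raw values that collapse to the same key."""
    groups = defaultdict(set)
    for v in values:
        groups[normalise(v)].add(v)
    # Only groups with more than one spelling need attention.
    return {k: sorted(vs) for k, vs in groups.items() if len(vs) > 1}


def pending_decisions(values, approved):
    """Variant groups containing a spelling with no owner-approved mapping."""
    return {
        key: spellings
        for key, spellings in detect_variants(values).items()
        if any(s not in approved for s in spellings)
    }


# Hypothetical raw dimension values and an owner-authored mapping.
raw = ["UK", "U.K.", "uk", "United Kingdom", "Germany"]
approved = {"UK": "United Kingdom", "U.K.": "United Kingdom"}

print(detect_variants(raw))
print(pending_decisions(raw, approved))
```

In this sketch the code can notice that "UK", "U.K." and "uk" are probably the same thing, but it does not link any of them to "United Kingdom". Whether those are one value, and which spelling is canonical, comes only from the `approved` mapping that the dimension's owner writes.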

The mistake an organisation makes, when it first notices the safety net has gone, is to treat all four layers as variations of the same problem and reach for observability tooling as the answer. Observability tools cover layers one and two well. They cannot cover layers three and four, because the work at those layers is not detection. It is decision. The next page explains why dimensional decisions specifically require humans.

GOING DEEPER