
What good looks like.

The constructive close. The elements of well-managed dimensional data.

Page 7 of 7 | Module: Foundations | Reading time: ~5 min

The previous six pages described what dimensional data is, how it goes wrong, where it comes from, what it costs, why it stays unmanaged, and why the problem is becoming more consequential now. This page is the constructive close: what well-managed dimensional data looks like in practice.

The description is necessarily compressed. The full specification is in the target state document, which is one of the four foundational documents of CleanDims. This page introduces the architecture; the target state develops it in detail.

The five elements.

Well-managed dimensional data has five elements that work together. None of them works on its own; the combination is the discipline.

A canonical reference. For each dimension that matters, somewhere in the organisation there is a single source that says: these are the accepted values, this is what each one means, this is when each one was introduced, this is when each one was retired, and these are the alternative forms in which the same value has been observed. The reference is system-agnostic: it does not live inside any single warehouse, catalog, or master data system. It is its own surface, exposed via API to any consumer that needs to read it.
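
To make the shape of such a reference concrete, here is a minimal in-memory sketch. It is an illustration under stated assumptions, not the CleanDims specification: the class names, field names, normalisation rule, and sample industry values are all hypothetical, and a real reference would sit behind an API rather than in a Python object.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CanonicalValue:
    """One accepted value of a dimension (hypothetical field names)."""
    code: str                          # the accepted canonical form
    meaning: str                       # what the value means
    introduced: date                   # when it was introduced
    retired: Optional[date] = None     # when it was retired, if ever
    aliases: frozenset = frozenset()   # alternative forms observed in sources


class CanonicalReference:
    """In-memory stand-in for the reference surface of one dimension."""

    def __init__(self, dimension, values):
        self.dimension = dimension
        self._by_code = {v.code: v for v in values}
        # Index the canonical code and every alias under a normalised key.
        self._lookup = {}
        for v in values:
            for form in {v.code, *v.aliases}:
                self._lookup[self._key(form)] = v.code

    @staticmethod
    def _key(raw):
        # Assumed normalisation: collapse whitespace and ignore case.
        return " ".join(raw.split()).casefold()

    def resolve(self, raw):
        """Return the canonical code for an observed string, or None."""
        return self._lookup.get(self._key(raw))

    def get(self, code):
        """Return the full record for a canonical code."""
        return self._by_code[code]


# Illustrative data only.
industry = CanonicalReference("customer_industry", [
    CanonicalValue("FIN_SERVICES", "Banks, insurers, asset managers",
                   introduced=date(2019, 1, 1),
                   aliases=frozenset({"Financial Services", "FinServ"})),
    CanonicalValue("TELECOM", "Network operators and carriers",
                   introduced=date(2019, 1, 1), retired=date(2023, 6, 30),
                   aliases=frozenset({"Telco"})),
])

print(industry.resolve("  finserv "))   # FIN_SERVICES
print(industry.resolve("Unknown Co"))   # None
```

Retired values stay in the reference with a retirement date rather than being deleted, so that older records carrying them remain resolvable.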

Named ownership. Each dimension that matters has a named steward. The steward is one person, not a committee, holding decision rights for additions, definitional changes, deprecations, and contested decisions within their dimension. The steward is typically part-time, layered onto an existing role rather than constituted as a new full-time function, with tooling that lets routine work auto-resolve and only genuinely ambiguous cases reach them.

An intake process. New values do not appear in the canonical reference by accident. They are proposed (automatically by the system observing a new variant in a source, or manually by anyone in the organisation), reviewed, and accepted or rejected according to a documented process. The intake rate matches the rate of legitimate change in the underlying business, not the rate at which agents invent new strings.
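
The propose, review, and decide cycle described above can be sketched as a small state machine. This is one possible shape rather than the documented process itself: the `Intake` and `Proposal` names, the source labels, and the steward identifier are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PROPOSED = "proposed"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class Proposal:
    dimension: str
    raw_value: str
    source: str                 # hypothetical labels, e.g. "auto:erp" or "manual:analyst"
    status: Status = Status.PROPOSED
    decided_by: str = ""
    reason: str = ""


class Intake:
    """Review queue for one dimension (a sketch, not a full workflow)."""

    def __init__(self, dimension, accepted):
        self.dimension = dimension
        self.accepted = set(accepted)
        self.queue = []

    def propose(self, raw_value, source):
        """Queue a value for review unless it is already accepted or pending."""
        if raw_value in self.accepted:
            return None
        for pending in self.queue:
            if pending.raw_value == raw_value and pending.status is Status.PROPOSED:
                return pending          # the same variant seen twice is one proposal
        proposal = Proposal(self.dimension, raw_value, source)
        self.queue.append(proposal)
        return proposal

    def decide(self, proposal, steward, accept, reason):
        """Record the decision; only accepted values enter the reference."""
        if proposal.status is not Status.PROPOSED:
            raise ValueError("proposal already decided")
        proposal.status = Status.ACCEPTED if accept else Status.REJECTED
        proposal.decided_by = steward
        proposal.reason = reason
        if accept:
            self.accepted.add(proposal.raw_value)


intake = Intake("vendor_category", accepted={"HARDWARE", "SOFTWARE"})
first = intake.propose("SaaS", source="auto:erp")
again = intake.propose("SaaS", source="manual:analyst")
intake.decide(first, steward="steward-1", accept=True,
              reason="Distinct from SOFTWARE for procurement reporting")
```

Because a value can only reach `accepted` through `decide`, nothing enters the reference without a recorded decision, a decider, and a reason.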

Propagation. The canonical reference is readable by every system that produces or consumes the dimension. Operational systems validate against it. Pipelines map to it. Analytical systems use it as the join key. Agents read it through synchronised local caches. A reference that lives in a wiki and is not read by any system is decorative; the reference that works is the one that is operationally load-bearing.
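
As a rough illustration of two of these read paths, the sketch below shows an operational system validating a write and a pipeline mapping observed strings to canonical codes. The snapshot format, the function names, and the `_canonical` column suffix are hypothetical. A real deployment would fetch the snapshot from the reference's API or a synchronised cache rather than hard-coding it.

```python
# A local snapshot of the reference for one dimension (illustrative data).
SNAPSHOT = {
    "dimension": "vendor_category",
    "version": 7,
    "values": {"HARDWARE", "SOFTWARE", "SERVICES"},
    "aliases": {"hw": "HARDWARE", "s/w": "SOFTWARE", "svc": "SERVICES"},
}


def build_index(snapshot):
    """Map every normalised canonical code and alias to its canonical code."""
    index = {code.casefold(): code for code in snapshot["values"]}
    for alias, code in snapshot["aliases"].items():
        index[alias.casefold()] = code
    return index


def validate_write(value, snapshot):
    """Operational path: accept only exact canonical codes at write time."""
    return value in snapshot["values"]


def map_rows(rows, field, snapshot):
    """Pipeline path: add a canonical column; collect unmapped variants.

    Returns (mapped_rows, unmapped_values). Unmapped values are candidates
    for intake; they are never silently coerced to a canonical code.
    """
    index = build_index(snapshot)
    mapped, unmapped = [], []
    for row in rows:
        raw = row[field]
        code = index.get(" ".join(raw.split()).casefold())
        if code is None:
            unmapped.append(raw)
        mapped.append({**row, field + "_canonical": code})
    return mapped, unmapped


rows = [
    {"id": 1, "category": "hw"},
    {"id": 2, "category": "Software "},
    {"id": 3, "category": "Consulting??"},
]
mapped, unmapped = map_rows(rows, "category", SNAPSHOT)
```

Keeping the raw value alongside the canonical one is a deliberate choice in this sketch: the original string remains available as evidence when an unmapped variant is later proposed through intake.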

Versioning. The canonical reference changes over time. Records written under version N must remain interpretable under version N+1. Historical analysis must be able to reconstruct what a value meant at the time it was written, not only what it means now. Without versioning, every meaningful change to the dimension breaks longitudinal analysis.
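
One way to keep old records interpretable is to store each definition with the date it took effect and look values up "as of" the date the record was written. The sketch below assumes that approach; the `HISTORY` structure, the `meaning_as_of` function, and the sample definitions are invented for illustration and are not taken from the target state document.

```python
from bisect import bisect_right
from datetime import date

# code -> list of (effective_from, meaning), sorted by effective_from.
# Illustrative data only.
HISTORY = {
    "ENTERPRISE": [
        (date(2020, 1, 1), "Customers with more than 1,000 employees"),
        (date(2023, 1, 1), "Customers with more than 5,000 employees"),
    ],
}


def meaning_as_of(code, when, history=HISTORY):
    """Return the definition of `code` that was in force on `when`."""
    entries = history[code]
    effective_dates = [effective_from for effective_from, _ in entries]
    # Number of definitions that had taken effect on or before `when`.
    position = bisect_right(effective_dates, when)
    if position == 0:
        raise LookupError(f"{code} was not defined on {when}")
    return entries[position - 1][1]


print(meaning_as_of("ENTERPRISE", date(2021, 6, 1)))
print(meaning_as_of("ENTERPRISE", date(2024, 2, 1)))
```

A record tagged ENTERPRISE in 2021 is read against the 1,000-employee definition, while one written in 2024 is read against the 5,000-employee definition, so a longitudinal comparison can account for the change instead of silently mixing the two.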

What it is not.

Well-managed dimensional data is not the same as cleaned data. A column in a warehouse that has been deduplicated and normalised once is clean for the moment of cleaning; the next time data arrives, the column degrades again. Well-managed dimensions are managed continuously, not cleaned periodically.

Well-managed dimensional data is not the same as managed metadata. A data catalog organises metadata about data assets and helps practitioners find what they need; it does not govern the categorical values themselves. A canonical reference is the source of truth that the catalog points at when it describes valid values for a dimensional field.

Well-managed dimensional data is not the same as mastered entities. Master Data Management resolves which record is the authoritative golden record for a customer, a product, or a supplier. The categorical attributes on the master record (the customer's industry, the product's category, the supplier's classification) are not governed by MDM; they come from the canonical reference for the relevant dimension. The two disciplines are complementary.

Where to go from here.

A learner who has worked through the seven foundations pages now has the conceptual framework needed to engage with the rest of the CleanDims material. The recommended path from here:

The manifesto for the position that dimensional data deserves to be treated as a category in its own right and that the cost of leaving it unmanaged is changing.

The primer for the observational treatment of dimensional data as a class with characteristic failure modes. The primer covers the same ground as the foundations module at greater depth.

The target state for the full specification of what well-managed dimensional data looks like in practice, including the operating model, the propagation architecture, and the maturity path organisations follow toward it.

The Dimensional Data Problem for the four-layer reference catalogue of what goes wrong, organised as a navigable taxonomy with deep-link anchors per sub-type.

Trace the Chain for the interactive diagnostic that walks from a recognisable symptom in your own data down to the underlying root cause.

The foundations module ends here. Future modules will cover specific dimensions in depth (vendor names, customer segments, product taxonomies), specific environments (CRM, ERP, ticketing platforms), and the practitioner skills involved in stewardship and operating the reference. Those are not yet built.