The canonical reference.
One authoritative source for every dimension that matters. Federated across the organisation, exposed via API, designed to be operationally load-bearing.
What it is.
The canonical reference is the authoritative record of accepted values for every dimension you choose to manage. For each value, the reference holds: the canonical form itself, a definition in plain language, a named owner with decision rights, a version, introduction and retirement dates, the aliases under which the same value has been observed in source systems, a parent reference where the dimension is hierarchical, and a complete change log.
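One way to picture a single entry is as a record with one field per item in the list above. The sketch below is a minimal illustration, not CleanDims' actual schema. The class names `ReferenceValue` and `ChangeLogEntry`, the field names, and the rule that a retirement date is exclusive are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set


@dataclass(frozen=True)
class ChangeLogEntry:
    version: int        # version produced by this change
    changed_on: date
    changed_by: str
    reason: str         # the steward's recorded reasoning


@dataclass
class ReferenceValue:
    # Hypothetical schema: one field per attribute the reference holds.
    canonical: str                                   # the canonical form itself
    definition: str                                  # plain-language definition
    owner: str                                       # named owner with decision rights
    version: int
    introduced: date
    retired: Optional[date] = None                   # None while the value is in force
    aliases: Set[str] = field(default_factory=set)   # forms observed in source systems
    parent: Optional[str] = None                     # parent's canonical form, if hierarchical
    change_log: List[ChangeLogEntry] = field(default_factory=list)

    def is_active(self, on: date) -> bool:
        """True if the value was in force on the given date (retirement date exclusive)."""
        return self.introduced <= on and (self.retired is None or on < self.retired)


# Hypothetical entry; the owner, dates and wording are illustrative only.
fs = ReferenceValue(
    canonical="Financial Services",
    definition="Organisations whose primary business is banking, insurance or investment.",
    owner="revenue-operations",
    version=1,
    introduced=date(2023, 1, 1),
    aliases={"BFSI", "Banking & Finance"},
    change_log=[ChangeLogEntry(1, date(2023, 1, 1), "revenue-operations", "Initial entry.")],
)
```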
This is not a catalog, which describes data assets and points at where they live. CleanDims is the source the catalog points at when it describes what values a dimensional field is allowed to hold.
Federated, not centralised.
A single organisation-wide list of every value across every dimension is the wrong unit of governance. Different dimensions belong in different domains: vendor names sit with procurement, product categories sit with product, customer segments sit with sales or revenue operations, employee roles sit with people. CleanDims allows multiple references to coexist, each scoped to the domain that has the context to maintain it.
A directory of dimensions, lightweight and organisation-wide, records which reference owns which dimension. The directory carries no values and no definitions; it is purely an index from dimension name to authoritative reference. Two federated nodes cannot both claim ownership of the same dimension; the directory enforces the boundary.
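The directory's single rule, that each dimension has exactly one owning reference, can be sketched as a plain index that refuses a second claimant. The class `DimensionDirectory`, the exception `OwnershipConflict` and the dimension and reference names are hypothetical; the domain assignments follow the examples in the previous paragraph.

```python
from typing import Dict


class OwnershipConflict(Exception):
    """Raised when a second reference tries to claim an already-owned dimension."""


class DimensionDirectory:
    """Index from dimension name to the single reference that owns it.

    Deliberately stores no values and no definitions, only ownership.
    """

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}

    def register(self, dimension: str, reference: str) -> None:
        current = self._owners.get(dimension)
        if current is not None and current != reference:
            raise OwnershipConflict(
                f"{dimension!r} is already owned by {current!r}; {reference!r} cannot claim it"
            )
        self._owners[dimension] = reference  # re-registering by the same owner is a no-op

    def owner_of(self, dimension: str) -> str:
        return self._owners[dimension]  # KeyError if no reference owns the dimension


# Hypothetical registrations mirroring the domains named above.
directory = DimensionDirectory()
directory.register("vendor_name", "procurement")
directory.register("product_category", "product")
directory.register("customer_segment", "revenue-operations")
directory.register("employee_role", "people")
```

A later `directory.register("vendor_name", "finance")` raises `OwnershipConflict` and leaves procurement as the owner.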
Versioning and aliases.
Every change to the reference produces a new version, recorded in the change log. Consumers pin to versions so that historical analysis remains interpretable: a record written under version N can still be resolved under version N+1, with the mapping captured explicitly rather than inferred.
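To make "captured explicitly rather than inferred" concrete, here is a minimal sketch in which publishing a new version is rejected unless every value that disappears is mapped to a value that exists in the new version. A record can then be carried forward one transition at a time. `VersionedReference`, `publish`, `resolve` and the zero-based version numbers are assumptions for illustration, not a documented CleanDims API.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, List, Optional


@dataclass
class VersionedReference:
    # versions[n] is the set of canonical values in force at version n
    versions: List[FrozenSet[str]] = field(default_factory=list)
    # transitions[n] maps values of version n to version n + 1; unmapped values carry over
    transitions: List[Dict[str, str]] = field(default_factory=list)

    def publish(self, values: Iterable[str], mapping: Optional[Dict[str, str]] = None) -> int:
        new = frozenset(values)
        mapping = dict(mapping or {})
        if self.versions:
            previous = self.versions[-1]
            for dropped in previous - new:
                if dropped not in mapping:
                    raise ValueError(f"{dropped!r} was removed without an explicit mapping")
            for old, target in mapping.items():
                if old not in previous or target not in new:
                    raise ValueError(f"invalid mapping {old!r} -> {target!r}")
            self.transitions.append(mapping)
        elif mapping:
            raise ValueError("the first version has nothing to map from")
        self.versions.append(new)
        return len(self.versions) - 1

    def resolve(self, value: str, written_under: int, target: int) -> str:
        """Carry a value recorded under one version forward to a later version."""
        if value not in self.versions[written_under]:
            raise KeyError(f"{value!r} is not canonical in version {written_under}")
        if target < written_under:
            raise ValueError("can only resolve forward in time")
        for n in range(written_under, target):
            value = self.transitions[n].get(value, value)
        return value


# Hypothetical history: two values merged into one in the second version.
ref = VersionedReference()
v0 = ref.publish({"Banking", "Insurance", "Retail"})
v1 = ref.publish(
    {"Financial Services", "Retail"},
    {"Banking": "Financial Services", "Insurance": "Financial Services"},
)
```

With this history, a record written as `"Banking"` under `v0` resolves to `"Financial Services"` under `v1`, and `"Retail"` carries over unchanged.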
Aliases are stored within the reference itself, not as a separately maintained mapping. When a consumer asks what the canonical value is for a given string, the response contains both the canonical value and the path by which the alias was resolved. This eliminates a class of failure where the canonical set and the alias set drift out of agreement.
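A minimal sketch of that design follows. Each canonical value owns its alias list, the lookup index is always rebuilt from those entries rather than edited by hand, and a lookup returns the canonical value together with the steps taken to reach it. The names `Reference`, `Resolution` and `build_index` are hypothetical. The matching rule in `normalise` (case-insensitive, whitespace collapsed) is also an assumption, since the text does not specify one.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


def normalise(text: str) -> str:
    # Assumed matching rule: case-insensitive, with runs of whitespace collapsed.
    return " ".join(text.split()).casefold()


@dataclass(frozen=True)
class Resolution:
    canonical: str
    path: Tuple[str, ...]  # human-readable steps taken to reach the canonical value


def build_index(entries: Dict[str, List[str]]) -> Dict[str, Tuple[str, str]]:
    """Derive the lookup index from the entries; it is never edited directly."""
    index: Dict[str, Tuple[str, str]] = {}
    for canonical, aliases in entries.items():
        for form in [canonical, *aliases]:
            key = normalise(form)
            claimed = index.get(key)
            if claimed is not None and claimed[0] != canonical:
                raise ValueError(f"{form!r} is claimed by both {claimed[0]!r} and {canonical!r}")
            index.setdefault(key, (canonical, form))
    return index


class Reference:
    def __init__(self) -> None:
        self._entries: Dict[str, List[str]] = {}  # canonical -> its aliases, stored together
        self._index: Dict[str, Tuple[str, str]] = {}

    def add(self, canonical: str, aliases: Iterable[str] = ()) -> None:
        candidate = {**self._entries, canonical: list(aliases)}
        self._index = build_index(candidate)  # raises before any state changes
        self._entries = candidate

    def resolve(self, raw: str) -> Resolution:
        key = normalise(raw)
        canonical, matched = self._index[key]  # KeyError for an unknown value
        steps = []
        if key != raw:
            steps.append(f"normalised {raw!r} to {key!r}")
        if matched == canonical:
            steps.append("exact canonical match")
        else:
            steps.append(f"alias {matched!r} of {canonical!r}")
        return Resolution(canonical, tuple(steps))
```

Because there is only one store, an alias cannot point at a canonical value that no longer exists. An attempt to give two canonical values the same alias fails at write time instead of surfacing later as a silent mismatch.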
A multinational with five labels for one industry.
A multinational records customers' industries across the CRM, the billing system, the support tool, and an external data provider. Five labels emerge for the same underlying concept: “Financial Services,” “Banking & Finance,” “BFSI,” “Banks and Insurance,” and the external provider's three-digit SIC code. CleanDims holds one canonical entry for the concept (the choice of which is the steward's call, recorded in the change log with reasoning), maps the five observed forms as aliases, and exposes a single API that any consumer can query to resolve an incoming value to its canonical form. The CRM keeps using its label; the warehouse query joining customer data to support tickets resolves both sides through the reference; the analyst no longer maintains a personal CASE WHEN statement.
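The effect on the join can be shown with a toy dataset. Grouped by raw label, the CRM and support-tool records share no key at all. Grouped by the resolved canonical form, they line up. Everything in the sketch is hypothetical: the customer and ticket IDs, the choice of "Financial Services" as the canonical form, and "SIC-XXX", which only stands in for the provider's code because the text does not give it. The in-memory dictionary stands in for a call to the reference's API.

```python
from collections import Counter

# Hypothetical reference entry; "SIC-XXX" is a placeholder for the provider's code.
REFERENCE = {
    "Financial Services": ["Banking & Finance", "BFSI", "Banks and Insurance", "SIC-XXX"],
}
ALIAS_TO_CANONICAL = {
    form: canonical for canonical, aliases in REFERENCE.items() for form in [canonical, *aliases]
}


def to_canonical(label: str) -> str:
    return ALIAS_TO_CANONICAL[label]  # stands in for a call to the reference's API


# Hypothetical source records: (id, industry label as that system stores it).
crm_customers = [("cust-001", "BFSI"), ("cust-002", "Banking & Finance")]
support_tickets = [("T-1", "Banks and Insurance"), ("T-2", "Financial Services"), ("T-3", "SIC-XXX")]

# Joining on raw labels finds no common key.
raw_overlap = {label for _, label in crm_customers} & {label for _, label in support_tickets}

# Resolving both sides first gives one shared key per concept.
crm_counts = Counter(to_canonical(label) for _, label in crm_customers)
ticket_counts = Counter(to_canonical(label) for _, label in support_tickets)
joined = {
    industry: (crm_counts[industry], ticket_counts[industry])
    for industry in crm_counts.keys() | ticket_counts.keys()
}
```

Here `raw_overlap` is empty, while `joined` holds one row, `"Financial Services"`, with two customers and three tickets.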