Like every other aspect of digital management, data management continues to evolve. Early pre-relational mainframe systems, such as IMS, VSAM and IDMS, placed little emphasis on data quality. These hierarchical, key-indexed and linked-list databases were highly prone to application failures caused by unexpected data.
Relational databases do a good job of type-checking fields and provide constructs such as views to restrict data ranges. Data warehouses and data marts introduced Extract, Transform and Load (ETL) processes to cleanse data, providing basic Data Quality Management (DQM).
In an IT Service Management context, the Configuration Management Database (CMDB) provides the system of record, which is usually a relational database. A CMDB that relies only on an IT asset discovery source is often incomplete. Blazent uses a sophisticated ETL process to provide DQM for the CMDB, filling gaps, cross-checking data across multiple sources and validating relationships.
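To make the idea of filling gaps and cross-checking across sources concrete, here is a minimal Python sketch. It is not Blazent's process; the source names (`discovery`, `asset_db`), the field names, and the `reconcile` function are all hypothetical, and matching records on a lowercased hostname is an assumption made purely for illustration.

```python
from collections import defaultdict


def reconcile(sources):
    """Merge configuration-item records from several sources.

    `sources` maps a (hypothetical) source name to a list of record dicts.
    Records are matched on lowercased hostname, which is an assumption.
    Returns (merged, conflicts).
    """
    merged = defaultdict(dict)
    conflicts = []
    for source_name, records in sources.items():
        for record in records:
            key = record["hostname"].lower()
            target = merged[key]
            for field, value in record.items():
                # Skip the match key itself and any empty values.
                if field == "hostname" or value in (None, ""):
                    continue
                if field not in target:
                    # Fill a gap left by earlier sources.
                    target[field] = value
                elif target[field] != value:
                    # Cross-check: sources disagree, so flag for review.
                    conflicts.append(
                        (key, field, target[field], value, source_name)
                    )
    return dict(merged), conflicts


# Made-up sample data: discovery knows the OS but not the owner.
sources = {
    "discovery": [{"hostname": "Web01", "os": "Linux", "owner": None}],
    "asset_db": [{"hostname": "web01", "os": "Windows", "owner": "ops"}],
}
merged, conflicts = reconcile(sources)
```

In this sketch the first source to supply a field wins and later disagreements are only recorded, not resolved. A real DQM process would likely apply source-priority or confidence rules instead.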
A newer kind of raw data store, known as a data lake, is useful to organizations that collect log data, knowing it contains potentially valuable information, but don’t want to pre-define all of its uses. Instead, they develop extraction algorithms to suit specific use cases as the need arises.
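The "extract as needed" approach can be sketched in a few lines of Python. Everything here is assumed for illustration: the log format (space-separated `key=value` pairs), the sample lines, and the `errors_per_host` use case. The raw lines are kept untouched, and structure is imposed only when a question is asked of them.

```python
import re
from collections import Counter

# Made-up raw log lines, stored exactly as collected.
RAW_LOG_LINES = [
    "2024-01-05T10:00:01Z host=web01 level=ERROR msg=disk_full",
    "2024-01-05T10:00:02Z host=web02 level=INFO msg=started",
    "2024-01-05T10:00:03Z host=web01 level=ERROR msg=timeout",
    "malformed line without fields",
]

# Assumed format: whitespace-separated key=value pairs.
PAIR = re.compile(r"(\w+)=(\S+)")


def parse(line):
    """Extract key=value pairs at read time; unparseable lines yield {}."""
    return dict(PAIR.findall(line))


def errors_per_host(lines):
    """One use-case-specific extraction: count ERROR entries per host."""
    counts = Counter()
    for line in lines:
        fields = parse(line)
        if fields.get("level") == "ERROR" and "host" in fields:
            counts[fields["host"]] += 1
    return counts


error_counts = errors_per_host(RAW_LOG_LINES)
```

A different use case, such as counting messages by type, would add another small extraction function over the same raw lines without changing how they are stored.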
Blazent’s white paper linked here introduces the data lake and its application. The data lake is described within the context of various data management concepts in the following order:
- Systems of record
- Data warehouses
- Operational data stores
- Data marts
- Data lakes
As more data source types continue to emerge, data quality becomes the core enabler of competitive value-add across the organization. To learn more about Data Quality Management and how critical it is to your organization, we invite you to peruse our resource library here.