
451 Research recently released a report titled “The State of Enterprise Data Quality: 2016, The Role of DQM in Machine Learning and Predictive Analysis.” Its authors are Carl Lehmann, Krishna Roy and Bob Winter. A key finding from their survey of hundreds of IT professionals was a ranking of the leading causes of poor data quality, shown in the chart below:

[Chart: leading causes of poor data quality]

  1. Data entry by employees: Not surprisingly, the leading cause (by a large margin) is human error. As distractions grow and attention spans shrink, it is unlikely that data will be entered any more accurately in the future. The good news is that the ongoing Internet-of-Things (IoT) revolution promises to reduce the reliance on humans, as smarter machines and sensors collect data at the source and relay it directly to corporate IT systems.


  2. Data migration and conversion projects: Migrating data to new systems carries an inherent risk to data quality. Bad data is everywhere: values can be irregular, missing or misplaced. Even in spreadsheets, cell-level validation is not rigorous enough to prevent surprises, and database systems often rely on humans to code referential integrity checks that stop unexpected outliers from sneaking in (a minimal validation sketch follows this list).


  3. Mixed entries by multiple users: Instructions on how data should be entered are open to interpretation, so different people sometimes populate a field inconsistently without even realizing it. In IT Asset Management, for example, the name of a server manufacturer may be entered differently: multiple asset managers might enter “Dell Inc.,” “Dell” or “Dell Computer” for the same vendor. Data Quality Management tools should be used to normalize such variances (a small normalization sketch also follows the list).


  4. Changes to source systems: Application users are responsible for the consistency of the data they enter and maintain across application and configuration changes. Quality Assurance departments and application programmers tend to focus on functional changes rather than on impacts to the data an application uses. Often, only the application development team understands why data is managed in a particular way, and as organizations make increasing use of third parties to manage in-house developed applications, that knowledge of the application’s semantics often isn’t communicated to the third party. As third parties change how an application functions, they can inadvertently harm the integrity of the data it uses.


  5. Systems errors: In the past, application programs ran on a single computer. Today’s applications, by contrast, interact with many computers, making them far more complex. If these systems don’t have sufficient redundancy built in, they can also be fragile and prone to failure. Traditional systems were simpler and therefore less likely to corrupt data. As applications grow in complexity and are distributed across more computers, data corruption has become more common and harder to isolate and fix.
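
To make the migration risk in item 2 concrete, here is a minimal Python sketch of a pre-load check; the field names, vendor IDs and the validate_record function are hypothetical illustrations, not taken from the report:

```python
# Minimal sketch of pre-load validation during a data migration.
# Field names and vendor IDs are hypothetical examples.

KNOWN_VENDOR_IDS = {"V001", "V002", "V003"}            # stands in for a reference table
REQUIRED_FIELDS = ("asset_id", "vendor_id", "purchase_date")

def validate_record(record):
    """Return a list of problems found in a single migrated record."""
    problems = []
    for field in REQUIRED_FIELDS:                      # missing or empty required values
        if not record.get(field):
            problems.append(f"missing value for '{field}'")
    vendor = record.get("vendor_id")                   # referential integrity check
    if vendor and vendor not in KNOWN_VENDOR_IDS:
        problems.append(f"unknown vendor_id '{vendor}'")
    return problems

records = [
    {"asset_id": "A-17", "vendor_id": "V001", "purchase_date": "2016-03-01"},
    {"asset_id": "A-18", "vendor_id": "V999", "purchase_date": ""},   # two defects
]
for rec in records:
    issues = validate_record(rec)
    if issues:
        print(rec["asset_id"], "rejected:", "; ".join(issues))
```

Checks like these are exactly what ETL and DQM tools automate at scale instead of leaving them to hand-written code.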

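Item 3’s vendor-name problem lends itself to a similar sketch. The alias table below is purely illustrative; real Data Quality Management tools typically combine curated reference data with fuzzy matching rather than a hard-coded mapping:

```python
# Minimal sketch of normalizing free-text vendor names to one canonical value.
# The alias table is illustrative, not a real DQM rule set.

VENDOR_ALIASES = {
    "dell": "Dell Inc.",
    "dell inc": "Dell Inc.",
    "dell computer": "Dell Inc.",
}

def normalize_vendor(raw):
    """Map a user-entered vendor string to its canonical form."""
    key = raw.strip().lower().rstrip(".")        # fold case and trailing punctuation
    return VENDOR_ALIASES.get(key, raw.strip())  # fall back to the original value

for entered in ("Dell Inc.", "Dell", "Dell Computer"):
    print(entered, "->", normalize_vendor(entered))   # all map to "Dell Inc."
```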

Because dirty data is so prevalent, Extraction-Transformation-Load (ETL) and Data Quality Management (DQM) technologies have emerged to tackle this scourge. Blazent provides sophisticated algorithms that reconcile data from multiple sources to create validated and verified data records. You can learn more about Blazent and download the 451 Research report here.
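
As a generic illustration of what reconciling multiple sources involves (this is not Blazent’s algorithm; the source names, fields and precedence order are all hypothetical), one simple approach merges per-source records in trust order and flags fields where the sources disagree:

```python
# Generic illustration of multi-source reconciliation: merge fields by source
# precedence and flag conflicts for review. All names here are hypothetical.

SOURCE_PRECEDENCE = ["discovery_tool", "cmdb", "spreadsheet"]  # most to least trusted

def reconcile(records_by_source):
    """Merge per-source records into one record; return it plus any conflicts."""
    merged, conflicts = {}, []
    fields = {f for rec in records_by_source.values() for f in rec}
    for field in sorted(fields):
        # collect non-empty values in precedence order
        values = [(src, records_by_source[src][field])
                  for src in SOURCE_PRECEDENCE
                  if records_by_source.get(src, {}).get(field)]
        if not values:
            continue
        merged[field] = values[0][1]            # highest-precedence value wins
        if len({v for _, v in values}) > 1:     # sources disagree on this field
            conflicts.append(f"{field}: {values}")
    return merged, conflicts

sources = {
    "discovery_tool": {"hostname": "srv-042", "os": "RHEL 7"},
    "cmdb":           {"hostname": "srv-042", "os": "RHEL 6", "owner": "Finance"},
    "spreadsheet":    {"owner": "Finance IT"},
}
record, issues = reconcile(sources)
print(record)   # {'hostname': 'srv-042', 'os': 'RHEL 7', 'owner': 'Finance'}
print(issues)   # conflicts on 'os' and 'owner' flagged for review
```

Production DQM platforms layer record matching, survivorship rules and audit trails on top of this basic idea.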