How to Select the Best Technology for Data Quality Management

Outdated data quality software can put you at a competitive disadvantage and drive up organizational costs.

Modern data quality technology is proactive, avoiding more of those costs and mitigating more risk.

In this post, we outline what to look for when selecting data quality technology and how to put artificial intelligence to work.


Proactive vs. Reactive

Reactive data quality tools attempt to address errors after they have been persisted to a data store. During the transfer from an initial data store to a data lake or warehouse, these tools identify errors and attempt to resolve them so that the destination store stays cleansed. That transfer may occur days or months after the data was originally created, and by then the user is unlikely to recall the details of a single record out of the thousands entered that month.
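As a rough sketch of what this reactive, rule-based checking looks like in practice, the example below applies hand-written rules to a batch during the load step. The column names, rules, and pandas-based implementation are illustrative assumptions, not any particular product's API.

```python
import pandas as pd

# Illustrative reactive checks: rules run during the batch load to the warehouse,
# long after the data was entered. Column names and rules are hypothetical.
RULES = {
    "email": lambda s: s.str.contains("@", na=False),
    "order_total": lambda s: s.ge(0),
}

def load_with_reactive_checks(batch: pd.DataFrame):
    """Split a batch into rows that pass every rule and rows flagged for remediation."""
    passing = pd.Series(True, index=batch.index)
    for column, rule in RULES.items():
        passing &= rule(batch[column])
    # Clean rows continue on to the warehouse; the rest enter the remediation queue.
    return batch[passing], batch[~passing]

clean, quarantined = load_with_reactive_checks(
    pd.DataFrame({"email": ["a@example.com", "missing-at-sign"], "order_total": [19.99, -5.00]})
)
```

Note that by the time a row lands in the quarantined set, the person who entered it is long gone from the context, which is what makes the remediation process described next so expensive.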

As a result, these errors may be handled through an elaborate remediation process that is part of a larger data governance program and council. The remediation workflow for a single error can involve technical support representatives, subject matter experts, data stewards, and data engineers. In a typical scenario, a support representative documents the problem, and data stewards and engineers investigate the cause. Once the cause is identified, the data steward discusses the preferred solution with the subject matter experts for that data. The fix must then be documented by the steward, presented to the data governance council for approval, and implemented as a data quality rule by a data engineer. The estimated cost of remediating a single new error is $10,000. After this investment, the rule provides automated quality enforcement for each recurrence of the same error.

Because reactively remediating errors is costly, and because bad data that has already been saved risks being used by mistake, a proactive solution is preferred. Proactive solutions prompt the creator of the data to fix the error at the time of entry. The cost to resolve an error at the time of entry, known as the prevention cost, is estimated to be $1.[1] When the error is resolved by its creator at the time of entry, the best resolution is provided at the lowest cost: the user entering the data has had no time to forget the context of the entry. Poor data introduced by IoT devices is immediately identified and quarantined. A real-time approach at every point of data entry can avoid first-time exposure to errors altogether.
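A minimal sketch of this entry-time validation is shown below: the record is checked before it is persisted, and the creator is prompted to correct it while the context is still fresh. The field names, rules, and in-memory store are assumptions for illustration only.

```python
# Hypothetical proactive (entry-time) checks: the record is validated before it is
# saved, and the creator is told exactly what to fix. Field names are illustrative.

def validate_at_entry(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means the record may be saved."""
    problems = []
    if "@" not in record.get("email", ""):
        problems.append("email must contain '@'")
    if record.get("order_total", 0) < 0:
        problems.append("order_total cannot be negative")
    return problems

def submit(record: dict, store: list) -> bool:
    issues = validate_at_entry(record)
    if issues:
        # Prompt the creator immediately, while the context of the entry is still fresh.
        print("Please correct before saving:", "; ".join(issues))
        return False
    store.append(record)  # only clean records ever reach the data store
    return True
```

The same pattern applies to machine-generated data: a streaming check in front of the destination store can quarantine suspect IoT records before anything downstream reads them.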

[1] Labovitz, G., Chang, Y. S., and Rosansky, V. (1992). Making Quality Work: A Leadership Guide for the Results-Driven Manager. John Wiley & Sons, Hoboken, NJ.

| Reactive | Proactive |
| --- | --- |
| Incurs the risks and costs of first-time error exposure | Avoids first-time error exposure |
| $10,000 remediation cost per new error | $1 prevention cost per error |
| Lengthy remediation process | Immediate resolution |
| Delayed identification and remediation produce a subpar fix because limited information is available; the best case may be deleting an entire row of data | Best possible resolution, because the data creator provides the fix at the time of entry |

Putting Artificial Intelligence to Work

Traditional data quality tools require a rule to be created for each error that your enterprise has experienced or anticipates. Leveraging artificial intelligence and deep learning enables protection against errors you cannot predict. Preventing first-time exposure to an error can save $10,000 or more per instance in remediation costs, and it avoids the risk of far larger costs from decisions based on poor data. Unlike traditional tools, which require rule updates whenever data requirements and validations change, AI technologies adapt by learning from the data itself and from user responses. This avoids the cost of maintaining a large set of data quality rules.
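As a rough illustration of the idea (not any specific vendor's implementation), the sketch below trains an outlier detector on historical values and folds user feedback back into the training data, so the check adapts without anyone editing a rule. The scikit-learn model choice, the single-feature layout, and the feedback function are all assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# A learned check instead of a hand-written rule: the model learns what "normal"
# values look like and flags outliers. Values and feature layout are illustrative.
history = np.array([[19.99], [24.50], [18.75], [22.10], [21.30]])  # past values confirmed as good
model = IsolationForest(contamination=0.01, random_state=0).fit(history)

def looks_anomalous(order_total: float) -> bool:
    # IsolationForest.predict returns -1 for values it considers outliers.
    return model.predict([[order_total]])[0] == -1

def record_user_feedback(order_total: float, confirmed_valid: bool) -> None:
    """If the creator confirms a flagged value is actually fine, retrain on it so the check adapts."""
    global history, model
    if confirmed_valid:
        history = np.vstack([history, [[order_total]]])
        model = IsolationForest(contamination=0.01, random_state=0).fit(history)
```

The design point is the feedback loop: when a user confirms that a flagged value is legitimate, that confirmation becomes training data, which is how the check keeps up with changing requirements without a rule-maintenance backlog.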
