The data quality conundrum
Poor data quality costs organizations an average of $12.9 million every year, according to Gartner.
In 1983, an Edmonton-bound Air Canada flight had to make an emergency landing in Manitoba, Canada. The reason? A metric conversion mix-up had led the ground crew to load only about half the fuel required to reach Edmonton. In another incident, in 1999, NASA lost its $125 million Mars Climate Orbiter: one piece of ground software supplied navigation data in English units while another expected metric, and the mismatch sent the spacecraft off course.
Both incidents underscore the need for data standardization and high-quality data. Unfortunately, the picture has not improved much: organizations still lose millions of dollars every year to poor data quality.
From duplicate records to inaccurate and inconsistent values, data quality issues take many forms.
Solutions for addressing data quality exist, but they are far from ideal: most take a reactive approach and rely predominantly on available metadata and predefined rules. It is no surprise, then, that companies are still reeling from the impact of poor data quality, such as:
Revenue loss: According to the Gartner report mentioned above, every year, poor data quality costs organizations an average of $12.9 million.
Missed opportunities: 21 cents of every dollar spent on media is wasted due to poor data quality, according to a Forrester report.
Inaccurate decisions: 76% of IT decision makers believe that revenue opportunities have been missed due to a lack of accurate data insights, and 77% do not trust the data they base their decisions on, according to SnapLogic.
Augment data quality
Improve the quality of enterprise data with AI/ML by augmenting the people, processes, and technology around data quality management.
Start by identifying the data quality issues to address and taking corrective measures, ideally as close to the data's origin as possible; a sketch of such at-source checks follows. Although this seems like a simple, straightforward ask, it often is not, as explained below.
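As an illustration, here is a minimal sketch of rule-based checks run where data enters the pipeline, before bad records propagate downstream. The DataFrame, column names, and thresholds are hypothetical; in practice, such rules would be drawn from the metadata sources discussed next.

```python
# Minimal sketch: rule-based quality checks applied at ingestion.
# The column names ("customer_id", "fuel_kg") and thresholds are hypothetical.
import pandas as pd

def validate_batch(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows that fail basic quality rules, tagged with a reason."""
    issues = []

    # Completeness: required fields must be present.
    missing = df[df["customer_id"].isna() | df["fuel_kg"].isna()]
    issues.append(missing.assign(issue="missing required field"))

    # Uniqueness: duplicate business keys usually signal an upstream defect.
    dupes = df[df.duplicated(subset="customer_id", keep=False)]
    issues.append(dupes.assign(issue="duplicate customer_id"))

    # Validity: values must fall in a plausible range, which catches many
    # unit mix-ups like the ones in the incidents above.
    bad_range = df[(df["fuel_kg"] <= 0) | (df["fuel_kg"] > 150_000)]
    issues.append(bad_range.assign(issue="fuel_kg out of plausible range"))

    return pd.concat(issues, ignore_index=True)

batch = pd.DataFrame({
    "customer_id": [101, 102, 102, None],
    "fuel_kg": [-20_400.0, 9_144.0, 9_144.0, 22_300.0],
})
print(validate_batch(batch))  # flags a missing key, a duplicate pair, a bad value
```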
An organization can have several sources of metadata and data quality rules, and the main challenge lies in gathering only the most relevant and accurate ones; business subject matter experts are the most reliable people to source them from. This is a complex undertaking that requires defining and implementing a framework supported by people, processes, and technology, along with niche skills in data quality and metadata management, work that is often confined to strictly IT teams.
This slows down the implementation of data quality strategies. It is therefore important to democratize data quality management: build solutions that allow all relevant users to contribute to tackling existing data challenges and creating high-quality data. Companies can also leverage ML techniques for data quality management, as sketched below.
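As one illustration (a sketch under assumptions, not a prescribed method), an unsupervised anomaly detector such as scikit-learn's IsolationForest can learn what typical records look like and flag outliers for review, extending quality checks beyond hand-written rules. The features and data below are synthetic.

```python
# Minimal sketch of ML-assisted data quality: an unsupervised model learns
# what "normal" records look like and flags outliers for human review.
# The two features (e.g., order amount, delivery days) are hypothetical.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Mostly well-behaved records ...
normal = rng.normal(loc=[100.0, 3.0], scale=[15.0, 1.0], size=(990, 2))
# ... plus a few records corrupted by unit or data-entry errors.
corrupted = np.array([[10_000.0, 3.0], [95.0, 400.0], [-120.0, 2.0]])
records = np.vstack([normal, corrupted])

# contamination is a prior guess at the fraction of bad records.
model = IsolationForest(contamination=0.01, random_state=0)
labels = model.fit_predict(records)  # -1 = anomaly, 1 = normal

flagged = records[labels == -1]
print(f"flagged {len(flagged)} of {len(records)} records for review")
print(flagged)
```

Flagged records would then be routed to the relevant business users for confirmation, keeping humans in the loop while scaling beyond what rule authoring alone can cover.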