DQART for Data Quality Management to Improve Profit

Dinesh Mohata

Retail & CPG Segment Head, Middle East & Africa

Monodip Chakravarty

Retail & CPG Transformation Lead | Saudi Arabia & Bahrain

Agniv Chakraborty

Data Quality Analyst

Industry

Retail

Highlights

Data drives an organization's growth and transformation strategy and is a catalyst for the enterprise to become future ready.
The data quality journey doesn't end with recommendations; it must close the loop by providing feedback to the system to improve the overall data quality.
Read about the 12-point formula that helps assess the quality of a data set from its raw form to the final output.

The data challenge

Data is a key growth enabler in today's fast-paced digital transformation space. In the current digital era, information is accessible to all, and data is pumped into systems daily. Every hour, we generate Petabytes of data. This incredible amount of available data brings considerable challenges in maintaining data quality. "If you can't measure it, you can't manage it," is an oft-quoted admonition on data management attributed to the late William Edwards Deming, an American statistician known as the guru of quality control.

Data is the most impactful lever for an organization's growth and transformation strategy and a catalyst for the enterprise to become future-ready regarding its operational maturity. However, just having data is not enough; you need quality data. Let's see why.

Assessing data

Assessing data quality is important. This assessment needs to consider the impact of technical attributes of data such as consistency, accuracy, completeness, timeliness, relevance, and the business implication for the industry to which the organization belongs. Therefore, a holistic view of data is required to arrive at recommendations to improve data quality. This amalgamation of technical and business attributes of data can be used to arrive at a data quality score.

Why is it required?

In an enterprise ecosystem, data tells a story. The story of the lifecycle of a tuple (data record), the tuple or the record undergoes multiple transformations and enhancements, producing information that tells the user of the purpose of this dataset in the ERP system.

A classic example is the retail ERP system. It maintains a transactional record of the stock in hand of an SKU (Stock Keeping Unit) at any given instance. This information, in turn, helps in planning the replenishment of correct levels of stock. It allows buyers in on-time procurement, helps merchandisers set the right price for their inventory, and enables the supply chain manager to move goods across locations timely.

If we listen closely, the SKU data story helps answer questions like: What is the stock at the store at this point? When was the inventory updated? Where is it being kept, and how is the stock moving?

But what happens if the data parameters in this journey get corrupted? Will the data narrate the 'correct' story? Will the planners interpret it correctly, and will they be able to plan the inventory accurately?

Our research found that the existing data quality assessment tools talk only about the technical parameters. There is a need to view data holistically, incorporating both technical and business aspects.

We also see that the data quality journey doesn't end with recommendations. It has to be a closed loop by providing feedback to the system, thereby improving the overall data quality.

How it works

Based on our assessment of looking at data holistically, we arrived at a set of parameters to interpret data quality from two dimensions -- technical attributes and business implications.

Use industry-led rules that focus not only on the five vital technical parameters (Consistency, Accuracy, Completeness, Timeliness, and Relevance) but also on Business Rule Violations (BRV)
The BRV measures the degree to which the given data deviates from the rule and helps in determining the business impact on data quality
Use the novel approach of assigning applicability (TAI - Technical Applicability Index) of a rule to the relevant data quality attributes (e.g., consistency, accuracy, etc.)
The Data Quality Score (DQS) is derived using TAI and BRV for the given rule

The algorithm

We have devised a 12-point formula (see table below) for our data quality assessment method, which we call the Data Quality Assessment and Recommendations Tool (DQART).

A 12-point formula for data quality assessment

An infographic detailing a 12-point formula for data quality assessment which begins with baselining current data, collected through a user survey. The tool next identifies the appropriate business process based on the results of the survey, moving on to identifying the sub-process of the business process chosen, and picking the rules based on the process and sub-process. Next, the tool determines the applicable data quality specific to each attribute, like consistency or accuracy. Then comes assigning weightage to the technical attributes and configuring the technical applicability index (TAI) formula, after which the tool will execute the database query for violation of corresponding business rules;, compute business rule violations (BRV) which calculates the percentage of records that deviated from the business rule; assign weights to TAI and BRV through user input in the tool; compute a data quality score based on TAI and BRV for the rule;, depict a data quality health band based on TAI and BRV by assigning red, amber, or green color codes; and calculate the minimum to maximum score. Minimum data quality score (DQS) points to good data health. In the last step of the formula, the tool generates a detailed report on the data quality and the data health of the organization.

KEY FINDINGS

To test our 12-point DQART formula, we applied it to the merchandising system of a leading retailer. The objective of the exercise was to investigate the data quality of the enterprise. Though the use case here is specific to the retail industry, the foundational precepts of the formula are also relevant to other sectors.

Retail organizations depend on multiple systems to deliver goods from the supplier to the customer. Items are the foundational building block for a retailer. Therefore, data quality principles must be rigorously applied while creating an item. For instance, often, item descriptions carry special characters (%, ^, &,*, $, #), which violate the item creation protocol. Such With deviations could have a ripple effect across the system and cause delays in data processing, impacting customer experience and reporting output of the decision support systems.

Retailers generally choose uniform prices across multiple differentiators (color, size, etc.). Our research shows that uniform pricing implementation across differentiators gets compromised during item maintenance. It impacts the data integrity of an item with multiple prices across different variants and affects the overall customer experience.

Figure 1: Reference data quality Inferences for Item functional area

Reference data quality inferences of a retailer's merchandising system

Two bar charts illustrating the results of the implementation of the 12-point data quality assessment and recommendations tool formula on a retailer's merchandising system. The results show that under kids wear, the department is creating items with special characters because of inconsistent pricing. This violates the business rule of setting consistent prices across multiple differentiators.

Recommendations

From the above figure, we can conclude that the 'kids wear' department is creating items with 'special characters' and violating a business rule of setting consistent prices across multiple differentiators. To mitigate any future deviations, the IT team should advise the department to avoid using special characters and do the same across variants. Alternatively, the IT team should eliminate all special symbols during the initial upload/item creation process.

CONCLUSION

The journey of a data set is fascinating. We mapped the journey of real data sets from their raw form to the quality output, giving us multiple insights into how data quality can be improved.

We measured data quality not only on accuracy, consistency, integrity, timeliness, and relevance but also on the importance of business violations of the multiple rules applied to the data set. Our 12-point formula drills down to the minutest details of the data across the five critical technical attributes and put in a flavor of the business imperative of data. The formula provides a detailed series of steps to assess the data quality from a holistic point of view. The recommendations highlight the deviations across multiple data attributes and provide information on the leading practices to be followed. This tool can be deployed on-premises and on the cloud. An AI/ML framework to implement the data recommendations and create a closed-loop system, thereby pointing out the nuances and correcting them.

About the authors

Dinesh Mohata

Dinesh is a Senior Retail Industry Advisor and a Data Evangelist. Dinesh has worked extensively in client implementation projects with large footprints encompassing end-to-end Retail Processes focusing on Data Quality Attributes.

Monodip Chakravarty

Monodip is a Data Quality Expert and has worked with leading retailers across the globe. Monodip is passionate about product development and innovation. Monodip has been instrumental in guiding multiple retailers in their transformation journey.

Agniv Chakraborty

Agniv is a Data Quality Analyst with comprehensive expertise in building Data Quality Dashboard's. He is responsible for ensuring pre-built data quality in customer solutions and internal assets.

Perpetually Adaptive Enterprise

About Us

TCS Insights

Upcoming events

Recent recognitions

Want to be a global change-maker? Join our team.

Find the latest news about TCS

Recent Press Releases

Recent News

Perpetually Adaptive Enterprise

About Us

TCS Insights

Upcoming events

Recent recognitions

Want to be a global change-maker? Join our team.

Find the latest news about TCS

Recent Press Releases

Recent News

Why data quality is key to an intelligent enterprise

Industry

Highlights

In this article

The data challenge

Assessing data

Why is it required?

How it works

The algorithm

KEY FINDINGS

Recommendations

CONCLUSION

About the authors

Dinesh Mohata

Monodip Chakravarty

Agniv Chakraborty

Transformation starts here

Find out more

Perpetually Adaptive Enterprise

About Us

TCS Insights

Upcoming events

Recent recognitions

Want to be a global change-maker? Join our team.

Find the latest news about TCS

Recent Press Releases

Recent News

Industry

Highlights

In this article

The data challenge

Assessing data

Why is it required?

How it works

The algorithm

KEY FINDINGS

Recommendations

CONCLUSION

About the authors

Dinesh Mohata

Monodip Chakravarty

Agniv Chakraborty

Related reading

Transformation starts here

Find out more

Accessibility Adjustments