BUSINESS AND TECHNOLOGY INSIGHTS

Data Discovery: the Cornerstone of Test Data Management

 
April 5, 2018

For sustainable growth in today’s competitive markets, businesses must not only be agile but also adopt the latest trends and technologies. One of the key factors of agility is ‘optimal test coverage’ in quality engineering (QE) and testing, which, in turn, is enabled by test data. Test data management (TDM), the science of managing data for testing, is therefore a critical success factor for agile.

Take, for instance, the case of a large bank that was migrating from a manual, archaic test data provisioning process to an automated, tool-based solution. It covered an entire application cluster involving several heterogeneous but interrelated data sources. The TCS team identified the appropriate data requirements and matched them to test cases. The next step was data discovery, covering both production-sourced and auto-generated synthetic data. The team used an automated solution to identify metadata patterns and gaps in coverage, and from these created a meaningful TDM strategy that played a critical role in ensuring a smooth data migration.

Despite being a critical enabler, TDM often fails to catch the attention of QE leaders. In most cases, especially in large organizations, TDM is marked by a common set of concerns: regulatory compliance, referential integrity across several databases hosted on different technologies, tacit knowledge about data that forces dependence on subject matter experts (SMEs), adequate data coverage, and the need for test data to be constantly in sync with production.

The role of data discovery

Much has been said about the need for and importance of test data. But what is important for TDM’s success? Data discovery could well be the right solution to complex data-related problems, but agile project teams often do not pay enough attention to it. In the TDM-agile scenario, data discovery must take centre stage. Besides understanding the data relationships that are defined in databases and beyond, data discovery must also consider relationships that are hidden within the data but not described in detail in schemas. While most businesses are keen on adopting TDM best practices for a simpler data ecosystem, they are often unsure how and where to start their data discovery journey. Discovering relevant data continues to be the biggest challenge for fully tested agile deliveries.
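To illustrate what a relationship "hidden within the data" can look like, here is a minimal Python sketch that looks for candidate foreign keys by checking whether the values of one column are contained in a unique column of another table. The function name, the table and column names, and the in-memory list-of-dicts representation are all assumptions made for illustration. A real discovery tool would work against the live data sources and would treat every hit as a candidate for an SME to confirm, not as an established relationship.

```python
from itertools import permutations


def candidate_foreign_keys(tables, min_overlap=1.0):
    """Return (child_table, child_col, parent_table, parent_col) tuples.

    `tables` maps a table name to a list of row dicts (hypothetical layout;
    every row in a table is assumed to have the same keys).
    A pair is reported when the parent column is unique and non-null, and at
    least `min_overlap` of the child's distinct non-null values occur in it.
    """
    # Gather the values of every column, keyed by (table, column).
    columns = {}
    for table_name, rows in tables.items():
        for row in rows:
            for col, value in row.items():
                columns.setdefault((table_name, col), []).append(value)

    candidates = []
    for child, parent in permutations(columns, 2):
        if child[0] == parent[0]:
            continue  # only look for cross-table relationships
        parent_values = columns[parent]
        parent_set = {v for v in parent_values if v is not None}
        # The parent column must behave like a key: no nulls, no duplicates.
        if len(parent_set) != len(parent_values):
            continue
        child_set = {v for v in columns[child] if v is not None}
        if not child_set:
            continue
        overlap = len(child_set & parent_set) / len(child_set)
        if overlap >= min_overlap:
            candidates.append((child[0], child[1], parent[0], parent[1]))
    return sorted(candidates)


# Illustrative data only: no schema declares that cust_ref points at customers.
example_tables = {
    "customers": [
        {"customer_id": 1, "name": "A"},
        {"customer_id": 2, "name": "B"},
        {"customer_id": 3, "name": "C"},
    ],
    "accounts": [
        {"account_no": 101, "cust_ref": 1, "branch": "X"},
        {"account_no": 102, "cust_ref": 1, "branch": "Y"},
        {"account_no": 103, "cust_ref": 3, "branch": "X"},
    ],
}

for candidate in candidate_foreign_keys(example_tables):
    print(candidate)
```

Value containment alone produces false positives on small or low-cardinality columns, so in practice the overlap threshold and a minimum row count would need tuning per data source.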

For realistic and practical data discovery solutions, it is important to understand the overall data landscape. Test data issues are unique and there is no one-size-fits-all solution. For instance, the generation of the bulk of synthetic test data for high-volume e-commerce sites can be automated. But similar automation may not work for legacy systems, packaged applications or data-warehouse applications that require large-volume, realistic test data. With little scope for generating synthetic data from scratch, these use cases rely heavily on production data. While data discovery is imperative for generating meaningful subsets from production data, each use case requires a different approach.
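As a rough illustration of why discovery comes before subsetting, the Python sketch below samples a fraction of parent rows and then keeps only the child rows that reference them, so the subset stays referentially consistent. It assumes the parent-child relationship has already been discovered. The function name, its parameters and the sample data are hypothetical, and real production data would also need masking before use in testing.

```python
import random


def subset_with_children(parents, children, parent_key, child_fk, fraction, seed=0):
    """Sample `fraction` of the parent rows and keep only their child rows.

    All names here are illustrative; rows are plain dicts.
    A fixed seed makes the subset reproducible between test runs.
    """
    if not parents:
        return [], []
    rng = random.Random(seed)
    # Keep at least one parent, and never ask for more than exist.
    k = min(len(parents), max(1, round(len(parents) * fraction)))
    chosen = rng.sample(parents, k)
    keys = {row[parent_key] for row in chosen}
    # Child rows whose parent was not sampled are dropped, so no orphans remain.
    kept_children = [row for row in children if row[child_fk] in keys]
    return chosen, kept_children


# Illustrative data only.
customers = [{"id": i} for i in range(10)]
orders = [{"order_id": j, "customer_id": j % 10} for j in range(30)]

sampled_customers, sampled_orders = subset_with_children(
    customers, orders, "id", "customer_id", fraction=0.3, seed=1
)
print(len(sampled_customers), "customers,", len(sampled_orders), "orders")
```

Simple random sampling of parents is only one possible approach. A data-warehouse or legacy use case might instead need sampling by business segment or date range to keep the subset meaningful.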

The earlier banking example, along with use cases from other domains such as healthcare, clearly establishes the need for discovering hidden data relationships. Failure to establish detailed and logical data relationships, especially in the healthcare domain, could result in incorrect data masking routines, which in turn could lead to disasters such as the administration of the wrong drugs to patients.

Choosing the right tool

A key aspect of a successful data discovery process is the selection of the right tool. While choosing the tool, the project team should ensure that it offers easy and efficient connectivity to data sources, automated data profiling for identified sources, auto-identification of sensitive elements for anonymization, and generation of data subsets that are representative of the overall dataset, and that it can accelerate returns on investment.
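To make "auto-identification of sensitive elements" concrete, here is a minimal Python sketch that profiles column values against a couple of regular expressions and flags columns where most values match. The patterns, the labels, the 80% threshold and the function name are assumptions chosen for illustration. They do not describe how any particular TDM tool works, and commercial tools typically combine many more signals, such as column names, dictionaries and checksums.

```python
import re

# Hypothetical, deliberately simple patterns; real detectors are far stricter.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "card_number": re.compile(r"^\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}$"),
}


def flag_sensitive_columns(rows, threshold=0.8):
    """Return {column: label} for columns whose values mostly match a pattern.

    `rows` is a list of dicts sharing the same keys (illustrative layout).
    A column is flagged when at least `threshold` of its non-null values
    match one of the patterns; the first matching pattern wins.
    """
    flagged = {}
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        values = [str(row[col]) for row in rows if row.get(col) is not None]
        if not values:
            continue
        for label, pattern in PATTERNS.items():
            matches = sum(1 for value in values if pattern.match(value))
            if matches / len(values) >= threshold:
                flagged[col] = label
                break
    return flagged


# Illustrative rows only; the values are made up.
sample_rows = [
    {"id": 1, "contact": "a@example.com", "pan": "4111 1111 1111 1111"},
    {"id": 2, "contact": "b@example.org", "pan": "5500-0000-0000-0004"},
]
print(flag_sensitive_columns(sample_rows))
```

Columns flagged this way would then be routed to the anonymization step, ideally after a reviewer confirms the classification.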

As we move towards higher levels of agility, with production deployments shifting from monthly to weekly and from weekly to daily, more and more components are added to, modified in and removed from the agile delivery landscape. This accelerated pace of change mandates quick and streamlined provisioning of accurate datasets for repeated testing, which, in turn, demands robust data governance. With a good understanding of the data consumed by applications, it is easier for teams to generate the right test data and ensure adequate testing. Whether you are solving a data problem, bridging compliance gaps or analysing systems for scaling up, the starting point is always a thorough data discovery exercise.

 

Kalindi Sinha is a test data consultant with the test data management center of excellence at Tata Consultancy Services. She specializes in consulting and strategic solutioning for TDM tool implementation. With 8 years of experience in QE and production support, she has worked with clients from various verticals such as hi-tech, banking, retail and insurance across geographies. She has extensive solutioning experience which has helped in providing successful consulting for various customers.