

Best Practices for Big Results with Big Data Testing

 
June 7, 2016

The digitization era has offered data analytics a new role. It is no longer just another reporting function, but a key accelerator that drives business intelligence. Greater precision can be achieved when business objectives are clearly understood and possible outcomes are thoroughly studied and analyzed.

Business intelligence and predictive analytics do not happen in isolation, with data coming only from a single internal system. In an always-connected digital world, there exist multiple external sources of data. For instance, customer sentiment is an essential input to corporate strategies. Hidden in large volumes of social media and public data, this customer sentiment needs to be unearthed. But how?

Given the rapid pace at which such data is generated, most businesses start with infrastructure and automation – adequate bandwidth and the right set of tools, comprising a mix of open source and commercial solutions – to drive improved analytics and business intelligence. With its distributed file system, open source architecture, and ability to scale, Apache Hadoop – a distributed-computing open source framework for processing large data sets – is the natural and most popular choice for companies embarking on their Big Data analytics journey.

A typical agile work model, supported by a Hadoop ecosystem, involves analyzing ETL (extract, transform, and load) jobs, transforming data sets, validating them across different layers of the data flow, and migrating data across multiple databases. Understanding how data flows through the well-defined business layers of an entirely new enterprise warehouse model is typically challenging. As a result, data scientists and analysts prefer to automate this workflow and address complexities in data transformation and validation by developing Hadoop (Pig) scripts that validate and process data.
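The article's workflow centers on Pig scripts; as a purely illustrative alternative, here is a minimal PySpark sketch of the kind of automated validation such scripts perform between two layers of the data flow. The file paths, table layout, and column names are hypothetical.

```python
# Illustrative only: an automated validation step between two layers of a
# Hadoop data flow. Paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("layer-validation").getOrCreate()

staging = spark.read.parquet("/warehouse/staging/transactions")   # source layer
curated = spark.read.parquet("/warehouse/curated/transactions")   # transformed layer

# 1. Row-count reconciliation: the transformation should not silently drop or duplicate records.
staging_count, curated_count = staging.count(), curated.count()
assert staging_count == curated_count, (
    f"Row-count mismatch: staging={staging_count}, curated={curated_count}"
)

# 2. Basic quality rules on the transformed layer: mandatory fields and value ranges.
bad_count = curated.filter(
    F.col("customer_id").isNull() | (F.col("amount") < 0)
).count()
assert bad_count == 0, f"{bad_count} records failed quality rules"
```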

For the resulting insights to be useful, quality assurance (QA) and testing teams – and the testing process itself – must be mature enough to understand and analyze challenges in real time, work across several layers of data transfer, pass quality checks, and facilitate transformation changes with proper validation and filtering. To meet this challenge, QA and testing professionals must rise above their current roles and look beyond mere test case execution. Working collaboratively with developers, they must plan each iteration for aggregate and incremental data loads across the various testing phases, such as data preparation, integration, and validation. It is important that such validation happens early in the cycle, during work model initiation, so that data model changes can be handled with precision and minimal impact. In short, testers must become data scientists, capable of handling data structure layouts, processes, and data loads.
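To make the aggregate-versus-incremental planning concrete, here is a hedged sketch, in the same illustrative PySpark style, of a reconciliation check run before an incremental load is merged into the aggregate warehouse. All paths, key names, and the partition date are assumptions for the example.

```python
# Illustrative only: reconcile an incremental load against the aggregate warehouse
# before merging it. Paths, key column, and the load date are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("incremental-reconciliation").getOrCreate()

aggregate = spark.read.parquet("/warehouse/curated/transactions")              # full history
increment = spark.read.parquet("/landing/transactions/load_date=2016-06-07")   # today's delta

# No delta record should already exist in the warehouse (guards against double-loading),
# and the post-load row count should grow by exactly the size of the increment.
already_loaded = aggregate.join(
    increment.select("transaction_id"), on="transaction_id", how="inner"
).count()
assert already_loaded == 0, f"{already_loaded} delta records were already present"

expected_after = aggregate.count() + increment.count()
print(f"Expected row count after the incremental load: {expected_after}")
```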

Here's a quick to-do list for the tester turned data scientist:

  1. Avoid the sampling approach. It may appear easy and scientific, but it is risky. It is better to plan load coverage at the outset, and to consider deploying automation tools to ingest data across the various layers.
  2. Use predictive analytics to determine customer interests and needs. Derive patterns and learning mechanisms from drill-down charts and aggregate data. Ensure correct behavior of the data model by incorporating alerts and analytics with predefined result data sets (see the sketch after this list).
  3. Identify KPIs and a set of validation dashboards after careful analysis of algorithms and computational logic.
  4. Ensure timely incorporation of changes in reporting standards and requirements. This calls for continuous collaboration and discussion with stakeholders. Associate the changes with the metadata model as well.
  5. Deploy adequate configuration management tools and processes.
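As referenced in item 2, one simple way to wire alerts to predefined result data sets is to compare computed aggregates against an expected baseline and fail loudly on deviation. The sketch below is illustrative only; the baseline file, column names, and the one percent tolerance are assumptions.

```python
# Illustrative only: compare computed aggregates against a predefined expected
# result data set and alert on deviation. File paths, column names, and the
# 1% tolerance are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("expected-results-check").getOrCreate()

actual = (
    spark.read.parquet("/warehouse/curated/transactions")
    .groupBy("region")
    .agg(F.sum("amount").alias("total_amount"))
)
expected = spark.read.csv("/qa/expected/region_totals.csv", header=True, inferSchema=True)

# Rows where the computed total deviates from the expected baseline by more than 1%.
mismatches = actual.join(expected, on="region").filter(
    F.abs(F.col("total_amount") - F.col("expected_total")) > F.col("expected_total") * 0.01
)
if mismatches.count() > 0:
    mismatches.show()
    raise AssertionError("Aggregates deviate from the predefined result data set")
```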

While these are process improvements at the micro level, the big picture requires best-in-class testing platforms, customizable testing frameworks, and a change in management mindset. For instance, a leading US bank successfully improved its portfolio management and customer retention with a comprehensive data testing strategy. Insights drawn from customer transaction data had revealed a less than positive customer association with the bank, and the need for better targeted marketing through campaigns and promotional offers. In an unprecedented move at this organization, testers in their role as data scientists filtered out inefficiencies in business governance, and the QA function thus laid the foundation for a strong data warehouse structure. QA and testing best practices and effective score handling techniques enabled the bank to initiate proactive measures to limit customer churn, improve customer retention, and increase its customer base while delivering personalized interactions.

In yet another success story, at a telecom company, data testing practices helped improve decision making based on patterns, insights, and key performance indicators drawn from massive sets of aggregated data. With use cases focused on user journeys – from activation to product purchase – QA interventions enriched the aggregated warehouse data with proper insights, fusing it with improved dimensions and metrics. The improved governance and channel experience resulted in enhanced customer retention.

In both these cases, the companies chose to test the waters with a set of relevant choices, rather than taking a big-bang dive into a deeper data world.

In conclusion, there are two takeaways from these implementations. The first is a misconception that must be cleared: if you thought Big Data testing was just about gaining better insights and intelligence for yourself, think again. Big Data testing is also about improving your customers' experience with your business, brand, and products. The second, and most relevant to data testing: start small, and then scale big.

Nisith Kumar Sahoo is a Big Data Consultant for the Niche Assurance Center of Excellence of TCS' Assurance Services Unit. His focus area is predominantly Big Data, in which he interacts with customers across different verticals to offer them assurance insights. He is continuously involved with the latest innovations and research around Big Data. He has over 7.7 years of experience in IT and has worked with Big Data clients across domains including Banking, Telecom, Retail, and Healthcare.