
Training Your Systems to Automate Data Extraction: A Gradual Learning Approach

 
January 16, 2018

Although data extraction is typically classified as a 'non-core' activity for any domain-intensive business process, the extracted information is often critical to downstream business processes. Extracting relevant data from unstructured formats and multiple disparate input channels remains a key challenge. And while automation can streamline data extraction in many cases, businesses are often unrealistic about the level of automation they can achieve in this context. In uncontrolled environments, such as the one that follows an automation project going live, data quality can be severely compromised if the data is not verified and authenticated. So how should organizations proceed with the automation of their data extraction processes?

Balancing cost of automation and expected savings is key to success

In the case of printed documents in a structured format, automated extraction of relevant data is quite easy. Handwritten paper documents that can be read using Intelligent Character Recognition (ICR) are also automation friendly. But in reality, structured data sources like these may account for only about half of the total. For example, our research found that of all the data that insurers deal with, over 40% is either unstructured or semi-structured in nature. One of the objectives of automation in this context could be to reduce underwriting effort and costs by scrutinizing and validating documents such as pay stubs and bank statements before sending them to the underwriters.

The challenge lies in the fact that these documents exist in different formats and emanate from different sources, rendering complete automation infeasible. Uncontrolled and decentralized scanning adds to the complexity of the process and impacts the quality of scanned documents. It is therefore not viable to completely do away with human intervention in extracting relevant data. It is, however, possible to train systems to replace human intervention to a great extent. This requires considerable training of automated systems that leverage machine learning and natural language processing technologies by feeding them a variety of samples over time.

The key is to assess the trade-off between the cost of automation and the expected savings, enabling the business to choose between a fully automated and an automation-aided solution. For instance, a fully automated solution is ideal for extracting information from structured documents in a controlled environment, while aided automation is a better choice for unstructured documents in an uncontrolled environment. This means that efficiently categorizing documents as structured or unstructured is key to the successful implementation of data extraction automation.
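To make the routing rule and the cost trade-off concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any particular product: the names DocumentProfile, choose_mode and net_savings are hypothetical, the cost figures are made up for the example, and sending mixed cases (for instance, a structured document from an uncontrolled scan) to aided automation is a conservative assumption that the text above does not spell out.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    FULLY_AUTOMATED = "fully automated"
    AUTOMATION_AIDED = "automation-aided"


@dataclass
class DocumentProfile:
    """Hypothetical description of an incoming document type."""
    structured: bool   # fixed, predictable layout (e.g. a printed form)
    controlled: bool   # captured in a controlled scanning environment


def choose_mode(doc: DocumentProfile) -> Mode:
    """Route a document type to an extraction mode.

    Structured documents from a controlled environment go to full
    automation. Everything else, including mixed cases (an assumption
    of this sketch), goes to automation-aided extraction with review.
    """
    if doc.structured and doc.controlled:
        return Mode.FULLY_AUTOMATED
    return Mode.AUTOMATION_AIDED


def net_savings(manual_cost: float, effort_reduction: float,
                automation_cost: float) -> float:
    """Expected savings minus the cost of automation for one period.

    effort_reduction is the fraction of manual effort removed (0.0 to 1.0).
    """
    return manual_cost * effort_reduction - automation_cost


if __name__ == "__main__":
    printed_form = DocumentProfile(structured=True, controlled=True)
    scanned_letter = DocumentProfile(structured=False, controlled=False)
    print(choose_mode(printed_form).value)
    print(choose_mode(scanned_letter).value)
    # Illustrative figures only: 100,000 in manual cost, 50% effort
    # reduction, 30,000 spent on automation.
    print(net_savings(100_000.0, 0.5, 30_000.0))
```

A positive result from net_savings suggests the automation pays for itself over the period considered; a negative one suggests that a lighter, automation-aided setup may be the better starting point for that document type.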

An incremental approach to automation is the best path to value

It is also unrealistic to expect straight-through processing without human intervention from day one of solution implementation. The most appropriate approach would be to choose an automation-aided tool that initially reduces manual extraction effort, while allowing the system to be trained with different documents. In our experience of typical solution implementations, we have found that automation in data extraction can gradually reduce manual effort by up to 70%.

It’s important to remember that a one-size-fits-all approach does not work well when it comes to data extraction automation. Implementing a strategy whereby every document is handled based on its digital and structural merits is the only way businesses can benefit from automation techniques. In essence, the extent of automation will depend on the type of document, and human intervention for extraction and review might still be needed in certain instances. The good news is that implementing the solution using a common, customizable framework can help businesses manage various types of documents with different degrees of automation in a centralized manner.


Swayambhu Mandal is a Business Consultant with the Technology & Innovation (T&I) group in the Business Process Services (BPS) unit at Tata Consultancy Services (TCS). He has over 14 years of experience in consulting for designing and implementing digitization and workflow process solutions, and has led key digitization initiatives within TCS. Mandal holds a Bachelor’s Degree in Electronics Engineering from Nagpur University, and an MBA from ICFAI Business School, Hyderabad.