
Next Gen CMI

Scaling Newer Heights with Content Analytics: Going from Unstructured Data to Actionable Intelligence

 
March 22, 2016

Data is no longer a one-off input or output entity; it is a fast-growing, continuous stream. As the volume of structured and unstructured data grows at an unimaginable pace, the need to analyze it and cull out business insights has become even more pronounced. While structured data is more computing friendly due to its machine comprehensibility and easily measurable nature, unstructured data is complex and fuzzy, making it harder to process and derive value from. But since unstructured data forms more than 80 percent of the consolidated global content across industries, we can’t really ignore it, can we?

So let’s introduce our savior then – let the drums roll for ‘content analytics’. Gone are the days when content analytics was merely a marketing tool that helped profile customers and fine-tune sales strategies. It can definitely do a lot more.

In our engagements with global enterprises, we’ve found that content analytics is being used to build superior, automated, intelligent applications across industries. The increasing volume of digital content and the emergence of cloud computing, Big Data, and machine learning have accelerated the evolution of this area. We believe that recent advancements in the field of content analytics can be largely attributed to Big Data processing and improvements in NLP, machine learning, and ontology engineering disciplines. Large-scale implementations of products such as IBM Watson, Google Knowledge Graph, and Apple’s Siri are gaining momentum. Content-centric enterprises like publishers, information service providers, analyst firms, and advertisers are also investing in Big Data and advanced content analytics technologies to improve operational efficiency and business competitiveness.

Okay, a few basics first – what does advanced content analytics comprise?

Content mining: Large-scale information extraction (IE) from unstructured, heterogeneous content, predominantly text but also graphics, audio, and video. This stage primarily deals with entity extraction, relationship extraction, topic modelling, clustering, classification, and summarization methods.
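
To make the content mining stage concrete, here is a minimal sketch of two of the methods named above, entity extraction and a crude topic signal, using only the Python standard library. The sample documents, the stopword list, and the function names are assumptions made up for illustration; production systems typically rely on trained NLP models rather than regular expressions.

```python
import re
from collections import Counter

# Hypothetical sample documents, invented for illustration only.
DOCS = [
    "Acme Corp shares rose after Acme Corp announced a new analytics product.",
    "Analysts say the analytics market is growing as publishers adopt new tools.",
]

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "the", "is", "as", "after", "say", "new", "and", "of", "to"}


def extract_entities(text):
    """Naive entity extraction: runs of two or more capitalized words."""
    return re.findall(r"\b(?:[A-Z][a-z]+ )+[A-Z][a-z]+\b", text)


def top_terms(text, n=3):
    """Crude topic signal: the most frequent non-stopword tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [term for term, _ in counts.most_common(n)]


if __name__ == "__main__":
    for doc in DOCS:
        print(extract_entities(doc), top_terms(doc))
```

The capitalized-phrase heuristic misses single-word names and lowercase entities, which is one reason real implementations use statistical or neural extractors instead.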

Content discovery: It is here that the ‘real’ business intelligence is derived by conducting contextual searches, administering questionnaires, implementing topic-driven content navigation, and more. A more advanced capability that is often required in this phase is inference chaining – the process of synthesizing new information from a set of already discovered information.
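
Inference chaining can be illustrated with a small forward-chaining loop over subject-predicate-object facts. This is only a sketch under stated assumptions: the facts, the `located_in` predicate, and the single transitivity rule are hypothetical, and real systems usually work over ontologies with far richer rule sets.

```python
def located_in_transitive(known):
    """Rule: if A is located in B and B is located in C, then A is located in C."""
    located = [(s, o) for s, p, o in known if p == "located_in"]
    return [(a, "located_in", d) for a, b in located for c, d in located if b == c]


def forward_chain(facts, rules):
    """Apply every rule repeatedly until no new facts are produced."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            # Materialize the rule output before mutating the fact set.
            for new_fact in list(rule(known)):
                if new_fact not in known:
                    known.add(new_fact)
                    changed = True
    return known


# Hypothetical facts, as might be extracted during content mining.
FACTS = [
    ("Store 12", "located_in", "Pune"),
    ("Pune", "located_in", "Maharashtra"),
    ("Maharashtra", "located_in", "India"),
]

if __name__ == "__main__":
    for fact in sorted(forward_chain(FACTS, [located_in_transitive])):
        print(fact)
```

Starting from three extracted facts, the loop derives three more, including that Store 12 is located in India, which no single source statement said directly.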

Here are some scenarios where content analytics can come in handy:

  • Assessing product or brand perception and consumer sentiment analysis using social media feeds
  • Linking stock price movement to news articles and a company’s
  • Conducting algorithmic scoring of essays and subjective questions as part of students’ exam-response evaluation
  • Preempting acts of terrorism by mining newswires, YouTube videos, emails, and more
  • Identifying law and order violations and potential criminals by automated analysis of CCTV footage
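
As a rough illustration of the first scenario in the list, consumer sentiment on social media feeds can be approximated with a word-list scorer. The word lists and sample posts below are invented assumptions, and this approach ignores negation and sarcasm, so it should be read as a toy sketch rather than a usable sentiment model.

```python
import re

# Hypothetical sentiment lexicons, far smaller than any real one.
POSITIVE = {"love", "great", "excellent", "fast"}
NEGATIVE = {"hate", "poor", "slow", "broken"}


def sentiment_score(post):
    """Count positive words minus negative words in a post."""
    tokens = re.findall(r"[a-z']+", post.lower())
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)


if __name__ == "__main__":
    # Invented example posts.
    for post in [
        "I love the new phone, great camera",
        "Battery life is poor and the app is slow",
    ]:
        print(sentiment_score(post), post)
```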

Aside from these use cases, there are many more areas where content analytics can be applied, which will in turn lead to the emergence of new business models.

As far as the future is concerned, we believe that there will be a significant increase in the adoption of semi-supervised and unsupervised machine learning due to minimal training overheads and better portability. Such capabilities could be offered as cloud-based commodity services, enabling organizations to quickly prototype and build innovative products.

Advanced content analytics, a key ingredient of artificial intelligence systems, is a complex subject and is being actively researched by academicians and enterprises alike. There are some challenges that need to be addressed, such as multi-lingual diversity, ambiguity resolution, sarcasm detection, error comprehension and approximation, intercepting assumptions, and inference chaining. Regulatory and content-privacy-related challenges can add to the complexity of this equation.

Despite all this, content analytics holds great promise for the future of business, especially in the media and information space. With everyone having tasted the early benefits of analytics-driven insights, the need for ‘intelligence’ is only going to rise, driving everyone to innovate and compete in this space.

Tomal Deb is a Solution Architect with the Media and Information Services (MIS) business unit at Tata Consultancy Services (TCS). He has over 11 years of experience across areas like semantic enrichment and search, machine learning, text analytics, content management, and Big Data. Tomal has led solution architecture and deployment programs for several global majors in the publishing, information services, and education space. He has also been closely involved in the conceptualization and development of TCS’ proprietary platforms and offerings in the content technology space.