

Negative is the New Positive: NLP and the Pandemic

June 3, 2020

The COVID-19 global contagion has brought to the fore the benevolent prowess of AI technologies. Scientists and technologists are working hand in hand with the frontline medical community, as AI technologies such as computer vision, natural language processing (NLP), analytics-driven drug and vaccine discovery, geo-sensing, and data science play key roles in detecting disease, monitoring and controlling the spread of infection, and discovering drugs. This blog looks at the role of natural language technologies in crisis mitigation. It also sheds light on challenges and future possibilities.

The Intelligent Use of Time

AI-powered multilingual chatbots have played a key role in disseminating pandemic-related information, and several enterprises, from global tech giants to regional start-ups, have offered their services in this space.

India, too, is seeing large-scale deployment of multilingual chatbots that collect data on symptoms through guided conversations with patients and suggest appropriate next steps, including booking doctor appointments. This has freed up time for healthcare professionals to deal with tasks that require human intervention.

The speedy development and deployment of these chatbots has shown that NLP technologies are mature enough to perform repetitive but meaningful tasks, albeit within a restricted context. Deep learning-based language models, which are at the heart of the success of multilingual conversation systems and chatbots, can deliver much more. Though most current applications employ deep learning to interpret content, it can also be used to generate natural language responses to user queries. This capability can turn a chatbot into a conversation agent, and promises to become available soon. Needless to add, these developments must take place within environments that mandate data privacy and security. Although there has been a surge in the use of keyboard-operated chatbots, which need few network resources, voice-enabled chatbots have a wider reach and are more powerful and useful.

A Widening Area of Research

The pandemic has given rise to a new set of consumers of natural language technology, such as doctors, virologists, epidemiologists and other healthcare professionals. But there are challenges. The goal is to offer reliable language models that help users assimilate content easily through intelligent implicit search, automatic connection of dots across sources, visualization, question-answering, and innovative content summarization. The best possible outcome would be to have a layer of predictive technology work on the information extracted from multiple sources and help in new knowledge discovery. Though the problem is not completely solved, there has been reasonable progress in this area, thanks to content-embedding models that have been created using deep recurrent networks. These models, also referred to as “thought vectors”, can capture the meaning of content far more comprehensively than traditional statistical models. Several of these models are available to NLP practitioners. It is now possible to solve several low-level NLP tasks, such as part-of-speech tagging and named entity detection, with far more accuracy; these tasks help in recognizing the names of medicines, genes, currencies, experimental methods and so on. This in turn can ensure that the downstream predictive technology or knowledge-based reasoning mechanism, which consumes these outputs, generates more accurate results.

The more challenging problems that researchers are looking at include identifying causal relations, validating inferences, verifying the truth of claims made, performing quality assessment, detecting contradictions and so on.     

The Challenge of Grammar and Nuance

COVID-19 has also thrown some other unique challenges at NLP researchers. Take, for instance, the use of the terms negative and positive. In everyday language, positive denotes “good”. However, one knows that a surge in COVID positive cases is bad, while a reduction in numbers is the new good. This is an apt example of why language models need to be retrained. Additionally, current language models are good at capturing the subjects of discussion, but not quite adept at encoding the finer aspects of text that discuss “why” or “how”. Understanding sentiments plays a key role in strategizing for business and, if anything, decision-making in a post-pandemic world will be a more knowledge-driven activity than ever before. There is a need for intelligent search: search systems that don’t just look for surface-level matches between content and query, but can interpret the user’s implicit and hidden intent from the query and fetch content accordingly.
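The shift in meaning of positive and negative can be made concrete with a small sketch. This is only an illustration, not how a deployed system would handle it: it swaps model retraining for a toy word-score lexicon with a domain override, and the lexicon entries, scores, and the names `GENERAL_LEXICON`, `COVID_OVERRIDES` and `polarity` are all assumptions made up for the example.

```python
# Toy general-purpose lexicon: word -> polarity score.
# The words and scores are illustrative assumptions, not a real resource.
GENERAL_LEXICON = {"positive": 1, "good": 1, "negative": -1, "bad": -1}

# Hypothetical overrides for pandemic reporting, where a "positive"
# test result is bad news and a "negative" one is good news.
COVID_OVERRIDES = {"positive": -1, "negative": 1}


def polarity(text, overrides=None):
    """Sum word-level scores; domain overrides replace general scores."""
    lexicon = {**GENERAL_LEXICON, **(overrides or {})}
    # Lowercase each word and strip surrounding punctuation before lookup.
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return sum(lexicon.get(w, 0) for w in words)


sentence = "Ten more patients tested positive today."
general_score = polarity(sentence)                  # everyday reading
domain_score = polarity(sentence, COVID_OVERRIDES)  # pandemic reading
print(general_score, domain_score)
```

The same sentence scores as good news under the everyday lexicon and as bad news once the domain override is applied, which is the kind of adjustment that retraining a language model on pandemic-era text is meant to achieve at scale.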

As business leaders try to extrapolate what the new world will look like after it emerges from the present crisis, they stand to benefit from tools that can intelligently crunch the huge volumes of text that pour in from analysts, scientists, government heads, social media and so on. Text-driven reasoning aims to add textual information to predictive technologies that deal only with numbers. Text often contains information that can explain numbers. Getting this additional information into a reasoning mechanism is crucial to making it intelligent. Accuracy, however, is key for these applications, and sophisticated reasoning models are under development. Presently, companies like HealthMap and Cobwebs Technologies are tracking mentions of the virus on the internet, but more remains to be done.

Continually Evolving

Uncontrolled content generation and distribution mechanisms have also given rise to high volumes of content that is dubious in nature, outright fake or generated with malicious intent. Understandably, one of the key research areas of AI is centered on detecting fake content.

As NLP capabilities mature, the time is right to consider leveraging their benefits in business and trade. Innovative applications can be built by connecting conversation agents to almost all kinds of monitoring and analytics systems. Care-giving systems are just one instance of these. One has to acknowledge, though, that products powered by these technologies will be in a permanently “evolving” stage. Every new phenomenon is going to contribute new words to the vocabulary, and systems will have to be trained to recognize these. Evolutionary learning has to be at the core of these new tools, with an ecosystem that supports a continual contribution from the human mind.

Lipika is a chief scientist at TCS Research and Innovation and heads analytics and insights practices. Lipika holds a PhD in computer science and engineering from IIT Kharagpur. Her research interests are in the areas of NLP, text and data mining, machine learning, and semantic search.