Research

Speech and Natural Language

Next-generation IT systems should enable easy and natural modes of interaction for their wide spectrum of users, from tech-savvy urban gadget owners to less tech-savvy rural users and retired urban consumers.

The channels of interaction are multimodal and diverse, covering a wide gamut of mechanisms such as speech, SMS, chat, messaging, multilingual input and discourse. It is in this context that research in speech and natural language processing assumes prime importance. The group's vision is to enable people at large to access and derive usable information effortlessly and naturally from enterprise-class systems. The group focuses on exploring, developing and using cutting-edge technology to question the status quo, overcome existing challenges and build interaction systems that people can readily use.

Specifically, the group aims to build natural language transaction systems that can interact with complex information systems on one side while remaining convenient for the public at large on the other.

With the increasing automation of IT systems on the one hand and the growing mobility of people on the other, it is natural to expect next-generation IT systems to provide people with easy and effective modes of interaction irrespective of their geographic location. The actual data is typically spread over various systems and repositories, in both structured form (databases, data warehouses and business applications) and unstructured form (documents, reports, notes, proposals, resumes, etc.). Inferring usable information from such distributed data sources can be laborious and time-consuming, especially with the currently available modes of interaction. Speech and natural language are the natural modes of interaction in such complex situations.

Our work targets different consumers (individual citizens, employees, management, casual users, customers) in an enterprise, where an enterprise can be any logical or dynamic entity formed for a purpose, such as a government, an international body, a corporation, a public system or a social system. These consumers have diverse information needs in different languages (Indian and others). Speech-enabled natural language interactions have to deal with different languages, accents, background noise, context selection and multi-language recognition. Natural language processing systems, in turn, have to deal with issues such as semantic processing of conversations, intent extraction, correlation of unstructured and structured data, multilingual texts and cross-language translation.

For spoken natural conversations between individuals, the group focuses on exploiting the rich non-linguistic, language-independent information hidden in them. When mined, this hidden information can yield valuable insights into several aspects of the conversations. The aim is to derive usable and actionable information automatically by analyzing a very large number of natural speech conversations and then processing them through deep natural language parsing mechanisms.

A sample of current interests includes the following:

  • Robust speech recognition for different languages that handles noise, varied accents and other variations smoothly, specifically:
    • Understanding the effect of noise on speech features,
    • Dynamically changing acoustic models based on noise statistics,
    • Building context-specific language models to enable domain-specific speech recognition.
  • Voice and speech analysis to identify gender, speaker, language, emotion, etc., over a large input set.
  • Deriving usable information hidden in spoken conversations to enable actionable analytics, for example analyzing call center conversations from non-linguistic and paralinguistic perspectives to gain analytical insights beyond what voice analytics based on speech transcription can achieve.
  • Building speech corpora for resource-deficient Indian languages by exploiting Internet resources, to enable automatic speech recognition in those languages.
  • Parsing chunks of text to identify the tasks and intent conveyed in them at a semantic level, together with identifying the relevant entities, objects, and relationships in the domain.
  • Providing multi-lingual interfaces for systems.
  • Parsing and translating documents, reports and system responses in order to enable information correlation and extraction across diverse language resources.
  • Correlation of structured and unstructured data to extract meaningful latent relations among them.

The group has several publications and patents to its credit. One of our patented tools assists call center agents in real time to speak at the right speed; it received the Aegis Graham Bell Award in the Innovation in Customer Care category.

The group is headed by the following:
