In recent years, the term ‘big data’ has acquired buzzword status in the IT industry. The data management stack is undergoing a transition from traditional relational databases to parallel, MapReduce-based processing, distributed file systems and NoSQL databases. At the same time, enterprises are exploring business use cases where deep analytics using the new stack and emerging data sources can add new insight and value. The vision of the Data Analytics and Information Fusion group is to explore such business use cases, solve the analytics problems that underlie them, and develop efficient algorithms that exploit the new big-data stack and emerging data sources. Thus, the problems we focus on share the following characteristics. They:
- Exploit new data sources, such as social media (which did not exist earlier), data streams from sensors (which earlier could not be stored), or geo-spatial streams from mobile users (which neither existed earlier nor are easy to store)
- Exploit the big-data technology stack, i.e., MapReduce, NoSQL, and/or parallel stream processing
- Extend and apply state-of-the-art KDD and IR algorithms to these problems
As a consequence of the growth and popularity of social media such as Twitter, news events of even minor or highly local import are often reported there by reporters as well as the general public. Similarly, conversations in specialized blogs and discussion forums often mention specific faults or difficulties faced by consumers of products or services. The goal of this research project is to detect interesting events occurring anywhere in the world from publicly available social media data, to attach semantics to the detected events, and to raise alerts for events that could affect enterprise operations. This work is being explored as a tool for risk management of supply chains and other sensitive enterprise operations, and for situational awareness in highly active areas of business such as marketing and consumer feedback.
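One simple way to picture the detection step is burst detection on keyword counts: a term whose mention count in the current time bucket far exceeds its recent history is flagged as a possible event. The sketch below is only an illustration under assumptions; the function name `detect_bursts`, the trailing-window length, and the z-score threshold are hypothetical choices, not the group's method.

```python
from statistics import mean, pstdev

def detect_bursts(counts, history=5, threshold=3.0):
    """Return indices of time buckets whose count is a burst.

    counts: mentions of one keyword per time bucket (oldest first).
    A bucket is a burst when it exceeds the mean of the previous
    `history` buckets by at least `threshold` standard deviations.
    """
    bursts = []
    for i in range(history, len(counts)):
        window = counts[i - history:i]
        mu = mean(window)
        # Floor the deviation at 1.0 so a flat history cannot
        # produce a division by zero or an inflated score.
        sigma = max(pstdev(window), 1.0)
        score = (counts[i] - mu) / sigma
        if score >= threshold:
            bursts.append(i)
    return bursts

# Hypothetical hourly mention counts for one keyword.
hourly_mentions = [4, 5, 3, 4, 5, 4, 40, 6]
burst_hours = detect_bursts(hourly_mentions)
```

In practice a flagged bucket would still need the semantic step described above (what happened, where, and to whom) before an alert is raised.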
Fusion Data Warehouse
With increasing volumes of ‘big data’ becoming available from external sources, coupled with the desire to process and correlate disparate data sources in real time, the traditional enterprise business-intelligence technology stack is facing challenges. Big-data technologies, exemplified by the MapReduce paradigm and the Hadoop platform, promise a way out. The goal of this research project is to establish a MapReduce-based architecture for efficient, real-time correlation of disparate and incongruous data sources. The harmonized data is then used to generate business insights. This has proved useful in a range of business domains, e.g., fusing social media and survey data with data from transactional systems, or consolidating KPIs across lines of business.
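The correlation of two sources on a shared key can be illustrated with a reduce-side join, a common MapReduce pattern. The sketch below imitates the map, shuffle, and reduce phases in plain Python rather than on Hadoop; the record fields (`product`, `units`, `sentiment`) and the source tags are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sample records from two sources.
transactions = [
    {"product": "P1", "units": 120},
    {"product": "P2", "units": 45},
]
social_mentions = [
    {"product": "P1", "sentiment": 0.8},
    {"product": "P1", "sentiment": -0.2},
    {"product": "P3", "sentiment": 0.1},
]

def map_phase(records, source, key_field):
    """Emit (key, (source_tag, record)) pairs, as a mapper would."""
    for record in records:
        yield record[key_field], (source, record)

def shuffle(pairs):
    """Group mapper output by key, imitating the shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Join records from the two sources that share a key."""
    left = [rec for tag, rec in values if tag == "txn"]
    right = [rec for tag, rec in values if tag == "social"]
    for txn, mention in product(left, right):
        yield {"product": key,
               "units": txn["units"],
               "sentiment": mention["sentiment"]}

def fuse(txns, mentions):
    """Run the three phases and return the joined records."""
    pairs = list(map_phase(txns, "txn", "product"))
    pairs += list(map_phase(mentions, "social", "product"))
    joined = []
    for key, values in sorted(shuffle(pairs).items()):
        joined.extend(reduce_phase(key, values))
    return joined

fused = fuse(transactions, social_mentions)
```

This is an inner join: keys present in only one source (`P2` and `P3` here) produce no output. A real deployment would also need to reconcile mismatched keys and schemas before the join, which is where most of the harmonization effort lies.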
With increased automation of business processes through departmental software applications, more information about business entities is captured in systems than ever before. However, this information is distributed across departmental applications and is often difficult to integrate because of geographical or systems-related barriers. The goal of this research project is to correlate information or documents pertaining to specific entities and to discover relationships based on the semantics of those entities, tasks often referred to as entity resolution and relationship discovery. We do this with scalable and efficient parallel algorithms built on MapReduce and the big-data technology stack, together with techniques such as locality-sensitive hashing. More specifically, we focus on entities of enterprise interest, with a view to providing a single view of important business entities such as suppliers, customers and vendors, and of their interrelationships, that remains resilient to mergers and acquisitions (M&A). Such a single view of entities can ease operations and support business decisions, such as assessing a large enterprise's dependence on a specific entity.
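To make the role of locality-sensitive hashing concrete, the following is a minimal sketch of MinHash with banding, a standard LSH scheme for Jaccard similarity. It is not the group's implementation: the function names, the use of character 3-grams, and the `num_hashes` and `bands` settings are all assumptions made for illustration. The idea is that records with similar names tend to land in the same bucket, so only those candidate pairs need a detailed comparison.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

def shingles(text, k=3):
    """Character k-grams of a whitespace- and case-normalized name."""
    s = " ".join(text.lower().split())
    return {s[i:i + k] for i in range(max(1, len(s) - k + 1))}

def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")

def minhash_signature(tokens, num_hashes=32):
    """One minimum per seeded hash function."""
    return [min(_hash(seed, t) for t in tokens)
            for seed in range(num_hashes)]

def jaccard(a, b):
    """Exact Jaccard similarity, used to verify candidate pairs."""
    return len(a & b) / len(a | b)

def candidate_pairs(records, num_hashes=32, bands=16):
    """Return pairs of record ids that share at least one LSH band."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for rec_id, name in records.items():
        sig = minhash_signature(shingles(name), num_hashes)
        for b in range(bands):
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[(b, band)].add(rec_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs

# Hypothetical supplier names from two departmental systems.
suppliers = {
    "erp-1": "Acme Corp",
    "crm-7": "acme   corp",
    "erp-2": "Zenith Industries Ltd",
}
pairs = candidate_pairs(suppliers)
```

Because each band is hashed independently, the per-record work maps naturally onto a MapReduce job: mappers emit `(band, record id)` pairs and reducers enumerate the pairs within each bucket. Candidate pairs should still be verified, for example with `jaccard` on the shingle sets, since banding can admit false positives.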
Demand Signal Refinement
Forecasting future demand is a key business need across industries. Demand forecasts are used to plan procurement and manufacturing. Market demand has traditionally been assessed from sales data and marketing expense data. Consequently, the demand signal generated from such data lacks fine-grained seasonal and geographical variation because the inputs are insufficient. The goal of this project is to refine demand signals along temporal and spatial dimensions, and to identify understandable causes for the variations, by exploring alternative input data gleaned from social media, mobile users, financial transactions, etc. The refined demand signal could be useful for marketing applications as well as in the supply chain. For example, manufacturers would be able to plan their outlay of trade dollars better and target promotions at stores where demand is likely to rise the most from such investments.
We aim to efficiently discover occurrences of patterns in and across multiple very large data streams. Knowledge of such patterns is usually derived from domain experience or from the analysis of small samples, which may not be exhaustive, so important patterns can be missed. The goal of this research project is to perform data mining on voluminous data streams, including time series such as those from in-car or oil-rig sensors, and event streams from social media, and to extract knowledge in the form of rules, correlations and other interesting patterns. The results can be used to discover root causes of anomalies, and can aid the design of testing scenarios by better reproducing actual usage patterns in controlled environments.
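As one illustration of rule extraction from an event stream, the sketch below makes a single pass over time-ordered events and estimates how often one event type is followed by another within a fixed time window. The function name `mine_follow_rules`, the event labels, and the window and confidence settings are hypothetical; this is a toy example of the idea, not the algorithms developed in the project.

```python
from collections import Counter, deque

def mine_follow_rules(events, window, min_confidence):
    """Mine rules "a is followed by b within `window` time units".

    events: iterable of (timestamp, label), sorted by timestamp.
    Returns {(a, b): confidence}, where confidence is the fraction
    of occurrences of a that were followed by at least one b.
    """
    recent = deque()        # (timestamp, label, labels seen after it)
    occurrences = Counter() # how often each label occurred
    followed = Counter()    # (a, b) counted once per occurrence of a
    for ts, label in events:
        # Drop events that have fallen out of the time window.
        while recent and ts - recent[0][0] > window:
            recent.popleft()
        for _, prev_label, seen in recent:
            if prev_label != label and label not in seen:
                seen.add(label)
                followed[(prev_label, label)] += 1
        occurrences[label] += 1
        recent.append((ts, label, set()))
    rules = {}
    for (a, b), count in followed.items():
        confidence = count / occurrences[a]
        if confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules

# Hypothetical sensor events: (time in seconds, event label).
sensor_events = [
    (0, "overheat"), (2, "shutdown"),
    (10, "overheat"), (11, "shutdown"),
    (20, "overheat"), (40, "shutdown"),
]
rules = mine_follow_rules(sensor_events, window=5, min_confidence=0.6)
```

Memory use is bounded by the number of events inside one window, which is what makes this style of processing workable on long streams. Rules mined this way indicate temporal association only; establishing a root cause still requires domain analysis.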
Geography of Social Networks
Understanding patterns of social interaction has always been the holy grail of marketing divisions. The proliferation of online social media has not only increased interaction among people across geographical and ethnic boundaries, but has also captured these interactions digitally and made them publicly available. The goal of this project is to build a better understanding of online social interactions and to discover new patterns by fusing them with geographical and demographic information, helping our customers devise new means and strategies for marketing their products and services. This capability can be useful for social commerce and online marketing.
The group is headed by Dr. Gautam Shroff.