Publications Archive 2012


Select Research Publications from the Research community for the year 2012.




Road condition monitoring and alert application: Using in-vehicle Smartphone as Internet-connected sensor

Pervasive Computing and Communications Workshops (PERCOM Workshops), 2012
Authors: A Ghose, P Biswas, C Bhaumik, M Sharma, A Pal, A Jha


The proposal describes a road condition monitoring and alert application that uses in-vehicle smartphones as sensors connected to an Internet-of-Things platform over the Internet. In addition to providing a generic Internet-of-Things platform, the proposed solution introduces novel, energy-efficient, phone-orientation-agnostic accelerometer analytics on the phone, reduces the volume of data that needs to be communicated between the phone and the back end over the Internet, applies multi-user fusion concepts to create authentic road condition maps, and addresses the privacy concerns of phone users about sharing the required data.


Classification of Metagenomic Sequences: Methods and Challenges

Briefings in Bioinformatics, September 8, PMID: 22962338 (2012)
Authors: Sharmila S Mande, Monzoorul Haque Mohammed, Tarini Shankar Ghosh


Characterizing the taxonomic diversity of microbial communities is one of the primary objectives of metagenomic studies. Taxonomic analysis of microbial communities, a process referred to as binning, is challenging for the following reasons. Primarily, query sequences originating from the genomes of most microbes in an environmental sample lack taxonomically related sequences in existing reference databases. This absence of a taxonomic context makes binning a very challenging task. Limitations of current sequencing platforms, with respect to short read lengths and sequencing errors/artifacts, are also key factors that determine the overall binning efficiency. Furthermore, the sheer volume of metagenomic datasets also demands highly efficient algorithms that can operate within reasonable requirements of compute power. This review discusses the premise, methodologies, advantages, limitations and challenges of various methods available for binning of metagenomic datasets obtained using the shotgun sequencing approach. Various parameters as well as strategies used for evaluating binning efficiency are then reviewed.



DELIMINATE – A fast and efficient method for loss-less compression of genomic sequences
Bioinformatics, 28 (19), 2527-2529 (2012)
Authors: Monzoorul Haque Mohammed, Anirban Dutta, Tungadri Bose, Sudha Chadaram and Sharmila S Mande

An unprecedented quantity of genome sequence data is currently being generated using next-generation sequencing platforms. This has necessitated the development of novel bioinformatics approaches and algorithms that not only facilitate a meaningful analysis of these data but also aid in efficient compression, storage, retrieval and transmission of huge volumes of the generated data. We present a novel compression algorithm (DELIMINATE) that can rapidly compress genomic sequence data in a loss-less fashion. Validation results indicate relatively higher compression efficiency of DELIMINATE when compared with popular general purpose compression algorithms, namely, gzip, bzip2 and lzma.



BIND – An algorithm for loss-less compression of nucleotide sequence data
J. Biosciences, 37 (4), 785–789 (2012)
Authors: Tungadri Bose, Monzoorul Haque Mohammed, Anirban Dutta and Sharmila S Mande


Recent advances in DNA sequencing technologies have enabled the current generation of life science researchers to probe deeper into the genomic blueprint. The amount of data generated by these technologies has been increasing exponentially over the last decade. Storage, archival and dissemination of such huge data sets require efficient solutions, from both hardware and software perspectives. The present paper describes BIND – an algorithm specialized for compressing nucleotide sequence data. By adopting a unique ‘block-length’ encoding for representing binary data (as a key step), BIND achieves significant compression gains as compared to the widely used general purpose compression algorithms (gzip, bzip2 and lzma). Moreover, in contrast to implementations of existing specialized genomic compression approaches, the implementation of BIND can handle non-ATGC and lowercase characters. This makes BIND a loss-less compression approach that is suitable for practical use. More importantly, validation results of BIND (with real-world data sets) indicate reasonable speeds of compression and decompression that can be achieved with minimal processor/memory usage.
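
As a rough illustration of the 'block-length' idea, the Python sketch below assumes a hypothetical 2-bit code per nucleotide (the mapping is an assumption, not taken from the paper), splits the coded sequence into two binary streams, and records each stream as the lengths of its alternating runs of 0s and 1s. It handles only uppercase A, T, G and C; the handling of other characters described in the abstract is omitted, and the function names are illustrative.

```python
from itertools import groupby

# Hypothetical 2-bit code per nucleotide; the paper's actual mapping may differ.
CODE = {"A": (0, 0), "T": (0, 1), "G": (1, 0), "C": (1, 1)}
DECODE = {bits: base for base, bits in CODE.items()}


def block_lengths(bits):
    """Return (first_bit, run_lengths) for a non-empty binary sequence."""
    runs = [len(list(group)) for _, group in groupby(bits)]
    return bits[0], runs


def expand(first_bit, runs):
    """Inverse of block_lengths: rebuild the binary sequence."""
    out, bit = [], first_bit
    for length in runs:
        out.extend([bit] * length)
        bit ^= 1  # runs alternate between 0 and 1
    return out


def encode(seq):
    """Split a sequence into two bit streams and block-length encode each."""
    high = [CODE[base][0] for base in seq]
    low = [CODE[base][1] for base in seq]
    return block_lengths(high), block_lengths(low)


def decode(encoded):
    """Rebuild the nucleotide sequence from the two encoded bit streams."""
    (h_first, h_runs), (l_first, l_runs) = encoded
    high, low = expand(h_first, h_runs), expand(l_first, l_runs)
    return "".join(DECODE[pair] for pair in zip(high, low))
```

For example, `encode("AAAATTGG")` yields the run lengths `[6, 2]` and `[4, 2, 2]` for the two streams, and `decode` restores the original string, so the round trip is loss-less. Long runs of identical bits collapse into single integers, which is where such an encoding can save space.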



TWARIT: An extremely rapid and efficient approach for phylogenetic classification of metagenomic sequences

Gene, 505 (2), 259-265 (2012)
Authors: Rachamalla Maheedhar Reddy, Monzoorul Haque Mohammed and Sharmila S Mande


Phylogenetic assignment of individual sequence reads to their respective taxa, referred to as ‘taxonomic binning’, constitutes a key step of metagenomic analysis. Existing binning methods have limitations either with respect to time or accuracy/specificity of binning. Given these limitations, development of a method that can bin vast amounts of metagenomic sequence data in a rapid, efficient and computationally inexpensive manner can profoundly influence metagenomic analysis in settings with limited computational resources. We introduce TWARIT, a hybrid binning algorithm that employs a combination of short-read alignment and composition-based signature sorting approaches to achieve rapid binning rates without compromising on binning accuracy and specificity. TWARIT is validated with simulated and real-world metagenomes, and the results demonstrate significantly lower overall binning times compared to those of existing methods. Furthermore, the binning accuracy and specificity of TWARIT are observed to be comparable or superior to those of existing methods.



C16S - A Hidden Markov Model based algorithm for taxonomic classification of 16S rRNA gene sequences

Genomics, 99, 195–201 (2012)
Authors: Tarini Shankar Ghosh, Purnachander Gajjala, Monzoorul Haque Mohammed and Sharmila S Mande


Recent advances in high throughput sequencing technologies and concurrent refinements in 16S rDNA isolation techniques have facilitated the rapid extraction and sequencing of 16S rDNA content of microbial communities. The taxonomic affiliation of these 16S rDNA fragments is subsequently obtained using either BLAST-based or word frequency based approaches. However, the classification accuracy of such methods is observed to be limited in typical metagenomic scenarios, wherein a majority of organisms are hitherto unknown. In this study, we present a 16S rDNA classification algorithm, called C16S, that uses genus-specific Hidden Markov Models for taxonomic classification of 16S rDNA sequences. Results obtained using C16S have been compared with those of the widely used RDP classifier. The accuracy of the C16S algorithm was observed to be consistently higher than that of the RDP classifier. In some scenarios, this increase in accuracy is as high as 34%.



Understanding Communication Signals during Mycobacterial Latency through Predicted Genome-wide Protein Interactions and Boolean Modeling

Authors: Shubhada R. Hegde, Hannah Rajasingh, Chandrani Das, Sharmila S Mande and Shekhar C. Mande
PLoS ONE, 7 (3): e33893 (2012)


About 90% of the people infected with Mycobacterium tuberculosis carry latent bacteria that are believed to become activated upon immune suppression. One of the fundamental challenges in the control of tuberculosis is therefore to understand the molecular mechanisms involved in the onset of latency and/or reactivation. We have attempted to address this problem at the systems level by combining predicted functional protein:protein interactions, integration of functional interactions with large scale gene expression studies, a predicted transcription regulatory network and, finally, simulations with a Boolean model of the network. Initially, a prediction of genome-wide protein functional linkages was obtained based on genome-context methods using a Support Vector Machine. This set of protein functional linkages, along with gene expression data of the available models of latency, was employed to identify proteins involved in mediating switch signals during dormancy. We show that genes that are up- and down-regulated during dormancy are coordinately regulated not only under dormancy-like conditions but also under a variety of other experimental conditions. Their synchronized regulation indicates that they form a tightly regulated gene cluster and might form a latency-regulon. Conservation of these genes across bacterial species suggests a unique evolutionary history that might be associated with M. tuberculosis dormancy. Finally, simulations with a Boolean model based on the regulatory network, with logical relationships derived from gene expression data, reveal a bistable switch suggesting alternating latent and actively growing states. Our analysis based on the interaction network therefore reveals a potential model of M. tuberculosis latency.
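
How a Boolean model can exhibit a bistable switch is easiest to see on a toy example. The Python sketch below is not the network from the paper: it assumes two hypothetical nodes, `latent` and `growth`, that repress each other, and enumerates the states that remain unchanged under a synchronous update.

```python
from itertools import product

# Hypothetical two-node network (not the paper's network): each node is ON
# only when the other is OFF, i.e. mutual repression.
RULES = {
    "latent": lambda s: not s["growth"],
    "growth": lambda s: not s["latent"],
}


def step(state):
    """Synchronous update: every node is recomputed from the previous state."""
    return {node: rule(state) for node, rule in RULES.items()}


def fixed_points():
    """Enumerate all states that map to themselves."""
    nodes = list(RULES)
    points = []
    for values in product([False, True], repeat=len(nodes)):
        state = dict(zip(nodes, values))
        if step(state) == state:
            points.append(state)
    return points
```

In this toy network `fixed_points()` finds exactly two stable states, one with only `growth` ON and one with only `latent` ON. Which of the two the system settles in depends on where it starts, which is the sense in which such a switch is called bistable.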



A Technique for canceling impulse noise in images based on compressive sensing

Authors: B. S. Adiga, M. Girish Chandra, and S. Kadhe
Conference: 19th International Conference on Systems, Signals and Image Processing (IWSSIP), Vienna, Austria, April 2012.


Background - Removing or reducing the impulse noise, also known as “salt and pepper” noise, is a classical problem studied in image processing. Various solutions have been proposed over the years, including the techniques based on the Bose-Chaudhuri-Hocquenghem (BCH) codes in the field of real numbers.

Results - This paper presents a novel technique based on Compressive Sensing (CS) for canceling impulse noise in images. The technique is pivoted around exploiting the strong connection between CS and error correction using the complex (or real) field codes. Even though the usage of real field Bose-Chaudhuri-Hocquenghem (BCH) codes for impulse noise cancellation in images is rather old, bringing the CS framework to address this classical problem in image processing provides a fresh perspective. Specifically, the paper investigates a CS-based product code based on partial Fourier matrices, with the requisite rows chosen based on a Perfect Difference Set (PDS) or consecutively.

Conclusions - The proposed code performs better than the real BCH codes in terms of correcting the different cases of noise for 3%, 4% and 5% levels, leading to better Peak Signal-to-Noise Ratios (PSNRs). Further, the decoding algorithms for the proposed code are computationally efficient, and more elegant and straightforward than the decoding algorithm adopted for real BCH codes.
MATLAB code is available upon request from the authors.
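
For reference, the PSNR of an 8-bit image is conventionally computed as 10·log10(255² / MSE), where MSE is the mean squared error between the reference and the restored image. The Python sketch below shows that standard definition only; the function name and the list-of-rows image representation are illustrative assumptions, and this is not the authors' MATLAB code.

```python
import math


def psnr(reference, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are given as lists of rows of pixel intensities.
    """
    diffs = [
        (a - b) ** 2
        for row_ref, row_res in zip(reference, restored)
        for a, b in zip(row_ref, row_res)
    ]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

A higher PSNR means the restored image is closer to the reference, so a decoder that removes more impulse noise scores higher.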



On the enhancement and binarization of mobile captured vehicle identification for an embedded solution

Authors: T. Chattopadhyay, Ujjwal Bhattacharya, Bidyut B. Chaudhuri
Conference: 10th IAPR International Workshop on Document Analysis Systems, Australia, March 2012


Background - An embedded solution for automatic detection of Vehicle Identification Numbers (VINs) captured by a mobile camera has a number of real-world applications. However, available open source Optical Character Recognition (OCR) systems perform extremely poorly on VIN images captured by mobile phones, because image quality is degraded by various kinds of noise. In a recent study of such images, we observed that the performance of existing open source OCR systems can be improved by applying several image enhancement techniques to these images before sending them to the OCR engine.

Results - In this article, we present such a method, which improves the recognition accuracy from 5.89% to as much as 82.3%.

Conclusion - A low-cost approach for recognizing VINs captured by mobile phone cameras is studied in this paper. While more sophisticated algorithms could improve recognition, their use has to be restricted because of the resource constraints of the selected hardware platform, such as its low computing power. We have used Tesseract as the recognition engine, but we plan to design a dedicated recognition engine that takes the font characteristics of VINs into account and may be more accurate and less computationally intensive. Moreover, we plan to develop a detailed system for retrieving the VIN from partial and erroneous results. Optimization and porting of the system are also left as future work.



Ad-hoc Ride Sharing Application using Continuous SPARQL Queries

Authors: Debnath Mukherjee, Snehasis Banerjee, Prateep Misra
Conference: World Wide Web Conference (WWW), 2012 (Companion Volume)

Background - In the existing ride sharing scenario, the ride taker has to cope with uncertainties, since the ride giver may be delayed or may not show up due to some exigency. A solution to this problem is discussed in this paper. The solution framework is based on gathering information from multiple streams, such as traffic status on the ride giver's routes, the ride giver's GPS coordinates, ride giver requests and ride taker requests. It also maintains a list of alternative ride givers so as to almost guarantee a ride for the ride taker. The solution uses a SPARQL-based continuous query framework that is capable of sensing fast-changing real-time situations. It also has reasoning capabilities for handling the ride taker's preferences. The paper introduces the concept of user-managed windows, which is shown to be required for this solution. Finally, we show that the performance of the application is enhanced by designing the application with short incremental queries. Incremental queries compute the impact of a single event on the state of the application.

Results - We show a comparison of two methods – firstly, where the matches between ride givers and ride takers are computed when each request arrives (this is called the “all rides query”), and secondly, where the incremental queries are run. Our results show that the incremental queries perform more than an order of magnitude faster than the “all rides query”.

Conclusions - In this paper, we have shown the design of an ad-hoc ride sharing application that reduces the uncertainties of the ride. Instead of considering all event types in the continuous query, we show that building the application state incrementally using a single event type boosts performance significantly. We have also introduced the important concepts of “knowledge packets” and “user managed windows”. We have shown the architecture of a stream reasoner framework that we have designed. Further work on improving performance of the stream reasoner is planned.
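
The difference between the “all rides query” and incremental queries can be sketched in plain Python. The example below is not SPARQL and not the authors' stream reasoner: it assumes a hypothetical matching rule (giver and taker share the same route identifier) and illustrative names, purely to contrast recomputing all matches on every request with updating the state for a single event.

```python
from collections import defaultdict


def all_rides_query(givers, takers):
    """Recompute every giver/taker match from scratch (run on each request)."""
    return {
        (giver, taker)
        for giver, giver_route in givers
        for taker, taker_route in takers
        if giver_route == taker_route
    }


class IncrementalMatcher:
    """Keep the match state and update it with the effect of one event."""

    def __init__(self):
        self.givers_by_route = defaultdict(set)
        self.takers_by_route = defaultdict(set)
        self.matches = set()

    def on_giver(self, giver, route):
        # Only takers already waiting on this route can be affected.
        self.givers_by_route[route].add(giver)
        self.matches |= {(giver, t) for t in self.takers_by_route[route]}

    def on_taker(self, taker, route):
        # Only givers already offering this route can be affected.
        self.takers_by_route[route].add(taker)
        self.matches |= {(g, taker) for g in self.givers_by_route[route]}
```

Both approaches arrive at the same set of matches, but the full recomputation compares every giver with every taker on each request, while the incremental version touches only the entries on the route named in the new event.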



Reliable Data Transmission in Sensor Networks Using Compressive Sensing and Real Expander Codes

Authors: Swanand Kadhe, Sandhya T, M. Girish Chandra and B. S. Adiga
Conference: National Conference on Communications (NCC), IIT Kharagpur, Jan. 2012

Background - Wireless Sensor Networks (WSNs) have to deal with constrained resources, e.g., power and computational capability. Limited power makes power saving in WSNs a critical issue. Since radio communication is the main cause of energy consumption, it is desirable to employ some form of source coding for data compression, so that sensor node lifetime can be extended. However, the computational burden incurred by conventional source coding techniques limits their usage in low-power embedded sensor networks. Since compression discards redundant information, it becomes even more important to provide some protection against data losses, but again with computationally efficient error correction schemes. When the locations of errors are known, we essentially have to handle erasures.

Results - In this paper, we propose to integrate the emerging framework of compressive sensing (CS) with real expander codes (RECs), coined as CS-REC, for robust data transmission. CS works as a computationally inexpensive data compression scheme, while RECs act as an elegant application layer erasure coding scheme. The benefits provided by RECs are twofold: one, RECs require only a few addition and subtraction operations over real numbers for encoding and decoding; two, they provide graceful degradation in recovery performance as the number of erasures increases. Through elaborate simulations, we show that CS-REC can achieve recovery performance close to the case where there is no data loss. Further, again via simulations, we demonstrate the usefulness of CS-REC for reliably transmitting image data in multimedia sensor networks.

Conclusion - The low computational complexity of both CS and REC, along with the flexibility obtained by decoupling the sampling process from erasure coding, makes CS-REC an excellent scheme for reliable data transmission over sensor networks that have limited battery and computing resources. The scheme is applicable in different scenarios where there are constraints at one end and adequate power and computational resources may be available at the other end (like the sink in WSNs).



Living Internet - Social Objects Powering New Age Cybernetic Networks

(Abstract from TCS Paper presented at IEEE conference in April 2012)
Research Area: Human centric systems - Social Media

Be it a blog post, a photo or a YouTube video, social objects are responsible for interactions between people, giving rise to the notion that ‘People don’t just speak, rather discuss around these objects’. Such crucial objects are often left out of network analysis, which concentrates more on node-to-node connections. Activities around social objects across the web reflect synaptic changes of the physical world. When modelled in a cybernetic fashion, they can generate actionable feedback for system-specific goals, which aids in constructing rich interest graphs, multi-faceted reputation systems and better-functioning groups. This paper proposes a system to read user interactions in a digestible feedback form and presents a solution for a similar deployment in a network, citing the complexities involved in storing smart data. Modelling feedback networks in reverse fashion will give rise to a pluggable smart web, where people can make better sense of their actions by peeking into the communication flow on the web.



DiScRIBinATE: a rapid method for accurate taxonomic classification of metagenomic sequences
Authors: Tarini S Ghosh, Monzoorul Haque M and Sharmila S Mande
(Won the 'Best Paper Award' in The International Conference on Bioinformatics (InCoB) 2010 held in Tokyo, Japan)
Journal/Conference: BMC Bioinformatics 2010, 11 (Suppl 7): S14

Background: In metagenomic sequence data, a majority of sequences/reads originate from new or partially characterized genomes, the corresponding sequences of which are absent in existing reference databases. Since taxonomic assignment of reads is based on their similarity to sequences from known organisms, the presence of reads originating from new organisms poses a major challenge to taxonomic binning methods. The recently published SOrt-ITEMS algorithm uses an elaborate work-flow to assign reads originating from hitherto unknown genomes with significant accuracy and specificity. Nevertheless, a significant proportion of reads still get misclassified. Besides, the use of an alignment-based orthology step (for improving the specificity of assignments) increases the total binning time of SOrt-ITEMS.

Results: In this paper, we introduce a rapid binning approach called DiScRIBinATE (Distance Score Ratio for Improved Binning And Taxonomic Estimation). DiScRIBinATE replaces the orthology approach of SOrt-ITEMS with a quicker 'alignment-free' approach. We demonstrate that incorporating this approach reduces binning time by half without any loss in the specificity and accuracy of assignments. Besides, a novel reclassification strategy incorporated in DiScRIBinATE results in reducing the overall misclassification rate to around 3 - 7%. This misclassification rate is 1.5 - 3 times lower as compared to that by SOrt-ITEMS, and 3 - 30 times lower as compared to that by MEGAN.

Conclusions: A significant reduction in binning time, coupled with a superior assignment accuracy (as compared to existing binning methods), indicates the immense applicability of the proposed algorithm in rapidly mapping the taxonomic diversity of large metagenomic samples with high accuracy and specificity.

Availability: The program is available on request from the authors.



A Component Abstraction for Business Processes

Authors: Souvik Barat and Vinay Kulkarni
Conference/Journal: 2nd International Workshop on Reuse in Business Process Management, In conjunction with BPM 2011, Clermont-Ferrand, France, August 2011.
(Selected Best Paper at the Reuse for BPM Workshop at the BPM Conference 2011 in France)

With the continued increase in business dynamics, it is becoming increasingly hard to deliver purpose-specific business systems in the ever-shrinking window of opportunity. As business systems for the same intent tend to be similar but never the same, they have considerable overlap with well-defined differences. Software product line engineering techniques attempt to address this problem for software artifacts. Separation of business process concerns from application functionality, as advocated in process-centric application development, demands a solution along similar lines for business processes too. To this effect, we propose an abstraction for business processes that addresses composition, variability and resolution in a unified manner. We present the abstraction and its model-based realization, and illustrate it with an example.


