For research organizations, the collection and interpretation of data from analytical instrumentation is a critical part of scientific endeavors. In a mid-sized lab, there can be over 1,000 instruments across 30 to 40 labs; in a large pharma organization, there may be thousands of instruments per site across numerous global sites. Until the introduction of informatics platforms and paperless lab concepts, analytical data and observations were collected manually. Paperless labs transformed the storage of lab data from cabinets full of paper to gigabytes and terabytes of data generated every year. As we look to develop the lab of the future, new challenges have emerged.
Where instrument interfaces exist at all, they are highly diverse. Additionally, there is no universally accepted data or file format for establishing connections between instruments and platforms such as electronic lab notebooks (ELN) or laboratory information management systems (LIMS). As a result, instruments are not easily interchangeable between these connections, and any configuration change, such as an instrument replacement or software upgrade, often requires additional effort to re-establish the connection and re-validate the instrument.
Ideal data standardization should adhere to the FAIR principles for establishing and sharing data management best practices. Such a framework has the added benefit of 'future-proofing' organizational data: future technologies that comply with the same principles can continue to utilize legacy data. The FAIR principles are:
Findable: Machine-readable metadata is essential for the automatic discovery of datasets and services. Data should therefore be easy to find for both humans and computers, stored in recognized, accepted data formats with standardized nomenclatures.
Accessible: Once a user finds the required data, they need to know how it can be accessed, after adequate authentication and authorization.
Interoperable: If data is to be integrated into a larger framework, it needs to work in conjunction with other applications and workflows for analysis, storage, and processing.
Reusable: To optimize the reuse of data in current and future informatics platforms, data and metadata should be standardized and portable, so they remain usable even if the organization switches vendors.
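As an illustrative sketch, a FAIR-aligned metadata record for a single instrument run might look like the following. The field names, URL, and taxonomy version are hypothetical, not a formal standard; AnIML is used here only as an example of an open analytical data format.

```python
import json

# Hypothetical FAIR-aligned metadata record for one instrument run.
record = {
    "id": "run-2024-000123",            # Findable: globally unique identifier
    "title": "HPLC assay, batch 42",    # Findable: human-readable description
    "access_url": "https://datalake.example.org/runs/run-2024-000123",
    "access_protocol": "HTTPS + OAuth2",  # Accessible: how to retrieve it
    "format": "AnIML",                    # Interoperable: open, documented format
    "vocabulary": "org-taxonomy-v2",      # Interoperable: shared nomenclature
    "provenance": {                       # Reusable: who, what, when
        "instrument": "HPLC-07",
        "operator": "jdoe",
        "acquired": "2024-05-14T09:32:00Z",
    },
    "license": "internal-use-only",       # Reusable: terms of reuse
}

print(json.dumps(record, indent=2))
```

Because the record is machine-readable, each FAIR property maps to a concrete field that downstream systems can check automatically.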
Enabling labs of the future
To implement a framework that enforces FAIR principles, organizations need to adopt a comprehensive approach to instrument data management. Here is what such an approach could look like:
Standardizing and unifying data
Instrument data needs to be transformed at an intermediary interface before it is pushed to a data lake. This interface links the physical connections from multiple instruments and feeds them to a single integration point. At this stage, standardized taxonomies, nomenclatures, and guidelines can be applied to the resultant data while aligning it with organizational policies, after which the data can be reviewed and pushed to a data lake. This approach unifies the data and metadata and enforces a uniform standard on the dataflow from all sources to a single repository.
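A minimal sketch of such an intermediary interface, assuming two instruments that report the same quantity under different vendor-specific field names (the taxonomy mapping and field names here are invented for illustration):

```python
from datetime import datetime, timezone

# Hypothetical standardized taxonomy: vendor-specific field names
# mapped to one canonical name at the single integration point.
TAXONOMY = {
    "temp_C": "temperature_celsius",
    "Temperature": "temperature_celsius",
    "abs_254nm": "absorbance_254nm",
}

def normalize(raw: dict, instrument_id: str) -> dict:
    """Map vendor fields to canonical names and attach required metadata."""
    record = {TAXONOMY.get(k, k): v for k, v in raw.items()}
    record["instrument_id"] = instrument_id
    record["ingested_at"] = datetime.now(timezone.utc).isoformat()
    return record

# Two instruments, two vendor formats, one unified record shape.
a = normalize({"temp_C": 37.2}, "incubator-01")
b = normalize({"Temperature": 36.9}, "incubator-02")
```

After normalization, both records expose `temperature_celsius`, so downstream consumers never see the vendor-specific names.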
Streamlining data flow
Within this framework, each data source retains its own format and its connection is managed independently, while nomenclature, guidelines, and policies are enforced at the destination. Once the data is validated, it can be pushed to a data lake for consumption by downstream informatics and analytics platforms. As data standardization evolves, data definitions and taxonomies will inevitably change. Those changes can then be made at a single point, rather than individually across brittle point-to-point integrations between each data source and destination.
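The destination-side enforcement described above can be sketched as a single validation gate in front of the data lake; the required fields below are a hypothetical organizational policy, and changing them updates the rule for every source at once:

```python
# Hypothetical policy, enforced once at the single integration point:
# every record must carry this minimum metadata to enter the data lake.
REQUIRED_FIELDS = {"instrument_id", "timestamp", "value", "unit"}

def validate(record: dict) -> bool:
    """Accept a record only if it carries the required metadata."""
    return REQUIRED_FIELDS.issubset(record)

def ingest(records: list, data_lake: list) -> None:
    """Push validated records to the data lake; reject the rest."""
    for r in records:
        if validate(r):
            data_lake.append(r)

lake = []
ingest([
    {"instrument_id": "HPLC-07", "timestamp": "2024-05-14T09:32Z",
     "value": 1.23, "unit": "AU"},
    {"value": 9.99},  # missing required metadata -> rejected
], lake)
```

If the taxonomy later demands an extra field, only `REQUIRED_FIELDS` changes; no per-instrument integration needs to be touched.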
Enforcing data integrity
A blockchain can enforce a protocol for inter-node communication and for validating new data. Once recorded and approved at the data source, the information cannot be modified retroactively without alteration at the bus and data lake. This distributed ledger model would allow secure collaboration within and outside the organization. Being highly scalable, blockchain allows for near-limitless integrations with new instruments or technologies, facilitating global collaboration across organizational sites.
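The tamper-evidence idea behind this can be illustrated with a simplified hash chain (a sketch of the mechanism, not a full distributed ledger): each entry embeds the hash of the previous one, so a retroactive edit invalidates every later block.

```python
import hashlib
import json

def block_hash(payload: dict) -> str:
    """Deterministic SHA-256 hash of a block's contents."""
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def append_block(chain: list, data: dict) -> None:
    """Add a record that is chained to the hash of the previous block."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    block = {"data": data, "prev_hash": prev}
    block["hash"] = block_hash({"data": data, "prev_hash": prev})
    chain.append(block)

def verify(chain: list) -> bool:
    """Recompute every hash; any retroactive edit breaks the chain."""
    for i, block in enumerate(chain):
        prev = chain[i - 1]["hash"] if i else "0" * 64
        if block["prev_hash"] != prev:
            return False
        if block["hash"] != block_hash({"data": block["data"], "prev_hash": prev}):
            return False
    return True

chain = []
append_block(chain, {"instrument": "HPLC-07", "result": 1.23})
append_block(chain, {"instrument": "HPLC-07", "result": 1.25})
ok_before = verify(chain)               # chain validates

chain[0]["data"]["result"] = 9.99       # retroactive edit at the source
ok_after = verify(chain)                # edit is detected
```

In a real deployment the ledger would be replicated across nodes at the bus and data lake, so an attacker would need to rewrite every copy consistently.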
Simplifying data analysis
With the data aggregated and standardized, analysis packages will no longer require manual entry or data import from other systems. Using automated database queries to access data sets will greatly accelerate the consumption and interpretation of data. AI or machine learning may be leveraged in this ecosystem to recognize trends automatically and send alerts accordingly. Defining hypotheses and identifying subjects, cohorts, treatments, and time intervals against unified data will dramatically increase the power of an organization's knowledge base.
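A hedged sketch of what automated trend detection over the unified repository could look like; the in-memory "data lake", field names, setpoint, and tolerance are all assumptions for illustration:

```python
from statistics import mean

# Stand-in for the unified repository: standardized temperature records.
data_lake = [
    {"instrument_id": "incubator-01", "temperature_celsius": t}
    for t in [37.0, 37.1, 37.3, 38.6, 38.9]
]

def query(lake: list, instrument_id: str) -> list:
    """Pull all readings for one instrument from the unified store."""
    return [r["temperature_celsius"] for r in lake
            if r["instrument_id"] == instrument_id]

def drifting(readings: list, setpoint: float = 37.0,
             tolerance: float = 1.0) -> bool:
    """Flag a trend when the average of recent readings drifts past tolerance."""
    return abs(mean(readings[-3:]) - setpoint) > tolerance

readings = query(data_lake, "incubator-01")
alert = drifting(readings)  # recent readings have drifted above tolerance
```

Because every source lands in the same schema, the same query and rule apply to any instrument without per-vendor import code.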
One of the evident benefits of adopting a unified data framework is the improved ability to share data, from any source, across an entire organization. This greatly improves the efficiency with which raw data can be analyzed, interpreted, submitted, and turned into institutional knowledge. In the longer run, unified data gives a business the freedom to adopt new technologies and data platforms as needed, and the ability to take its data with it.