Data that is sensitive
In 2020 alone, more than 155.8 million individuals in the US were adversely affected by data exposures or unintentional leaks of sensitive information. Any information that needs to be protected from unauthorized access is classified as sensitive data. This can include personally identifiable information (PII) such as Social Security numbers, protected health information (PHI), financial details such as bank account numbers, cardholder data (PCI), customer data, trade secrets and patent-worthy information.
According to the Federal Information Processing Standards, the sensitivity of data can be measured along three dimensions: confidentiality (privacy), integrity (accuracy), and availability for use at any point in time. Protection of such sensitive data needs to be considered throughout its life cycle of discovery, monitoring, masking and de-identification.
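To make the masking and de-identification steps concrete, here is a minimal Python sketch. It assumes Social Security numbers appear in the common 123-45-6789 format; the function names `mask_ssn` and `pseudonymize` and the sample key are hypothetical and not taken from any particular product.

```python
import hashlib
import hmac
import re

# Assumption: SSNs are written as 123-45-6789. Other formats need other patterns.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")


def mask_ssn(text: str) -> str:
    """Mask every SSN found in text, keeping only the last four digits."""
    return SSN_PATTERN.sub(lambda m: "***-**-" + m.group(1), text)


def pseudonymize(value: str, secret_key: bytes) -> str:
    """De-identify a value with a keyed hash (HMAC-SHA256).

    The same value and key always produce the same token, so records can
    still be joined on the token without exposing the original value.
    """
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # The key below is a placeholder; a real key would come from a secrets store.
    print(mask_ssn("Patient SSN: 123-45-6789"))
    print(pseudonymize("123-45-6789", b"example-key-not-for-production"))
```

Masking keeps part of the value readable for people who need to recognize a record, while the keyed hash replaces the value entirely. Which of the two fits depends on how the data will be used downstream.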
The COVID-19 pandemic has triggered disruptive trends such as borderless workspaces, clinical trials, telemedicine, supply chain globalization, return to work, the need for real-time access to data and an overall acceleration of digital transformation. These trends are driving the generation of massive volumes of structured and unstructured data, including logs, that require discovery and efficient management to yield useful insights. Other trends such as mergers and acquisitions and product launches also contribute to the 16.1% CAGR of the data discovery market, which is likely to grow to $12.4 billion by 2026. Common challenges in this process include disparate data sources, different kinds of structured and unstructured data, lack of data democratization, organizational silos, data quality issues, absence of catalogs, and large amounts of relevant data existing outside the organization in the larger ecosystem.
Organizing data using a catalog
Data management in the age of big data, data lakes and self-service is challenging. Data catalogs help organize sensitive data from various sources. They give the data context in terms of source, structure, quality, lineage and usage by linking sensitive data with its metadata. Cataloging also helps with data classification and with identifying the specific fields that hold sensitive data and need to be masked or encrypted. While a number of paid and open-source data catalog tools are available, several things can become hurdles: lack of expertise in choosing the right tool for your business, lack of knowledge of best practices for deploying a data catalog, scalability of the tool, the need for additional plug-ins, license terms and lock-in periods. On top of that, security measures such as identity and access management policies are needed to govern access to sensitive data.
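The idea of linking data to its metadata and classifying fields can be sketched in a few lines of Python. This is an illustrative data structure only, not the schema of any real catalog tool; the class names, the classification labels and the sample dataset are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: these labels mark a field as sensitive.
SENSITIVE_CLASSES = frozenset({"PII", "PHI", "PCI"})


@dataclass
class FieldMetadata:
    name: str
    classification: str  # e.g. "PII", "PHI", "PCI" or "public"


@dataclass
class CatalogEntry:
    dataset: str
    source: str
    lineage: List[str] = field(default_factory=list)  # upstream locations, oldest first
    fields: List[FieldMetadata] = field(default_factory=list)

    def fields_to_protect(self) -> List[str]:
        """Return the names of fields that should be masked or encrypted."""
        return [f.name for f in self.fields if f.classification in SENSITIVE_CLASSES]


if __name__ == "__main__":
    # Hypothetical dataset used purely for illustration.
    entry = CatalogEntry(
        dataset="patients",
        source="crm_db",
        lineage=["crm_db.patients", "lake.raw.patients"],
        fields=[
            FieldMetadata("patient_id", "public"),
            FieldMetadata("ssn", "PII"),
            FieldMetadata("diagnosis", "PHI"),
            FieldMetadata("card_number", "PCI"),
        ],
    )
    print(entry.fields_to_protect())
```

A real catalog would also record quality and usage information and would sit behind access controls. The sketch only shows how field-level classification makes it possible to ask which fields need protection.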
Monitoring sensitive data is important for understanding it and deriving insights. Once existing data across various sources has been discovered and cataloged, it is essential to monitor new and incoming data for sensitive content, so that its integration with the existing data and the application of governance measures such as masking and encryption are seamless. Exfiltration by stealth attackers also calls for close monitoring of sensitive data by enterprises. Expanding borderless work and ecosystem perimeters challenge sensitive data security and call for stringent monitoring.
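One way to picture monitoring of incoming data is a check that scans new records for sensitive-looking values in fields the catalog has not marked as sensitive. The Python sketch below assumes two simple regular-expression detectors; the patterns, the function name `find_unprotected` and the sample records are hypothetical, and real detection would need far more robust rules.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Assumption: simplified patterns for illustration only.
DETECTORS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
}


def find_unprotected(
    records: Iterable[Dict[str, object]],
    cataloged_sensitive_fields: Set[str],
) -> List[Tuple[int, str, str]]:
    """Return (record index, field name, detector label) for each value that
    looks sensitive but sits in a field not cataloged as sensitive."""
    findings = []
    for index, record in enumerate(records):
        for field_name, value in record.items():
            if field_name in cataloged_sensitive_fields:
                continue  # already governed by masking or encryption
            for label, pattern in DETECTORS.items():
                if pattern.search(str(value)):
                    findings.append((index, field_name, label))
    return findings


if __name__ == "__main__":
    incoming = [
        {"id": 1, "ssn": "123-45-6789", "notes": "call back"},
        {"id": 2, "ssn": "987-65-4321", "notes": "card 4111 1111 1111 1111 on file"},
    ]
    for finding in find_unprotected(incoming, {"ssn"}):
        print(finding)
```

Findings like these can feed back into the catalog, so that a free-text field found to carry card numbers is reclassified and covered by masking or encryption.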