Why semantic data?
All enterprises need insight into their ecosystems. Without insight, it's impossible to make data-driven decisions or to survive in a data-driven world.
Companies need to integrate diverse data to understand changes unfolding in their business and the world around them and to adapt to those changes. But often, they run into challenges such as:
Continuously increasing sophistication in analytics requirements
Data silos and multiple data sources, which lead to duplication and inconsistency, reduce the value of analytics, and add risk and time to decision making
Long data supply chains and complex data lineage, which obscure the causes of issues and slow down innovation
Managing data as a graph addresses these issues at their roots. In a relational database, you might be able to use 10 entities in 9 combinations. In a graph, the potential maximum is 10 factorial (10! = 3,628,800) combinations. Even if only half of the entities are connected, that's 5! = 120 combinations ready for analysis. On top of that, entities (nodes) and relationships are added or changed in the database simply by adding or changing data, rather than by redesigning a schema.
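The sketch below illustrates two points from the paragraph above. It assumes a toy in-memory store in which each fact is a (subject, predicate, object) tuple. It first checks the factorials behind the figures (strictly, n! counts the possible orderings of n entities). It then shows that a new kind of relationship enters the graph as data, not as a schema change. All identifiers are hypothetical.

```python
import math

# The factorials behind the figures in the paragraph above.
ten_entities = math.factorial(10)   # orderings of 10 entities
five_entities = math.factorial(5)   # orderings of 5 entities

# A toy graph held as (subject, predicate, object) triples.
# All identifiers here are hypothetical.
graph = {
    ("customer:42", "placed", "order:7"),
    ("order:7", "contains", "product:A"),
}

# A new kind of relationship needs no schema change:
# we simply add a triple that happens to use a new predicate.
graph.add(("customer:42", "livesAt", "address:9"))

predicates = sorted({p for _, p, _ in graph})
print(ten_entities, five_entities)  # 3628800 120
print(predicates)                   # ['contains', 'livesAt', 'placed']
```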
Volume, velocity, and variety in data are not the only challenges; the new goal is to overcome uncertainty and deal with what we do not yet know.
Pharmaceutical, financial, and technical industries provide examples of semantic data helping organizations make data-driven decisions.
Pharmaceutical companies have a patchwork of large volumes of documents, experimental datasets, various data models, and relational databases. A main effort in most pharma use cases is to join these sources up so that omissions, inconsistencies, and new correlations can be found: that is, to reveal what the companies ought to know already. But even if all the data could be converted to relational form, the solution would still be insufficiently flexible. New semantic methods can tease out clues to the causes and cures of disease, ill health, and even aging.
In the financial sector, access to an organization’s total knowledge is vital to identify and prevent money laundering, fraud, and terror financing.
In engineering, transport, and utilities, digital twins must model the complex dependencies of interconnected systems to identify, simulate, and anticipate product outcomes and their environmental impacts.
The semantic edge
Semantic data makes queries more flexible and brings three distinct advantages.
The first advantage, computational efficiency, is where we started. The second is a fundamental difference from tabular data. With tabular data, any assertion that is not in the database is assumed to be false (the closed-world assumption). In a graph built on semantic standards such as RDF and OWL, however, that same assertion is simply unknown (the open-world assumption): we just don't know. The open and potentially unending network of the graph is ideal not just for capturing incomplete data, but also for adding to and extending our observations. In business terms, this means greater agility and faster digital innovation.
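The two readings of a missing fact can be contrasted in a few lines: a closed-world reading (missing means false) and an open-world reading (missing means unknown). This is a minimal illustration, not how an RDF store or OWL reasoner is implemented. The facts are hypothetical, and `asserted_false` is a simplifying stand-in for explicitly stated negative facts.

```python
from typing import Optional

Triple = tuple[str, str, str]

# Hypothetical facts. Nothing at all is stated about bob's employer.
asserted_true: set[Triple] = {("alice", "worksFor", "acme")}
# Under the open-world reading, something is false only if it is stated
# (or can be inferred) to be false; this sketch stores such facts explicitly.
asserted_false: set[Triple] = {("alice", "worksFor", "globex")}


def ask_closed_world(triple: Triple) -> bool:
    """Closed-world reading: whatever is not stored is false."""
    return triple in asserted_true


def ask_open_world(triple: Triple) -> Optional[bool]:
    """Open-world reading: a missing fact is unknown (None), not false."""
    if triple in asserted_true:
        return True
    if triple in asserted_false:
        return False
    return None


query = ("bob", "worksFor", "acme")
print(ask_closed_world(query))  # False
print(ask_open_world(query))    # None
```

The same missing triple yields "false" under one reading and "don't know" under the other. This distinction is what lets a graph absorb new observations without contradicting earlier answers.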
A third advantage of semantic data is inference. Each “triple” (a statement made up of a subject, a predicate, and an object) states a (partial) truth. Logic can then be applied systematically to derive additional meaning or to find anomalous evidence; these derived facts are inferences. In practice, this means identifying suspects in fraud investigations, drug discovery, and system-failure analysis from data whose volume, variety, and velocity far exceed the capacity of human investigators.
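Inference over triples can be illustrated with a small forward-chaining loop. The sketch below hard-codes two rules modeled on RDFS entailment: subclass transitivity and type propagation. It applies them until no new triples appear. The class and entity names are hypothetical, and "type" and "subClassOf" are shorthand for `rdf:type` and `rdfs:subClassOf`. Real reasoners support far richer rule sets.

```python
Triple = tuple[str, str, str]

# Hypothetical triples. "type" and "subClassOf" stand in for
# rdf:type and rdfs:subClassOf.
triples: set[Triple] = {
    ("ShellCompany", "subClassOf", "Company"),
    ("Company", "subClassOf", "LegalEntity"),
    ("firm:X", "type", "ShellCompany"),
}


def infer(kb: set[Triple]) -> set[Triple]:
    """Forward-chain two RDFS-style rules until nothing new appears:
    (1) A subClassOf B, B subClassOf C  =>  A subClassOf C
    (2) x type A,      A subClassOf B  =>  x type B
    """
    kb = set(kb)
    while True:
        new: set[Triple] = set()
        for a, p, b in kb:
            if p not in ("subClassOf", "type"):
                continue
            for c, q, d in kb:
                if q == "subClassOf" and c == b:
                    # Keeping a's predicate covers both rules (1) and (2).
                    new.add((a, p, d))
        if new <= kb:  # fixpoint: no new triples were derived
            return kb
        kb |= new


# Print only the triples that were inferred, not the stated ones.
for t in sorted(infer(triples) - triples):
    print(t)
```

Three stated triples yield three new ones, including the fact that `firm:X` is a `LegalEntity`, which no one asserted directly.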
For example, pharma companies are automatically annotating huge volumes of data with terms and relationships drawn from ontologies (formal models of concepts and categories). This metadata can then be drawn into a knowledge graph built on the same ontologies. The knowledge graph is then interrogated using data science, pattern recognition, and machine learning to find the vital missing research: potential discoveries that cannot be seen from a human standpoint.
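The annotation step can be sketched as dictionary matching against an ontology's terms. This sketch assumes a hypothetical three-term ontology and single-token exact matching. Production pipelines rely on curated ontologies and on text mining that handles synonyms, abbreviations, and multi-word terms. Each match becomes triples that could be loaded into a knowledge graph.

```python
import re

Triple = tuple[str, str, str]

# Hypothetical mini-ontology: surface term -> (concept ID, category).
ONTOLOGY: dict[str, tuple[str, str]] = {
    "aspirin": ("ex:Aspirin", "ex:Drug"),
    "cox-1": ("ex:COX1", "ex:Protein"),
    "inflammation": ("ex:Inflammation", "ex:Condition"),
}


def annotate(doc_id: str, text: str) -> set[Triple]:
    """Dictionary-match ontology terms in a document and emit triples
    linking the document to concepts and concepts to categories."""
    triples: set[Triple] = set()
    for token in re.findall(r"[a-z0-9-]+", text.lower()):
        if token in ONTOLOGY:
            concept, category = ONTOLOGY[token]
            triples.add((doc_id, "mentions", concept))
            triples.add((concept, "type", category))
    return triples


kg = annotate("doc:1", "Aspirin inhibits COX-1 and reduces inflammation.")
for t in sorted(kg):
    print(t)
```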
Any organization that is not identifying use cases for semantic data, or exploring it, is behind the curve at a major inflection point in the evolution of data management.
Here are use cases spanning various industries.
Identifying and preventing fraud based on detecting communities of like behavior, unusual transactions, or suspicious commonalities.
Improving customer experience and patient outcomes by surfacing complex sequences of activities.
Anticipating customer demand for energy by conducting multi-factor analysis of weather and behavioral influencers (sporting events, holidays, TV scheduling, cultural events, or changes in working patterns).
Retaining customers by combining individual data with community behavior and the activities of influential individuals.
Forecasting supply-chain needs using a holistic view of dependencies and identified bottlenecks.
Recommending products based on customer history, seasonality, warehouse stock, and sales of other items.
Conducting what-if scenario planning and “next best action” recommendations using alternative pathways and similarities.
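To illustrate the first use case, one simple way to surface suspicious commonalities is to link accounts to the attributes they share, such as phone numbers and addresses, and then take connected components. This is a hypothetical sketch with invented data. Production systems typically use richer community-detection algorithms, such as Louvain, and weight links by behavior.

```python
from collections import defaultdict

# Hypothetical (account, shared attribute) links.
links = [
    ("acct:1", "phone:555-0100"),
    ("acct:2", "phone:555-0100"),
    ("acct:2", "addr:12-high-st"),
    ("acct:3", "addr:12-high-st"),
    ("acct:4", "phone:555-0199"),
]


def communities(edges: list[tuple[str, str]]) -> list[set[str]]:
    """Group accounts connected through shared attributes, i.e. the
    connected components of the account-attribute graph (union-find)."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    for account, attribute in edges:
        union(account, attribute)

    # Collect only account nodes, grouped by their component's root.
    groups: dict[str, set[str]] = defaultdict(set)
    for node in list(parent):
        if node.startswith("acct:"):
            groups[find(node)].add(node)
    return sorted(groups.values(), key=sorted)


print(communities(links))
```

Accounts 1 and 3 never share an attribute directly, yet they land in the same group through account 2. Indirect links of this kind are easy to miss in row-by-row review.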
Many vendors position knowledge graphs as a means of integrating diverse data from multiple sources, but creating a semantic data fabric is much more than implementing technology.
The graph itself does not address the challenge of aligning diverse definitions of the data.
The process of making data available or visible on a semantic platform does, however, have the advantage of automated reasoning. When a target data model, or the lineage, residency, and survivorship of the data, is captured as an ontology and graphs, automated reasoning can help identify inconsistencies, no matter how large or complicated the model or data flows. Starting with the most relevant and valuable data establishes a baseline that can be built out progressively. This progressive approach benefits from the governance and controls that only an ontology-enabled modeling process provides.
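One kind of inconsistency that an ontology-aware reasoner can flag is an individual belonging to two classes declared disjoint (in OWL, via `owl:disjointWith`). An example is a party that one source system records as a person and another records as an organization. The sketch below hand-codes just that one check over hypothetical data. An OWL reasoner detects such contradictions in general.

```python
Triple = tuple[str, str, str]

# Hypothetical axiom plus records merged from two source systems.
# "disjointWith" stands in for owl:disjointWith, "type" for rdf:type.
kb: set[Triple] = {
    ("Person", "disjointWith", "Organization"),
    ("party:88", "type", "Person"),        # from the HR system
    ("party:88", "type", "Organization"),  # from the CRM system
    ("party:91", "type", "Organization"),
}


def disjointness_violations(triples: set[Triple]) -> list[tuple[str, str, str]]:
    """Return (individual, class A, class B) for every individual typed
    with two classes that the ontology declares disjoint."""
    disjoint = {(s, o) for s, p, o in triples if p == "disjointWith"}
    disjoint |= {(o, s) for s, o in disjoint}  # disjointness is symmetric
    types: dict[str, set[str]] = {}
    for s, p, o in triples:
        if p == "type":
            types.setdefault(s, set()).add(o)
    violations = []
    for individual, classes in sorted(types.items()):
        for a, b in sorted(disjoint):
            if a < b and a in classes and b in classes:  # report each pair once
                violations.append((individual, a, b))
    return violations


print(disjointness_violations(kb))  # [('party:88', 'Organization', 'Person')]
```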
Not all semantic data is voluminous or fast-moving. For example, a relatively small number of triples can provide the semantics for a large data set. These may be organized as a tightly focused set of triples or as an ontology. It is the meaning and logic that ontologies inject into data that make them a vital part of semantic data management.
The way forward
Semantic data will cannibalize some existing IT investments.
Sometimes this cost has to be borne to grasp the opportunity of establishing new capability, or even to maintain existing business. At this point, senior executives should be asking their teams for use cases to bring together a well-funded business case for developing organizational capability in semantic data, not just point solutions. They should encourage and reward those who are willing to step out of their comfort zone and into the world of triples, SPARQL, and graphs.
Boston Scientific has an integrated supply chain, from raw materials to complex devices, that includes development, design, manufacturing, and sales. Predicting and preventing device failures early is crucial. However, the company had difficulty pinpointing the root cause of defects, limiting its ability to prevent future problems.
Boston Scientific used a decisioning graph to analyze its supply chain, covering parts, finished products, and failures. The graph was organized by an ontology that defined a hierarchy of parts and their relationships to the events that result in failures. Graph queries quickly revealed the complex relationships among subcomponents and traced their failures. Using graph-based algorithms to rank parts by their proximity to failures and to similar components, the company identified previously unknown vulnerabilities.
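The source does not say which algorithms Boston Scientific used. The sketch below is a hypothetical illustration of ranking parts by their proximity to failures. It runs a breadth-first search from each failure event over a small invented part hierarchy. It then scores each part by the sum of inverse hop distances, so parts close to many failures rank highest.

```python
from collections import defaultdict, deque

# Hypothetical part hierarchy and failure observations (invented data).
edges = [
    ("device:D1", "assembly:A1"),
    ("device:D2", "assembly:A2"),
    ("assembly:A1", "part:P1"),
    ("assembly:A1", "part:P2"),
    ("assembly:A2", "part:P2"),  # P2 is shared by both assemblies
    ("assembly:A2", "part:P3"),
    ("failure:F1", "device:D1"),
    ("failure:F2", "device:D2"),
]


def bfs_distances(graph: dict[str, set[str]], start: str) -> dict[str, int]:
    """Hop distance from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def rank_parts_by_failure_proximity(edges: list[tuple[str, str]]) -> list[tuple[str, float]]:
    graph: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:  # treat links as undirected for traversal
        graph[a].add(b)
        graph[b].add(a)
    scores: dict[str, float] = defaultdict(float)
    failures = [n for n in graph if n.startswith("failure:")]
    for failure in failures:
        for node, hops in bfs_distances(graph, failure).items():
            if node.startswith("part:"):
                scores[node] += 1.0 / hops  # nearer failures count more
    # Highest score first; ties broken by part name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


for part, score in rank_parts_by_failure_proximity(edges):
    print(f"{part}: {score:.3f}")
```

Here `part:P2`, which is shared by both assemblies, ranks first because it sits three hops from each failure. `part:P1` and `part:P3` are each close to only one failure.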
Figure 2: DB-Engines: DBMS category rankings – with graph DB and RDF (triple stores) shown separately (https://db-engines.com/en/ranking_categories)