Towards a Rosetta Stone for (meta)data: Interoperability

8 Mar 2024

This paper is available on arXiv under CC 4.0 license.

Authors:

(1) Vogt, Lars, TIB Leibniz Information Centre for Science and Technology;

(2) Konrad, Marcel, TIB Leibniz Information Centre for Science and Technology;

(3) Prinz, Manuel, TIB Leibniz Information Centre for Science and Technology.

Interoperability

Interoperability depends directly on machine-actionability: datasets A and B can be said to be interoperable if there is an operation X that can be applied equally to both. Because of this dependency, interoperability inherits a key characteristic of machine-actionability: it is not a Boolean property but describes a continuum, and its degree depends on the number of operations that can be applied to a given type of (meta)data. Interoperability also involves the ability to identify the type of (meta)data within a given dataset that a given operation can process and, conversely, the operations that can process a given type of (meta)data.
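The continuum view can be made concrete with a toy model. The sketch below assumes, purely for illustration, that every operation declares the (meta)data types it accepts; it then identifies the types in a dataset that an operation can process, the operations that can process a given type, and the operations applicable to two datasets at once. Reporting the degree of interoperability as the share of known operations applicable to both datasets is an illustrative choice, not a measure defined in the text, and all operation names and type labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    """A hypothetical operation together with the (meta)data types it accepts."""
    name: str
    accepted_types: frozenset


def processable_types(op: Operation, dataset_types: set) -> set:
    """Types present in a dataset that the given operation can process."""
    return set(op.accepted_types) & dataset_types


def operations_for_type(ops: list, data_type: str) -> list:
    """The converse direction: operations that can process a given type."""
    return [op.name for op in ops if data_type in op.accepted_types]


def shared_operations(ops: list, types_a: set, types_b: set) -> list:
    """Operations applicable to both datasets (each holds a type the operation accepts)."""
    return [op.name for op in ops
            if processable_types(op, types_a) and processable_types(op, types_b)]


def interoperability_degree(ops: list, types_a: set, types_b: set) -> float:
    """Share of known operations applicable to both datasets: a continuum, not a Boolean."""
    if not ops:
        return 0.0
    return len(shared_operations(ops, types_a, types_b)) / len(ops)


# Hypothetical operations and datasets, for illustration only.
OPERATIONS = [
    Operation("compute_mean", frozenset({"weight_measurement", "length_measurement"})),
    Operation("convert_units", frozenset({"weight_measurement"})),
    Operation("render_map", frozenset({"geo_coordinate"})),
]
apple_dataset = {"weight_measurement", "sample_label"}
pear_dataset = {"weight_measurement", "geo_coordinate"}

print(shared_operations(OPERATIONS, apple_dataset, pear_dataset))  # ['compute_mean', 'convert_units']
print(round(interoperability_degree(OPERATIONS, apple_dataset, pear_dataset), 2))  # 0.67
```

Here two of three known operations apply to both datasets, so the answer is a degree rather than a yes or no; every additional shared operation raises it.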

(Meta)data are composed of terms (meant here in a broad sense that also includes symbols and values) that form statements. Both terms and statements carry meaning, and thus semantic content, and both are required for the successful communication of information. The interoperability of terms and statements between a sender and a receiver of (meta)data is therefore a prerequisite for their successful communication. Successfully communicating (meta)data between machines, and between a machine and a human being, requires not only their successful transmission, so that the receiver can read them (i.e., readability), but also their successful processing, so that the receiver understands their meaning (i.e., interpretability) and can use them in another context by applying specific operations to them (i.e., actionability).
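The three requirements of readability, interpretability, and actionability can be illustrated as successive checks that a receiver performs on an incoming statement. This is a minimal sketch under simplifying assumptions: messages are JSON objects, the vocabulary shared by sender and receiver is a plain dictionary, and the term names and the `to_kilograms` operation are hypothetical. Each stage can only succeed if the previous one did.

```python
import json

# Hypothetical vocabulary shared by sender and receiver (term -> meaning).
SHARED_VOCABULARY = {
    "weight": "mass of a physical object",
    "unit:g": "gram",
}


def to_kilograms(value, unit):
    """Hypothetical operation: convert a weight given in grams to kilograms."""
    if unit == "unit:g" and isinstance(value, (int, float)):
        return value / 1000
    return None


# Operations the receiver can apply, keyed by the term they act on.
OPERATIONS = {"weight": to_kilograms}


def receive(message: str) -> dict:
    """Report which stages of communication succeed for a one-statement message."""
    stages = {"readable": False, "interpretable": False, "actionable": False}
    # Readability: the transmitted characters can be parsed at all.
    try:
        statement = json.loads(message)
    except json.JSONDecodeError:
        return stages
    stages["readable"] = True
    # Interpretability: the statement has the expected form and all its terms are shared.
    if not isinstance(statement, dict):
        return stages
    terms = [statement.get("property"), statement.get("unit")]
    if not all(isinstance(t, str) and t in SHARED_VOCABULARY for t in terms):
        return stages
    stages["interpretable"] = True
    # Actionability: an operation can actually be applied to the statement.
    operation = OPERATIONS.get(statement["property"])
    if operation is not None and operation(statement.get("value"), statement["unit"]) is not None:
        stages["actionable"] = True
    return stages


print(receive('{"property": "weight", "value": 212.45, "unit": "unit:g"}'))
print(receive('{"property": "heft", "value": 212.45, "unit": "unit:g"}'))
```

The second message is perfectly readable, but because "heft" is not part of the shared vocabulary, the receiver can neither interpret it nor act on it.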

Interoperability thus plays a central role in this communication process and in the realization of FAIR (meta)data. Without interoperability, the findability and reusability of (meta)data are limited, and without interoperability there is no machine-actionability. This central role has also been recognized by the European Open Science Cloud (EOSC). Its EOSC Interoperability Framework (20) distinguishes four layers of interoperability:

  • technical interoperability (i.e., information technology systems must work with other information technology systems in implementation or access without any restrictions or with controlled access),
  • semantic interoperability (i.e., contextual semantics related to common semantic resources),
  • organizational interoperability (i.e., contextual processes related to common process resources), and
  • legal interoperability (i.e., contextual licenses related to common license resources).

However, interoperability is required not only for communicating information between machines but also for communicating information between humans and machines. This involves an additional layer of interoperability: communication must follow and apply rules, knowledge must be shared between humans and machines, and humans must learn and understand how to use the necessary devices and tools correctly. We therefore believe that another layer of interoperability, cognitive interoperability, needs to be added to the EOSC Interoperability Framework. We characterize cognitive interoperability in Box 3.

Cognitive interoperability focuses on the usability of data structures and knowledge management systems for human users and developers, an aspect that has been somewhat overlooked, especially in the context of knowledge graphs and semantic technologies. Cognitive interoperability also means taking into account how humans typically communicate. We humans are experts at communicating information efficiently: we omit background knowledge and use somewhat fuzzy statements that refer to general figures of thought and employ metaphors and metonymies, reducing the information to be communicated to a minimum in the knowledge that other humans will still be able to infer the missing information from the context. For a machine, on the other hand, all relevant information must be stated explicitly. As a consequence, cognitive interoperability has to deal with a dilemma that arises from the conflict between the machine-actionability and the human-actionability of (meta)data representations: the more a data representation is pushed towards machine-actionability, the more complex it becomes and thus the less human-actionable it is (10) (see Fig. 1, middle). This impedance mismatch can frustrate humans who communicate (meta)data with machines.

Figure 1: Comparison of a human-readable statement with its machine-actionable representation and its human-actionable representation. Top: A human-readable statement about the observation that a particular apple weighs 212.45 grams, with a 95% confidence interval of 212.44 to 212.47 grams. Middle: A machine-actionable representation of the same statement as an ABox semantic graph, using RDF syntax and following the general pattern for measurement data from the Ontology for Biomedical Investigations (OBI) (25) of the Open Biological and Biomedical Ontology (OBO) Foundry. Bottom: A human-actionable representation of the same statement as a mind-map-like graph, reducing the complexity of the RDF graph to the information that is actually relevant to a human reader. [Figure taken from (10)]

If we want to store (meta)data in a knowledge graph in a machine-actionable format and, at the same time, present them in an easily understandable, human-readable way in the user interface (UI), we need to decouple data storage in the graph from data presentation in the UI. Information that machines need but that is irrelevant to humans can then be processed by machines without being displayed in the UI (see Fig. 1, bottom). However, considering the complexity of the tasks that users of a knowledge graph want to accomplish (e.g., fact-finding, understanding cause-effect chains, or understanding controversial topics) (26), it becomes clear that the cognitive interoperability of (meta)data involves more than this. The cognitive interoperability of a graph can also be enhanced by new approaches and tools for exploring and navigating the graph and for zooming in and out at different levels of representational granularity, thereby reducing the graph to only those bits of information that are currently relevant to the user. This adds another layer of requirements for the UIs of FAIR knowledge graphs, which must support different user tasks by following visual information-seeking mantras such as ‘overview first, zoom and filter, then details-on-demand’ (27), ‘search first, show context, and expand on demand’ (28), ‘details first, show context, and overview last’ (29), and ‘overview for navigation’ (10).
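Decoupling storage from presentation, together with a simple zoom level, might look roughly like the following sketch. The stored triples are loosely modeled on the kind of measurement pattern shown in Fig. 1 (middle), but all node identifiers and predicate labels are simplified, hypothetical placeholders rather than actual OBI terms. The presentation function `weight_view` (also hypothetical) traverses the full graph but shows only what a human reader needs: an overview first, and the confidence interval on demand.

```python
# Stored graph: explicit triples for machines. Labels are simplified,
# hypothetical placeholders, not real OBI/IAO IRIs.
STORED_GRAPH = [
    ("ex:apple1", "rdf:type", "apple"),
    ("ex:apple1", "rdfs:label", "apple #1"),
    ("ex:weightDatum1", "rdf:type", "scalar measurement datum"),
    ("ex:weightDatum1", "is about", "ex:apple1"),
    ("ex:weightDatum1", "has value specification", "ex:valueSpec1"),
    ("ex:valueSpec1", "has numeric value", 212.45),
    ("ex:valueSpec1", "has unit", "g"),
    ("ex:ciDatum1", "rdf:type", "95% confidence interval"),
    ("ex:ciDatum1", "is about", "ex:weightDatum1"),
    ("ex:ciDatum1", "has lower bound", 212.44),
    ("ex:ciDatum1", "has upper bound", 212.47),
]


def objects(graph, subject, predicate):
    """All objects o of triples (subject, predicate, o)."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def first(values):
    return values[0] if values else None


def about(graph, target, node_type):
    """First node of the given type that 'is about' the target node."""
    return first([s for s, p, o in graph
                  if p == "is about" and o == target
                  and node_type in objects(graph, s, "rdf:type")])


def weight_view(graph, entity, zoom="overview"):
    """Presentation layer: render an entity's weight for a human reader.

    Intermediate nodes (datum, value specification) are traversed but never shown.
    zoom="overview" gives a one-line summary; any other value adds details on demand.
    """
    name = first(objects(graph, entity, "rdfs:label")) or entity
    datum = about(graph, entity, "scalar measurement datum")
    if datum is None:
        return f"{name}: no weight recorded"
    spec = first(objects(graph, datum, "has value specification"))
    value = first(objects(graph, spec, "has numeric value"))
    unit = first(objects(graph, spec, "has unit"))
    summary = f"{name} weighs {value} {unit}"
    if zoom == "overview":
        return summary
    # Details on demand: follow the graph one step further to the confidence interval.
    ci = about(graph, datum, "95% confidence interval")
    if ci is None:
        return summary
    low = first(objects(graph, ci, "has lower bound"))
    high = first(objects(graph, ci, "has upper bound"))
    return f"{summary} (95% CI: {low} to {high} {unit})"


print(weight_view(STORED_GRAPH, "ex:apple1"))
print(weight_view(STORED_GRAPH, "ex:apple1", zoom="details"))
```

Because the view is computed from the stored triples rather than stored separately, the graph can be made more explicit for machines without changing what the user sees.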

Tools for describing graph patterns that enforce a standardized way of modeling and representing data of the same type, such as the Shapes Constraint Language SHACL and Data Shapes DASH (30), Shape Expressions ShEx (31,32), or the Reasonable Ontology Templates OTTR (33,34), provide some support for decoupling data storage from data presentation. However, they neither support developers and data stewards in using (meta)data or writing queries against them, nor give the graph the structure required for semantically meaningful navigation and exploration. Unfortunately, the FAIR Guiding Principles address neither the need to decouple data storage from data presentation nor the need to explore (meta)data in a human-actionable way. We have therefore proposed extending the FAIR Guiding Principles with the principle of human explorability, resulting in the FAIREr Guiding Principles (10), which take the cognitive interoperability of (meta)data into account (for a detailed discussion, see (10)).
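How a template can enforce one standardized pattern for all data of the same type, and how a shape can check data against that pattern, can be illustrated with a toy analogue in plain Python. This is not OTTR or SHACL syntax and does not reproduce their semantics; the template function, the predicate labels, and the shape format are hypothetical. The user supplies only a compact record, and the template expands it into the full machine-actionable pattern.

```python
from typing import NamedTuple


class WeightRecord(NamedTuple):
    """Compact, human-facing input: what a user would actually enter."""
    entity: str
    value: float
    unit: str
    ci_low: float
    ci_high: float


def expand_weight_template(record: WeightRecord, n: int) -> list:
    """Hypothetical template: expand one record into the full triple pattern."""
    datum, spec, ci = f"ex:weightDatum{n}", f"ex:valueSpec{n}", f"ex:ciDatum{n}"
    return [
        (datum, "rdf:type", "scalar measurement datum"),
        (datum, "is about", record.entity),
        (datum, "has value specification", spec),
        (spec, "has numeric value", record.value),
        (spec, "has unit", record.unit),
        (ci, "rdf:type", "95% confidence interval"),
        (ci, "is about", datum),
        (ci, "has lower bound", record.ci_low),
        (ci, "has upper bound", record.ci_high),
    ]


# Hypothetical shapes: for nodes of a given type, each listed predicate must
# occur exactly once, and its object must be an instance of the given Python type.
SHAPES = {
    "scalar measurement datum": {"is about": str, "has value specification": str},
    "95% confidence interval": {"is about": str, "has lower bound": float,
                                "has upper bound": float},
}


def validate(triples: list, shapes: dict) -> list:
    """Return human-readable violations; an empty list means the data conform."""
    violations = []
    typed_nodes = [(s, o) for s, p, o in triples if p == "rdf:type" and o in shapes]
    for node, node_type in typed_nodes:
        for predicate, expected in shapes[node_type].items():
            values = [o for s, p, o in triples if s == node and p == predicate]
            if len(values) != 1:
                violations.append(
                    f"{node}: expected exactly one '{predicate}', found {len(values)}")
            elif not isinstance(values[0], expected):
                violations.append(f"{node}: '{predicate}' should be {expected.__name__}")
    return violations


record = WeightRecord("ex:apple1", 212.45, "g", 212.44, 212.47)
triples = expand_weight_template(record, 1)
print(len(triples), validate(triples, SHAPES))  # 9 []
```

The template guarantees that every weight measurement follows the same pattern, which in turn makes a presentation layer like the one sketched for Fig. 1 possible; it does not, however, help anyone write queries against the expanded graph or navigate it.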