This paper is available on arXiv under a CC 4.0 license.
Authors:
(1) Vogt, Lars, TIB Leibniz Information Centre for Science and Technology;
(2) Konrad, Marcel, TIB Leibniz Information Centre for Science and Technology;
(3) Prinz, Manuel, TIB Leibniz Information Centre for Science and Technology.
Table of Links
- Abstract & Introduction
- Interoperability
- Semantic interoperability and what natural languages like English can teach us
- Requirements for successfully communicating terms and statements
- Parallels between the structure of natural language statements and data schemata with implications for semantic interoperability
- What makes a term a good term and a schema a good schema?
- The need for a machine-actionable Rosetta Stone for (meta)data that acts as an interlingua for specifying reference terms and reference schemata to support cognitive and semantic interoperability
- Rosetta Stone and machine-readability: UPRIs, XML Schema datatypes, and RDF for communicating terms, datatypes, and statements
- Rosetta Stone and machine-interpretability: Wikidata and a modeling paradigm for (meta)data statements based on English
- Rosetta Stone and semantic interoperability: Specifying term mappings and schema crosswalks
- Rosetta Stone and cognitive interoperability: Specifying display templates and using a query builder
- Discussion
- Related work
- Conclusion, Acknowledgements, & References
What makes a term a good term and a schema a good schema?
First and foremost, a good schema for a (meta)data statement must cover all the information that needs to be documented, stored, and represented for the corresponding type of statement. However, beyond that, there are many other criteria for evaluating schemata. Most of these relate to the different operations one wants to perform on the (meta)data, and the formats required by the corresponding tools, which determine the degree of machine-actionability of the (meta)data. These include search operations (i.e., the findability in FAIR), but also reasoning and all kinds of data transformations, such as unit conversion for measurement data. Communicating with humans is another set of operations that needs to be considered when evaluating (meta)data schemata, as it relates to cognitive interoperability and thus the human-actionability of (meta)data.
Unfortunately, different operations are likely to place different requirements on a schema, and the tools that execute these operations may add requirements of their own. For example, optimizing the findability of measurement data requires a different data schema than optimizing reasoning over them or their reusability. A given schema therefore needs to be evaluated in terms of the operations to be performed and the tools to be used on the (meta)data. This often involves trade-offs between operations that are prioritized differently in order to achieve an overall optimum. An example is the trade-off between reasoning and human-readability, as discussed above in the context of the dilemma between machine-actionable and human-actionable (meta)data schemata (Fig. 1).
As a consequence, a given type of statement is likely to need more than one corresponding schema. Besides historical reasons, there are two main causes. First, different research communities often have different frames of reference and thus emphasize different aspects of a given type of entity, which results in the need for different terms for the same type of entity (causing issues with ontological and thus terminological interoperability, but not necessarily with referential interoperability). Second, different research communities want to perform different operations on the (meta)data, which requires different types of schemata. Since operations on (meta)data can be performed with different sets of tools, not only the structure of the schema is important, but also the format in which it can be communicated to such tools. For example, some tools require (meta)data to be in RDF/OWL, others in JSON, as CSV, or as a Python or Java data class.
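To make the point about formats concrete, the following minimal sketch shows one statement held in a Python data class and serialized to JSON and CSV. The `WeightMeasurement` class, its field names, and the example values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class WeightMeasurement:
    """Hypothetical schema for a weight measurement statement (illustration only)."""
    subject: str   # the object that was measured
    value: float   # numeric measurement value
    unit: str      # unit label, e.g. "gram"


def to_json(m: WeightMeasurement) -> str:
    """Serialize the statement as a JSON object with sorted keys."""
    return json.dumps(asdict(m), sort_keys=True)


def to_csv(m: WeightMeasurement) -> str:
    """Serialize the statement as CSV text: one header row and one data row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["subject", "value", "unit"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerow(asdict(m))
    return buffer.getvalue()


# Assumed example values, not data from the paper.
statement = WeightMeasurement(subject="apple X", value=212.45, unit="gram")
print(to_json(statement))
print(to_csv(statement))
```

The information content is identical in all three representations; only the format that a given tool can consume differs.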
Obviously, FAIRness alone is not a sufficient indicator of high-quality (meta)data. The use of (meta)data often depends on their fitness-for-use, i.e., their availability in appropriate formats that conform to established standards and protocols and allow their direct use, e.g., when a specific analysis software requires data in a specific format.
Therefore, although agreement on a common vocabulary and a common set of schemata would solve semantic interoperability and machine-actionability of (meta)data across different research domains, such agreement is unlikely to happen. We have to think pragmatically and emphasize the need for ontological and referential term mappings and schema crosswalks to achieve terminological and schematic interoperability.
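As a rough illustration of what a schema crosswalk combined with a term mapping could look like in practice, consider the sketch below. Both schemata, all slot names, and the unit labels are hypothetical assumptions made for this example; the sketch does not reproduce the authors' implementation.

```python
# Hypothetical schema crosswalk: slot names of schema A -> slot names of schema B.
CROSSWALK_A_TO_B = {
    "subject": "measured_object",
    "value": "quantity",
    "unit": "unit_of_measure",
}

# Hypothetical term mapping: unit labels used by community A -> labels used by community B.
TERM_MAPPING_A_TO_B = {
    "gram": "g",
    "kilogram": "kg",
}


def crosswalk(record: dict) -> dict:
    """Convert a schema A record to schema B.

    Slot names are renamed via the crosswalk; the unit term is translated via
    the term mapping and kept unchanged if no mapping is known. A slot that is
    missing from the crosswalk raises KeyError instead of being silently dropped.
    """
    converted = {CROSSWALK_A_TO_B[slot]: value for slot, value in record.items()}
    unit = converted["unit_of_measure"]
    converted["unit_of_measure"] = TERM_MAPPING_A_TO_B.get(unit, unit)
    return converted


# Assumed example record, not data from the paper.
record_a = {"subject": "apple X", "value": 212.45, "unit": "gram"}
record_b = crosswalk(record_a)
print(record_b)
```

Keeping the slot-level crosswalk separate from the term mapping mirrors the distinction drawn above between schematic and terminological interoperability: either table can be extended without touching the other.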