This paper is available on arXiv under CC 4.0 license.
Authors:
(1) Vogt, Lars, TIB Leibniz Information Centre for Science and Technology;
(2) Konrad, Marcel, TIB Leibniz Information Centre for Science and Technology;
(3) Prinz, Manuel, TIB Leibniz Information Centre for Science and Technology.
Table of Links
- Abstract & Introduction
- Interoperability
- Semantic interoperability and what natural languages like English can teach us
- Requirements for successfully communicating terms and statements
- Parallels between the structure of natural language statements and data schemata with implications for semantic interoperability
- What makes a term a good term and a schema a good schema?
- The need for a machine-actionable Rosetta Stone for (meta)data that acts as an interlingua for specifying reference terms and reference schemata to support cognitive and semantic interoperability
- Rosetta Stone and machine-readability: UPRIs, XML Schema datatypes, and RDF for communicating terms, datatypes, and statements
- Rosetta Stone and machine-interpretability: Wikidata and a modeling paradigm for (meta)data statements based on English
- Rosetta Stone and semantic interoperability: Specifying term mappings and schema crosswalks
- Rosetta Stone and cognitive interoperability: Specifying display templates and using a query builder
- Discussion
- Related work
- Conclusion, Acknowledgements, & References
Rosetta Stone and semantic interoperability: Specifying term mappings and schema crosswalks
Different controlled vocabularies and ontologies may contain terms that have the same meaning and referent and are therefore strict synonyms. If their UPRIs differ, however, a machine cannot recognize them as synonyms, and statements using such terms will not be interoperable because the terms they use are not terminologically interoperable (10). This lack of terminological interoperability can be resolved by specifying ontological term mappings between all strictly synonymous terms across different vocabularies and ontologies, as discussed above. However, terms that appear strictly synonymous at first glance sometimes turn out not to be on closer inspection, or assessing their synonymy requires evaluating chains of interdependencies among numerous terms across different ontologies, which is not feasible in practice. In such cases, and also when terms refer to the same entities but differ in their ontological definitions, referential term mappings can be specified instead.
In the first implementation of the Rosetta Framework, we suggest taking a pragmatic approach and focusing on the referential interoperability of terms, and thus on referential term mappings; ontological term mappings can still be added later. We plan to use Wikidata terms as referent terms for referential term mappings. The mappings should be stored and made openly accessible and usable in a terminology service such as the
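The idea of a referential term mapping can be illustrated with a minimal sketch, assuming that each mapping is stored as a simple pair from a term's UPRI to a Wikidata referent term. All UPRIs and referent identifiers below are hypothetical placeholders, and the helper names are illustrative only, not part of any Rosetta Framework API. Note that sharing a referent says nothing about ontological synonymy: the mapped terms may still differ in their definitions.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical referential term mappings: term UPRI -> Wikidata referent term.
# All identifiers are illustrative placeholders, not real ontology or Wikidata terms.
REFERENTIAL_MAPPINGS: Dict[str, str] = {
    "http://example.org/ontoA/Mass": "wd:Q_PLACEHOLDER_MASS",
    "http://example.org/ontoB/body_mass": "wd:Q_PLACEHOLDER_MASS",
    "http://example.org/ontoB/length": "wd:Q_PLACEHOLDER_LENGTH",
}


def referent_of(upri: str) -> Optional[str]:
    """Return the referent term a UPRI is mapped to, or None if it is unmapped."""
    return REFERENTIAL_MAPPINGS.get(upri)


def referentially_interoperable(upri_a: str, upri_b: str) -> bool:
    """Treat two terms as referentially interoperable if they share a referent term.

    This says nothing about ontological synonymy: the terms' definitions may differ.
    """
    if upri_a == upri_b:
        return True
    ref_a, ref_b = referent_of(upri_a), referent_of(upri_b)
    return ref_a is not None and ref_a == ref_b


def terms_by_referent() -> Dict[str, List[str]]:
    """Group all mapped UPRIs by their shared referent term."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for upri, referent in REFERENTIAL_MAPPINGS.items():
        groups[referent].append(upri)
    return dict(groups)
```

Because every term points to a shared referent rather than to every other vocabulary, adding a new ontology only requires mapping its terms once.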
Analogous to referential term mappings, referential schema crosswalks will be specified to establish schematic interoperability between all schemata that model the same type of statement (10). Once schema crosswalks are specified via the appropriate Rosetta Framework reference schema (cf. Fig. 4), researchers can use a schema optimized for the operations and tools relevant to their particular project and research topic, while keeping the (meta)data they create referentially and schematically interoperable with all statements created with other schemata that have crosswalks to the same reference schema (Fig. 8).
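Routing every conversion through a shared reference schema can be sketched as follows. This is a deliberate simplification, assuming that a statement can be flattened into slot/value pairs and that a crosswalk is a plain renaming of slots; real crosswalks between graph schemata are structurally richer. The schema names and slot names are hypothetical.

```python
from typing import Dict

Statement = Dict[str, object]

# Hypothetical crosswalks: for each project schema, map its slot names to the
# slot names of a (hypothetical) reference schema for weight measurement statements.
CROSSWALKS: Dict[str, Dict[str, str]] = {
    "schema_A": {"subject": "object", "value": "weight_value", "unit": "weight_unit"},
    "schema_B": {"measured_entity": "object", "mass": "weight_value", "unit_label": "weight_unit"},
}


def to_reference(statement: Statement, schema: str) -> Statement:
    """Rename the slots of a schema-specific statement to reference-schema slots."""
    mapping = CROSSWALKS[schema]
    return {mapping[slot]: value for slot, value in statement.items()}


def from_reference(statement: Statement, schema: str) -> Statement:
    """Rename reference-schema slots to the slots of the target schema."""
    inverse = {ref_slot: slot for slot, ref_slot in CROSSWALKS[schema].items()}
    return {inverse[slot]: value for slot, value in statement.items()}


def convert(statement: Statement, source: str, target: str) -> Statement:
    """Convert between any two schemata by routing through the reference schema."""
    return from_reference(to_reference(statement, source), target)
```

The design choice illustrated here is the hub-and-spoke arrangement: with n schemata for the same statement type, n crosswalks to the reference schema suffice, rather than one crosswalk for every pair of schemata.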
All provenance and metadata statements associated with a particular dataset or data statement can be mapped to any provenance or metadata schema or model, such as PROV-O, the PAV ontology (62), the metadata terms of the Dublin Core Metadata Initiative, or the DataCite Metadata Schema, provided that a corresponding reference schema is specified together with the required metadata crosswalks.
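A metadata crosswalk from one reference record to two existing vocabularies might look like the minimal sketch below. Only the predicate IRIs are real (`dcterms:creator`, `dcterms:created`, `prov:wasAttributedTo`, `prov:generatedAtTime`); the reference slot names, the record, and the function name are hypothetical. Objects are kept as plain strings for brevity, whereas a full export would type the date as an `xsd:dateTime` literal and the creator as an agent resource.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

DCTERMS = "http://purl.org/dc/terms/"
PROV = "http://www.w3.org/ns/prov#"

# Hypothetical reference-schema slots for a minimal provenance record, mapped to
# predicates of two existing vocabularies (the predicate IRIs are real).
METADATA_CROSSWALKS: Dict[str, Dict[str, str]] = {
    "dublin_core": {
        "creator": DCTERMS + "creator",
        "creation_date": DCTERMS + "created",
    },
    "prov_o": {
        "creator": PROV + "wasAttributedTo",
        "creation_date": PROV + "generatedAtTime",
    },
}


def export_provenance(subject: str, record: Dict[str, str], target: str) -> List[Triple]:
    """Emit (subject, predicate, object) triples for the slots the target crosswalk covers.

    Slots without a mapping in the target vocabulary are silently skipped.
    """
    mapping = METADATA_CROSSWALKS[target]
    return [(subject, mapping[slot], value) for slot, value in record.items() if slot in mapping]
```

The same reference record can thus be served in either vocabulary simply by choosing a different crosswalk.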
Schema crosswalks can be specified not only to other (meta)data graph schemata but also to other formats, such as RDF/OWL, GraphQL, Python or Java data classes, JSON, and CSV. Because each of these formats must provide, for a given statement type, data slots that map to the statement's positions and their associated semantic roles, mapping to non-graph-based formats should work analogously to mapping to graph-based formats (e.g., Fig. 2D).
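The point that non-graph formats only need a data slot per position can be sketched for JSON and CSV. This assumes a statement is represented as a mapping from positions (semantic roles) to values; the position names, JSON keys, and CSV column names are all hypothetical.

```python
import csv
import io
import json
from typing import Dict, List

# A hypothetical weight-measurement statement, keyed by the semantic role of
# each position in a (hypothetical) reference schema.
statement = {"subject": "apple_1", "weight_value": 212.0, "weight_unit": "gram"}

# Hypothetical format crosswalks: each maps a reference position to the data
# slot (JSON key or CSV column) that carries the same semantic role.
JSON_SLOTS: Dict[str, str] = {"subject": "item", "weight_value": "weight", "weight_unit": "unit"}
CSV_COLUMNS: Dict[str, str] = {"subject": "ITEM_ID", "weight_value": "WEIGHT", "weight_unit": "UNIT"}


def to_json(stmt: Dict[str, object]) -> str:
    """Serialize one statement as a JSON object using the JSON slot names."""
    return json.dumps({JSON_SLOTS[pos]: val for pos, val in stmt.items()}, sort_keys=True)


def to_csv(stmts: List[Dict[str, object]]) -> str:
    """Serialize statements as CSV rows, one column per position."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(CSV_COLUMNS.values()), lineterminator="\n")
    writer.writeheader()
    for stmt in stmts:
        writer.writerow({CSV_COLUMNS[pos]: val for pos, val in stmt.items()})
    return buffer.getvalue()
```

For the statement above, `to_csv([statement])` yields a header row `ITEM_ID,WEIGHT,UNIT` followed by `apple_1,212.0,gram`.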
The Rosetta Editor must allow users to specify new graph-based and tabular (meta)data schemata and formats whenever needed, so that the Framework can adapt to newly emerging (meta)data formats and standards⁵. This reflects the observation that FAIRness alone is not a sufficient indicator of high (meta)data quality: the usefulness of (meta)data often depends on their fitness-for-use, i.e., the data must be available in appropriate formats that comply with established standards and protocols and allow their direct use, for example when a specific analysis software requires data in a specific format.
With the Rosetta Framework and its Editor, any domain expert can begin specifying their own small Rosetta module for each type of statement relevant to them. A module comprises the statement class, a reference schema, and schema crosswalks, and it facilitates the production of FAIR machine-actionable (meta)data statements. Using the Editor, any number of additional schemata can be specified at any time and linked to the reference schema through a schema crosswalk. Applications can use these additional schemata to make their (meta)data, which are stored according to reference schemata, accessible in any other schema and format, either virtually through their UI or as an export option. As a result, (meta)data managed by an application that applies the Rosetta Framework are decoupled from the application's storage model and can readily be used in other frameworks.
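The composition of a Rosetta module can be sketched as a small data structure. The class name `RosettaModule`, its fields, and its methods are hypothetical illustrations of the description above, not the Rosetta Editor's actual data model; crosswalks are modeled simply as functions from a reference-schema statement to a statement in another schema or format.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Statement = Dict[str, object]
Crosswalk = Callable[[Statement], Statement]


@dataclass
class RosettaModule:
    """Hypothetical bundle for one statement type: class, reference schema, crosswalks."""

    statement_class: str
    reference_slots: Tuple[str, ...]
    crosswalks: Dict[str, Crosswalk] = field(default_factory=dict)

    def add_crosswalk(self, schema_name: str, crosswalk: Crosswalk) -> None:
        """Link an additional schema or format to the reference schema."""
        self.crosswalks[schema_name] = crosswalk

    def export(self, stmt: Statement, schema_name: str) -> Statement:
        """Serve a statement stored in the reference schema in a registered target schema."""
        missing = set(self.reference_slots) - set(stmt)
        if missing:
            raise ValueError(f"statement lacks reference slots: {sorted(missing)}")
        return self.crosswalks[schema_name](stmt)
```

In this sketch, the application only ever stores statements in the reference schema; every other representation is produced on demand by a registered crosswalk, which is what decouples the (meta)data from the storage model.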
Because schema crosswalks allow (meta)data conversion in both directions, they can also be used to import (meta)data from any schema into the reference schema. Combined with the concept of semantic units (50), Rosetta modules resemble Knowledge Graph Building Blocks, i.e., small information modules for knowledge processing (for a discussion of Knowledge Graph Building Blocks, see (64)).
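A bidirectional crosswalk can be sketched as a pair of functions, one for export and one for import. The example below assumes a hypothetical external schema that records weights in kilograms only, while the (hypothetical) reference schema stores a value together with an explicit unit; the unit conversion is an illustrative assumption, not something the Framework prescribes.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Statement = Dict[str, object]


@dataclass(frozen=True)
class BidirectionalCrosswalk:
    """A crosswalk as a pair of functions: reference -> external and external -> reference."""

    export: Callable[[Statement], Statement]
    import_: Callable[[Statement], Statement]


def ref_to_kg_schema(stmt: Statement) -> Statement:
    """Export: normalize the reference value and unit to kilograms."""
    factor = {"gram": 0.001, "kilogram": 1.0}[stmt["weight_unit"]]
    return {"object_id": stmt["subject"], "weight_kg": stmt["weight_value"] * factor}


def kg_schema_to_ref(stmt: Statement) -> Statement:
    """Import: restate the kilogram value with an explicit unit."""
    return {"subject": stmt["object_id"], "weight_value": stmt["weight_kg"], "weight_unit": "kilogram"}


kg_crosswalk = BidirectionalCrosswalk(export=ref_to_kg_schema, import_=kg_schema_to_ref)
```

Importing and then exporting returns the original external record. The reverse round trip preserves the measurement but not necessarily its original representation: a value stored in grams comes back in kilograms.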
Schema crosswalks can, for example, convert weight measurement statements that comply with a corresponding reference schema into data graphs that comply with the corresponding OBI schema. This also enables knowledge graph applications to establish workflows in which statements that meet certain criteria, such as a certain confidence level or a documented reference to a relevant source of evidence for the statement, are converted into data graphs that comply with the OBI schema for weight measurements. Such workflows convert information from a schema that models statements into a schema that models a human-independent reality.
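Such a filter-then-convert workflow might be sketched as follows. The acceptance rule (sufficient confidence or any documented evidence), the threshold value, and all `ex:` node and predicate labels are hypothetical placeholders standing in for the OBI weight-measurement pattern; they are not actual OBI terms.

```python
from typing import Dict, List, Tuple

Statement = Dict[str, object]
Triple = Tuple[str, str, str]

CONFIDENCE_THRESHOLD = 0.9  # hypothetical acceptance criterion


def meets_criteria(stmt: Statement) -> bool:
    """Accept statements with high enough confidence or a documented evidence source."""
    confident = float(stmt.get("confidence", 0.0)) >= CONFIDENCE_THRESHOLD
    has_evidence = bool(stmt.get("evidence"))
    return confident or has_evidence


def to_reality_graph(stmt: Statement, index: int) -> List[Triple]:
    """Convert an accepted statement into a placeholder measurement-datum graph.

    The ex: labels are hypothetical stand-ins, not actual OBI terms.
    """
    datum = f"_:datum{index}"
    value_spec = f"_:value{index}"
    return [
        (datum, "rdf:type", "ex:WeightMeasurementDatum"),
        (datum, "ex:isAbout", str(stmt["subject"])),
        (datum, "ex:hasValueSpecification", value_spec),
        (value_spec, "ex:hasValue", str(stmt["weight_value"])),
        (value_spec, "ex:hasUnit", str(stmt["weight_unit"])),
    ]


def promote(statements: List[Statement]) -> List[Triple]:
    """Convert only the statements that pass the acceptance criteria."""
    graph: List[Triple] = []
    for i, stmt in enumerate(statements):
        if meets_criteria(stmt):
            graph.extend(to_reality_graph(stmt, i))
    return graph
```

Statements that fail the criteria stay in the statement-modeling schema; only accepted ones are promoted into the graph that models the measured entity itself.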
From reference schemata to OWL-based schemata that allow reasoning
(Meta)data statements and their corresponding statement classes can be understood as a formalized approach to modeling n-ary predicates: each statement class refers to an n-ary predicate and can be understood as an attempt to model this predicate as an ontology class rather than as an ontology property. As a consequence, reasoning over property axioms, such as transitivity or domain and range specifications, is not straightforward for (meta)data statements that instantiate reference schemata, and tools established for OWL-based frameworks cannot readily be reused in the Rosetta Framework. It is therefore important that the Framework provide interoperability with OWL and description logics by supplying, by default, a corresponding OWL-based schema for each reference schema and its statement class. When a new reference schema is specified for a new type of statement, the Rosetta Editor uses the information provided for the newly specified statement class and its reference schema to automatically specify an OWL-based schema and the associated schema crosswalk.
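At the instance level, such a crosswalk collapses a statement node into a direct property assertion, after which property axioms become usable. The sketch below assumes a statement type with exactly two argument positions, such as a has-part statement; the slot keys and `ex:` names are hypothetical. The naive transitive-closure function is only a stand-in for what an OWL reasoner would infer from a transitivity axiom, not a replacement for one.

```python
from typing import Dict, Set, Tuple

# A hypothetical reference-schema instance: the statement is a node (an instance
# of a statement class) whose positions point to the subject and object arguments.
StatementNode = Dict[str, str]


def to_owl_assertion(node: StatementNode, property_name: str) -> Tuple[str, str, str]:
    """Collapse a two-position statement node into a binary object property assertion."""
    return (node["subject"], property_name, node["object"])


def transitive_closure(pairs: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Naive fixpoint computation, standing in for what an OWL reasoner infers
    from a transitivity axiom on the property."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new
```

Over statement nodes, the inference that a car has a piston as a part (via its engine) would require custom logic; over the collapsed binary assertions, it follows directly from the transitivity of the property.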
Consider a material has-part statement with a material object as the subject argument and another material object as the object argument. The corresponding reference schema has the structure shown in Fig. 9, top. We can define a new object property ‘has material part’ as a subproperty of ‘required object position’, with both domain and range specified as ‘MATERIAL OBJECT’ (Fig. 9, bottom). An annotation property indicates that the ‘has material part’ property belongs to the ‘material has-part statement’ class. The domain and range specifications are taken from the constraints of the two slots that the property connects, and any logical property axioms, such as transitivity, are taken from the corresponding specifications on the range object-position class (see discussion above).
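The automatic derivation of such an object property can be sketched as a small Turtle generator. It follows the rules stated above: domain and range come from the constraints of the two connected slots, transitivity comes from the object-position slot, and an annotation property links the property to its statement class. The `ex:` namespace, the annotation property `ex:belongsToStatementClass`, and the function and class names are hypothetical; only the `owl:` and `rdfs:` terms are standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SlotConstraint:
    """Constraint on one slot of a (hypothetical) reference schema."""

    value_class: str  # class an argument in this slot must instantiate
    transitive: bool = False  # logical property axiom recorded on the position class


def owl_property_turtle(prop: str, statement_class: str, subject: SlotConstraint,
                        obj: SlotConstraint, parent_property: str) -> str:
    """Derive a Turtle definition of the object property for a two-position statement class."""
    lines: List[str] = [
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix ex: <http://example.org/rosetta#> .",
        "ex:belongsToStatementClass a owl:AnnotationProperty .",
        f"{prop} a owl:ObjectProperty ;",
        f"    rdfs:subPropertyOf {parent_property} ;",
        f"    rdfs:domain {subject.value_class} ;",  # from the subject-slot constraint
        f"    rdfs:range {obj.value_class} ;",  # from the object-slot constraint
        f"    ex:belongsToStatementClass {statement_class} .",
    ]
    if obj.transitive:  # axiom taken from the range object-position class
        lines.append(f"{prop} a owl:TransitiveProperty .")
    return "\n".join(lines)
```

For the has-part example, calling `owl_property_turtle("ex:has_material_part", "ex:MaterialHasPartStatement", SlotConstraint("ex:MATERIAL_OBJECT"), SlotConstraint("ex:MATERIAL_OBJECT", transitive=True), "ex:required_object_position")` produces a transitive object property whose domain and range are both the material-object class.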