This paper is available on arXiv under a CC 4.0 license.
Authors:
(1) Vogt, Lars, TIB Leibniz Information Centre for Science and Technology;
(2) Konrad, Marcel, TIB Leibniz Information Centre for Science and Technology;
(3) Prinz, Manuel, TIB Leibniz Information Centre for Science and Technology.
Table of Links
- Abstract & Introduction
- Interoperability
- Semantic interoperability and what natural languages like English can teach us
- Requirements for successfully communicating terms and statements
- Parallels between the structure of natural language statements and data schemata with implications for semantic interoperability
- What makes a term a good term and a schema a good schema?
- The need for a machine-actionable Rosetta Stone for (meta)data that acts as an interlingua for specifying reference terms and reference schemata to support cognitive and semantic interoperability
- Rosetta Stone and machine-readability: UPRIs, XML Schema datatypes, and RDF for communicating terms, datatypes, and statements
- Rosetta Stone and machine-interpretability: Wikidata and a modeling paradigm for (meta)data statements based on English
- Rosetta Stone and semantic interoperability: Specifying term mappings and schema crosswalks
- Rosetta Stone and cognitive interoperability: Specifying display templates and using a query builder
- Discussion
- Related work
- Conclusion, Acknowledgements, & References
Related work
SHACL and ShEx are shape constraint languages for describing RDF graph structures (i.e., shapes) that identify predicates and their associated cardinalities and datatypes. Shapes can be used for communicating data structures, creating, integrating, or validating graphs, and generating UI forms and code.
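To make the notion of a shape more concrete, the following minimal Python sketch checks one node's property values against a list of constraints, each naming a predicate, a cardinality range, and a datatype. It is not SHACL or ShEx syntax and does not use any RDF library. The names `PropertyConstraint` and `validate`, the `ex:` predicates, and the use of Python types in place of XML Schema datatypes are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PropertyConstraint:
    """One predicate with its datatype and cardinality (hypothetical structure)."""
    predicate: str                   # predicate name, e.g. "ex:name" (illustrative)
    datatype: type                   # Python type standing in for an XSD datatype
    min_count: int = 0
    max_count: Optional[int] = None  # None means no upper bound


def validate(node: Dict[str, list], shape: List[PropertyConstraint]) -> List[str]:
    """Return violation messages; an empty list means the node conforms."""
    violations = []
    for c in shape:
        values = node.get(c.predicate, [])
        if len(values) < c.min_count:
            violations.append(
                f"{c.predicate}: expected at least {c.min_count} value(s), "
                f"found {len(values)}"
            )
        if c.max_count is not None and len(values) > c.max_count:
            violations.append(
                f"{c.predicate}: expected at most {c.max_count} value(s), "
                f"found {len(values)}"
            )
        for v in values:
            if not isinstance(v, c.datatype):
                violations.append(
                    f"{c.predicate}: value {v!r} is not of type "
                    f"{c.datatype.__name__}"
                )
    return violations


# Illustrative shape: exactly one string name, at most one integer age.
PERSON_SHAPE = [
    PropertyConstraint("ex:name", str, min_count=1, max_count=1),
    PropertyConstraint("ex:age", int, min_count=0, max_count=1),
]

if __name__ == "__main__":
    print(validate({"ex:name": ["Ada"], "ex:age": [36]}, PERSON_SHAPE))
    print(validate({"ex:age": ["36"]}, PERSON_SHAPE))
```

The same constraint list could in principle drive form generation or graph construction, which is the reuse the shape languages aim for; real SHACL and ShEx offer far richer constraint vocabularies than this sketch.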
Reasonable Ontology Templates (OTTR) takes up the idea of shapes and uses them as building blocks for knowledge bases. The templates provide an abstraction level that is better suited for managing a knowledge base than low-level RDF triples or OWL axioms. They are stored in a template library, which supports reuse and thus uniform modeling across different knowledge bases. Templates can refer to other templates. By keeping templates apart from their instantiations in the form of data, OTTR clearly separates knowledge base design from knowledge base contents. This is comparable to the Rosetta Framework. Tools exist for mapping CSV or relational data to templates.
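The idea of templates that refer to other templates and are expanded into triples can be sketched in a few lines of Python. This is only an illustration of the general mechanism, not OTTR's own syntax or tooling. The classes `Template` and `Call`, the function `expand`, the `tpl:` and `ex:` names, and the convention that parameters start with `?` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Call:
    """A reference from one template's body to another template."""
    template: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class Template:
    """Named parameters plus a body of triple patterns and nested calls."""
    params: Tuple[str, ...]
    body: Tuple[Union[Triple, Call], ...]


def expand(name: str, args: Tuple[str, ...],
           library: Dict[str, Template]) -> List[Triple]:
    """Recursively expand a template instance into plain triples."""
    template = library[name]
    if len(args) != len(template.params):
        raise ValueError(f"{name} expects {len(template.params)} argument(s)")
    binding = dict(zip(template.params, args))

    def subst(term: str) -> str:
        # Replace a parameter by its argument; leave constants unchanged.
        return binding.get(term, term)

    triples: List[Triple] = []
    for item in template.body:
        if isinstance(item, Call):
            nested_args = tuple(subst(a) for a in item.args)
            triples.extend(expand(item.template, nested_args, library))
        else:
            s, p, o = item
            triples.append((subst(s), subst(p), subst(o)))
    return triples


# Illustrative template library: tpl:Person reuses tpl:Typed.
LIBRARY = {
    "tpl:Typed": Template(("?x", "?cls"), (("?x", "rdf:type", "?cls"),)),
    "tpl:Person": Template(
        ("?p", "?name"),
        (Call("tpl:Typed", ("?p", "ex:Person")), ("?p", "ex:name", "?name")),
    ),
}

if __name__ == "__main__":
    for triple in expand("tpl:Person", ("ex:ada", "Ada"), LIBRARY):
        print(triple)
```

The library holds the design and the argument tuples passed to `expand` hold the data, which mirrors the separation between knowledge base design and knowledge base contents described above.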
The Research Data Alliance (RDA) is developing an InteroperAble Descriptions of Observable Property Terminology (I_ADOPT (70)), i.e., an Interoperability Framework for representing observable properties that shares some similarities with the Rosetta Framework. The I_ADOPT framework is based on an ontology designed to enhance interoperability between existing models (e.g., ontologies, taxonomies, controlled vocabularies) for semantically describing variables. Variables are understood to be descriptions of something observed or mathematically derived, defined by at least the entity being observed and by corresponding characteristics; more complex variables can be described as well but require additional entities. One of the difficulties encountered in representing the meaning of such variables is agreeing upon the elements that constitute them. The ontology provides essential atomic components and relations that can be employed to define machine-interpretable FAIR variable descriptions. The I_ADOPT framework does not cover concepts such as units, instruments, methods, and geographic location information, but is confined to the description of the variable itself. It provides templates, i.e., Variable Design Patterns (VDPs), that are similar to Ontology Design Patterns and provide schemata for specific types of variables.
While there are many similarities between I_ADOPT and the Rosetta Framework, only the Rosetta Framework models statements instead of a human-independent reality. It therefore provides a very generic modeling approach that reflects the structure of natural language statements, thus ensuring the cognitive interoperability of the schemata (=models): they are easy to apply and easy to understand (e.g., compared to the step-by-step procedure for minting a new variable). In addition, whereas I_ADOPT has a strong focus on environmental research and on encoding measurement and observational (meta)data, the Rosetta Framework is domain-agnostic and can be applied to any type of statement.
We also follow with interest the ongoing Abstract Wikipedia project and the closely related Wikifunctions, especially the parts relating to Constructor Units and the abstract content language. Constructor Units provide abstract representations of predicate statements and thus follow an idea that shares some similarities with our Rosetta modeling paradigm, including the distinction between required and optional object-positions. We are also interested in the idea of verbalizing a constructor unit in more than one sentence to improve its readability, and of having several possible sentence-like realizations of it, e.g., in different languages, all generated at rendering time. Wikifunctions, in turn, is similar to, but far more general in scope than, our idea of an open repository for Rosetta Functions.