Towards a Rosetta Stone for (meta)data: Discussion

8 Mar 2024

This paper is available on arxiv under CC 4.0 license.

Authors:

(1) Vogt, Lars, TIB Leibniz Information Centre for Science and Technology;

(2) Konrad, Marcel, TIB Leibniz Information Centre for Science and Technology;

(3) Prinz, Manuel, TIB Leibniz Information Centre for Science and Technology.

Discussion

One possible criticism of the Rosetta Framework concerns its underlying idea of treating statements, in addition to individual terms, as minimum information units and of organizing the knowledge graph accordingly. To structure a knowledge graph into statements, each statement can be organized as a nanopublication (66–68) and as a FAIR Digital Object (see also the discussion in the context of semantic units (50)). As a consequence, however, the Rosetta Framework requires a statement class with an associated reference schema to be specified for each type of statement needed in a knowledge graph. This limitation arises because, in the Rosetta Framework, the set of statement classes and associated reference schemata defines the possible proposition space of the knowledge graph. While this criticism is valid, we would respond that truly FAIR (meta)data require every (meta)data statement in a knowledge graph to be FAIR. For a statement to be FAIR, it must be interoperable, and we explained above why this requires a schema to be specified for each statement type. Moreover, to be truly FAIR, each statement must also reference the schema against which it was modeled, for reasons of schematic interoperability, and ideally also the statement type it instantiates. Thus, the need to model each statement type is not unique to the Rosetta Framework but applies to FAIR knowledge graphs in general.
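
As a minimal sketch of this requirement, the following Python code models a statement instance that records both the statement class it instantiates and the reference schema it was modeled against, and checks the statement against that schema. All class names, identifiers, and the weight-measurement example are hypothetical. For illustration only, the sketch assumes that a reference schema can be reduced to a set of named slots with expected value types, which is far simpler than an actual Rosetta reference schema.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceSchema:
    # Hypothetical, highly simplified reference schema: it belongs to one
    # statement class and maps each slot name to an expected Python type.
    schema_id: str
    statement_class: str
    slots: dict[str, type]


@dataclass
class StatementInstance:
    # The statement explicitly references the statement class it instantiates
    # and the reference schema against which it was modeled.
    statement_id: str
    statement_class: str
    schema_id: str
    values: dict[str, object] = field(default_factory=dict)


def conforms(statement: StatementInstance, schema: ReferenceSchema) -> bool:
    """Check that the statement cites this schema and statement class and
    fills exactly the schema's slots with values of the expected types."""
    if statement.schema_id != schema.schema_id:
        return False
    if statement.statement_class != schema.statement_class:
        return False
    if set(statement.values) != set(schema.slots):
        return False
    return all(isinstance(statement.values[name], expected)
               for name, expected in schema.slots.items())


# Hypothetical example statement: "apple 1 has a weight of 212.45 g"
weight_schema = ReferenceSchema(
    schema_id="example:weight-measurement-schema/v1",
    statement_class="example:WeightMeasurementStatement",
    slots={"subject": str, "value": float, "unit": str},
)
statement = StatementInstance(
    statement_id="example:statement/1",
    statement_class="example:WeightMeasurementStatement",
    schema_id="example:weight-measurement-schema/v1",
    values={"subject": "example:apple-1", "value": 212.45, "unit": "g"},
)
print(conforms(statement, weight_schema))  # True
```

Because the statement carries its own schema reference, a consumer can look up that schema and interpret the slots without guessing how the statement was modeled.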

The difference from other knowledge graph frameworks is that the Rosetta Editor will provide substantial support for creating schemata for new statement types and will not require experience with semantics, RDF, OWL, or any graph query language, thereby improving cognitive interoperability over other frameworks. In addition, once a reference schema and its corresponding statement class have been specified and made available in a schema repository, any other party can reuse them, which reduces the effort required to develop new knowledge graph applications with the Rosetta Framework. Knowledge graphs that follow a dynamic (crowdsourced) approach to knowledge graph construction benefit particularly from the Rosetta Framework and its Editor. Such graphs do not follow the otherwise typical static information extraction paradigm with a closed set of predefined schemata, and applying the static approach to dynamic scenarios or domains usually falls short when a new type of statement needs to be added to the graph.

Another criticism is that the Rosetta modeling paradigm does not use a logical framework, so reasoning cannot be applied to statements created with it. We agree that being able to apply reasoning would be desirable, but we consider it more important to first ensure the findability and then the general FAIRness and cognitive interoperability of (meta)data. We have therefore chosen the Rosetta modeling paradigm to model statements rather than reality. If reasoning is required, the (meta)data can always be converted into a structure that enables reasoning by using one of the specified schema crosswalks.
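
The following Python sketch illustrates the idea of a schema crosswalk: a function registered for a pair of schemata that rewrites a slot-based statement into triples following a different, reality-oriented target schema closer to what an OWL reasoner expects. The schema identifiers, the `target:` predicates, the `CROSSWALKS` registry, and the `convert` function are hypothetical placeholders, not part of any published vocabulary or of the Rosetta Framework's actual interface.

```python
from typing import Callable

Triple = tuple[str, str, str]
Crosswalk = Callable[[str, dict[str, object]], list[Triple]]


def weight_to_quality_pattern(statement_id: str,
                              values: dict[str, object]) -> list[Triple]:
    # Hypothetical target pattern: the weight becomes a separate quality node
    # of the subject, in the style of many reality-oriented ontologies.
    quality = f"{statement_id}#weight"
    return [
        (str(values["subject"]), "target:has_quality", quality),
        (quality, "rdf:type", "target:Weight"),
        (quality, "target:has_value", str(values["value"])),
        (quality, "target:has_unit", str(values["unit"])),
    ]


# Hypothetical registry of crosswalks keyed by (source schema, target schema).
CROSSWALKS: dict[tuple[str, str], Crosswalk] = {
    ("example:weight-measurement-schema/v1", "example:owl-quality-pattern/v1"):
        weight_to_quality_pattern,
}


def convert(statement_id: str, source_schema: str, target_schema: str,
            values: dict[str, object]) -> list[Triple]:
    """Look up the crosswalk for the schema pair and apply it."""
    try:
        crosswalk = CROSSWALKS[(source_schema, target_schema)]
    except KeyError:
        raise LookupError(
            f"no crosswalk from {source_schema} to {target_schema}") from None
    return crosswalk(statement_id, values)


triples = convert(
    "example:statement/1",
    "example:weight-measurement-schema/v1",
    "example:owl-quality-pattern/v1",
    {"subject": "example:apple-1", "value": 212.45, "unit": "g"},
)
for triple in triples:
    print(triple)
```

Keying crosswalks by schema pair only works because each statement documents the reference schema it was modeled against; a statement lacking that reference could not be routed to the appropriate crosswalk.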

We believe that the Rosetta Framework will help improve the technical and semantic interoperability of (meta)data. Institutions that run their knowledge graph applications on the Rosetta Framework will be able to share their (meta)data and even build a federated virtual knowledge graph with their partners, while keeping data stewardship in their own hands and thus retaining full control over the ethical, privacy, and legal aspects of their data (following Barend Mons’ concept of data visiting as opposed to data sharing (19)). In addition, the Rosetta Framework fulfills the recommendations of the EOSC Interoperability Framework for technical interoperability (20) by providing the possibility to specify additional access models and to access the (meta)data of the knowledge graph in different formats (including Java and Python data classes), and by making statement classes with their associated reference schemata and display templates, as well as functions, the Editor, and the Query-Builder, openly and freely available.

Furthermore, the Rosetta Framework satisfies the recommendations for semantic interoperability (20), while increasing the overall cognitive interoperability of its (meta)data and its knowledge graph applications. It does so by using Wikidata terms and other controlled vocabularies and ontologies that provide UPRIs and publicly available definitions for their class terms; by tracking and documenting metadata; by providing a way to specify schema crosswalks; by decoupling human-readable data display from machine-actionable data storage, where the former focuses on providing human-readable data views and the latter on the consistent application of reference schemata that ensure the semantic interoperability of (meta)data; by making it possible to apply community-specific standards for accessing (meta)data; and by documenting for each statement, through its statement instance resource, which reference schema has been used to model it.
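
The decoupling of human-readable display from machine-actionable storage can be sketched in a few lines of Python: the stored statement keeps only identifiers and typed values, while a display template associated with the statement class produces the text a human reads. The use of Python's `string.Template` as template syntax, the `LABELS` lookup, and all identifiers are assumptions for illustration, not the Rosetta Framework's actual display-template format.

```python
from string import Template

# Machine-actionable storage: identifiers and typed values only (hypothetical).
stored_statement = {
    "statement_class": "example:WeightMeasurementStatement",
    "schema_id": "example:weight-measurement-schema/v1",
    "values": {"subject": "example:apple-1", "value": 212.45,
               "unit": "example:gram"},
}

# Human-readable display: one template per statement class (hypothetical).
DISPLAY_TEMPLATES = {
    "example:WeightMeasurementStatement":
        Template("$subject has a weight of $value $unit."),
}

# Hypothetical label lookup standing in for resolving identifiers to labels.
LABELS = {"example:apple-1": "apple 1", "example:gram": "g"}


def render(statement: dict) -> str:
    """Render a stored statement as text without modifying the stored data."""
    template = DISPLAY_TEMPLATES[statement["statement_class"]]
    # Replace identifiers by labels where known; keep non-string literals as is.
    shown = {slot: LABELS.get(value, value) if isinstance(value, str) else value
             for slot, value in statement["values"].items()}
    return template.substitute(shown)


print(render(stored_statement))  # apple 1 has a weight of 212.45 g.
```

Changing a template changes only what users see; the stored statement, and thus its conformance to its reference schema, remains untouched.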

The Rosetta Framework further supports cognitive interoperability by modeling natural language statements rather than reality and by providing easy-to-use tools that remove some layers of complexity, as well as requirements such as proficiency in (graph) query languages, since the Rosetta Query-Builder will derive generic CRUD queries from reference schemata. Developers and operators therefore do not need to define schemata and write queries themselves. Moreover, all reference schemata with their crosswalks and all functions, once specified, developed, and made available in a repository, can be reused by anyone developing their own knowledge graph applications.
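
To illustrate how generic queries can be derived from a reference schema, the following Python sketch generates a SPARQL read query and a SPARQL delete query from a statement class and its slot names. It assumes, purely for illustration, a storage layout in which each statement is a resource typed with its statement class and linked to each slot value by a slot-specific predicate under a hypothetical `https://example.org/rosetta/` namespace, and that slot names are valid SPARQL variable names. The actual Query-Builder's storage model, query language, and output may differ.

```python
BASE = "https://example.org/rosetta/"  # hypothetical namespace


def iri(local_name: str) -> str:
    # Build a full IRI in angle brackets from a local name.
    return f"<{BASE}{local_name}>"


def read_query(statement_class: str, slots: list[str]) -> str:
    """SELECT every statement of the class together with its slot values."""
    variables = " ".join(f"?{slot}" for slot in slots)
    patterns = " ;\n".join(f"    {iri('slot/' + slot)} ?{slot}" for slot in slots)
    return (f"SELECT ?statement {variables} WHERE {{\n"
            f"  ?statement a {iri(statement_class)} ;\n"
            f"{patterns} .\n"
            f"}}")


def delete_query(statement_local_name: str) -> str:
    """DELETE all triples whose subject is the given statement resource."""
    return f"DELETE WHERE {{ {iri(statement_local_name)} ?p ?o . }}"


print(read_query("WeightMeasurementStatement", ["subject", "value", "unit"]))
print(delete_query("statement/1"))
```

Because the queries are generated from the schema, supporting a new statement type in this sketch only requires its slot list; no hand-written query is needed. Create and update queries could be derived along the same lines, although they additionally require correct escaping of literal values.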