Supporting FAIR Data Principles with Fedora

Author: David Wilcox

Making data findable, accessible, interoperable, and reusable is an important but challenging goal. From an infrastructure perspective, repository technologies play a key role in supporting FAIR data principles.

Fedora is a flexible, extensible, open source repository platform for managing, preserving, and providing access to digital content. Fedora is used in a wide variety of institutions including libraries, museums, archives, and government organizations.

Fedora provides native linked data capabilities and a modular architecture based on well-documented APIs and ease of integration with existing applications. As both a project and a community, Fedora has been increasingly focused on research data management, making it well-suited to supporting FAIR data principles as a repository platform.

Fedora provides strong support for persistent identifiers, both by minting HTTP URIs for each resource and by allowing any number of additional identifiers to be associated with resources as RDF properties.

Fedora also supports rich metadata in any schema that can be indexed and disseminated using a variety of protocols and services. As a linked data server, Fedora allows resources to be semantically linked both within the repository and on the broader web.
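As a concrete sketch of this model, the snippet below composes a Turtle description that attaches an additional identifier and a semantic link to a repository resource. The URIs and the choice of Dublin Core terms are illustrative assumptions for this sketch, not Fedora's actual API:

```python
# Minimal sketch (stdlib only) of the linked-data pattern described above:
# a repository-minted HTTP URI carrying an extra identifier and a semantic
# link as RDF properties. All URIs below are hypothetical.

RESOURCE = "http://example.org/rest/dataset/42"   # URI minted by the repository
DOI = "https://doi.org/10.5555/example"           # additional identifier
RELATED = "http://example.org/rest/dataset/7"     # semantically linked resource

def describe(resource_uri, doi, related_uri):
    """Return a Turtle snippet attaching a DOI and a semantic link."""
    return (
        "@prefix dcterms: <http://purl.org/dc/terms/> .\n\n"
        f"<{resource_uri}>\n"
        f"    dcterms:identifier <{doi}> ;\n"
        f"    dcterms:relation <{related_uri}> .\n"
    )

turtle = describe(RESOURCE, DOI, RELATED)
```

Links of this kind can point either at other repository resources or at arbitrary URIs on the broader web, which is what makes the resources part of the linked-data graph rather than an isolated store.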

Along with these and other features supporting research data management, the Fedora community has been actively participating in related initiatives, most notably the Research Data Alliance.

Fedora representatives participate in a number of interest and working groups focused on requirements and interoperability for research data repository platforms.

This participation allows the Fedora project to both influence and be influenced by an international group of Research Data Alliance stakeholders. This paper will describe how Fedora supports FAIR data principles, both in terms of relevant features and community participation in related initiatives.

DOI : http://doi.org/10.18352/lq.10247

Evaluation of a novel cloud-based software platform for structured experiment design and linked data analytics

Authors : Hannes Juergens, Matthijs Niemeijer, Laura D. Jennings-Antipov, Robert Mans, Jack More, Antonius J. A. van Maris, Jack T. Pronk, Timothy S. Gardner

Open data in science requires precise definition of experimental procedures used in data generation, but traditional practices for sharing protocols and data cannot provide the required data contextualization.

Here, we explore implementation, in an academic research setting, of a novel cloud-based software system designed to address this challenge. The software supports systematic definition of experimental procedures as visual processes, acquisition and analysis of primary data, and linking of data and procedures in machine-computable form.

The software was tested on a set of quantitative microbial-physiology experiments. Though time-intensive, defining experimental procedures in the software yielded far more precise and unambiguous descriptions of experiments than conventional protocols.

Once defined, processes were easily reusable and composable into more complex experimental flows. Automatic coupling of process definitions to experimental data enables immediate identification of correlations between procedural details, intended and unintended experimental perturbations, and experimental outcomes.
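The linking idea can be sketched in a few lines: each measurement carries a reference back to the procedure step that produced it, and defined processes compose into larger flows. All class and field names here are hypothetical illustrations, not the platform's actual data model:

```python
# Sketch: machine-readable procedure steps, with measurements linked back
# to the step that produced them, so outcomes can be joined to procedural
# context (e.g. to spot parameter/outcome correlations).
from dataclasses import dataclass, field

@dataclass
class Step:
    step_id: str
    action: str
    parameters: dict

@dataclass
class Measurement:
    step_id: str          # link back to the producing step
    quantity: str
    value: float

@dataclass
class Process:
    name: str
    steps: list = field(default_factory=list)

    def compose(self, other):
        """Defined processes are composable into larger experimental flows."""
        return Process(f"{self.name}+{other.name}", self.steps + other.steps)

prep = Process("media-prep", [Step("s1", "autoclave", {"temp_C": 121})])
grow = Process("cultivation", [Step("s2", "incubate", {"temp_C": 30, "rpm": 200})])
flow = prep.compose(grow)

data = [Measurement("s2", "OD600", 1.8)]
# Join each result to the parameters of the step that generated it.
context = {m.quantity: next(s.parameters for s in flow.steps if s.step_id == m.step_id)
           for m in data}
```

The point of the join at the end is that, once procedures are data rather than prose, questions like "which incubation settings produced this outcome?" become trivial lookups instead of manual archaeology.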

Software-based experiment descriptions could ultimately replace terse and ambiguous ‘Materials and Methods’ sections in scientific journals, thus promoting reproducibility and reusability of published studies.

DOI : https://doi.org/10.1038/sdata.2018.195

Curating Scientific Information in Knowledge Infrastructures

Authors : Markus Stocker, Pauli Paasonen, Markus Fiebig, Martha A. Zaidan, Alex Hardisty

Interpreting observational data is a fundamental task in the sciences, specifically in earth and environmental science where observational data are increasingly acquired, curated, and published systematically by environmental research infrastructures.

Typically subject to substantial processing, observational data are used by research communities, their research groups and individual scientists, who interpret such primary data for their meaning in the context of research investigations.

The result of interpretation is information—meaningful secondary or derived data—about the observed environment. Research infrastructures and research communities are thus essential to evolving uninterpreted observational data into information. In digital form, the classical bearers of information are the commonly known “(elaborated) data products,” for instance maps.

In such form, meaning is generally implicit, e.g. in map colour coding, and thus largely inaccessible to machines. The systematic acquisition, curation, possible publishing and further processing of information gained in observational data interpretation—as machine-readable data with machine-readable meaning—is not common practice among environmental research infrastructures.

For a use case in aerosol science, we elucidate these problems and present a Jupyter based prototype infrastructure that exploits a machine learning approach to interpretation and could support a research community in interpreting observational data and, more importantly, in curating and further using resulting information about a studied natural phenomenon.
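A toy sketch of this curation pattern: the output of an interpretation step is stored as machine-readable statements with provenance, rather than as an opaque data product. The classifier, URIs, and property names below are hypothetical stand-ins, not the paper's actual pipeline:

```python
# Sketch: curating the *result* of interpreting observational data as
# machine-readable statements, not just as an elaborated data product.

def interpret(particle_counts):
    """Toy stand-in for the machine-learning interpretation step:
    label a hypothetical new-particle-formation (NPF) event."""
    return "NPF-event" if max(particle_counts) > 2 * particle_counts[0] else "no-event"

def curate(obs_uri, label, method_uri):
    """Attach the interpretation and its provenance as explicit triples,
    so the *meaning* is accessible to machines, not implicit in a plot."""
    return [
        (obs_uri, "ex:interpretedAs", label),
        (obs_uri, "prov:wasGeneratedBy", method_uri),
    ]

triples = curate("ex:obs/2014-05-02", interpret([10, 12, 40]), "ex:classifier/v1")
```

The provenance triple is the essential part: it records which method produced the interpretation, which is exactly what a colour-coded map cannot tell a machine.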

DOI : http://doi.org/10.5334/dsj-2018-021

Nanopublications: A Growing Resource of Provenance-Centric Scientific Linked Data

Authors : Tobias Kuhn, Albert Meroño-Peñuela, Alexander Malic, Jorrit H. Poelen, Allen H. Hurlbert, Emilio Centeno Ortiz, Laura I. Furlong, Núria Queralt-Rosinach, Christine Chichester, Juan M. Banda, Egon Willighagen, Friederike Ehrhart, Chris Evelo, Tareq B. Malas, Michel Dumontier

Nanopublications are a Linked Data format for scholarly data publishing that has received considerable uptake in the last few years. In contrast to the common Linked Data publishing practice, nanopublications work at the granular level of atomic information snippets and provide a consistent container format to attach provenance and metadata at this atomic level.
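The container structure can be sketched as follows: a head graph ties together the assertion, provenance, and publication-info graphs that make up one nanopublication. The serialization below is a simplified TriG-style sketch with illustrative URIs, not a published nanopublication:

```python
# Sketch of the nanopublication container: one atomic assertion packaged
# with its provenance and publication metadata as named graphs, bound
# together by a head graph.

def nanopub(np, assertion, provenance, pubinfo):
    """Return a simplified TriG-like serialization of the four graphs."""
    return (
        f"<{np}#head> {{\n"
        f"  <{np}> a np:Nanopublication ;\n"
        f"    np:hasAssertion <{np}#assertion> ;\n"
        f"    np:hasProvenance <{np}#provenance> ;\n"
        f"    np:hasPublicationInfo <{np}#pubinfo> .\n"
        f"}}\n"
        f"<{np}#assertion> {{ {assertion} }}\n"
        f"<{np}#provenance> {{ {provenance} }}\n"
        f"<{np}#pubinfo> {{ {pubinfo} }}\n"
    )

trig = nanopub(
    "http://example.org/np1",
    "ex:geneX ex:associatedWith ex:diseaseY .",          # the atomic claim
    "ex:np1-assertion prov:wasDerivedFrom ex:study42 .", # where it came from
    'ex:np1 dcterms:created "2018-09-17" .',             # who/when published
)
```

Because provenance and metadata attach at the level of the single assertion rather than the whole dataset, each snippet can be cited, trusted, or discarded independently.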

While the nanopublications format is domain-independent, the datasets that have become available in this format are mostly from Life Science domains, including data about diseases, genes, proteins, drugs, biological pathways, and biotic interactions.

More than 10 million such nanopublications have been published, and they now form a valuable resource for studies at the level of the given Life Science domains as well as at the more technical levels of provenance modeling and heterogeneous Linked Data.

We provide here an overview of this combined nanopublication dataset, show the results of some overarching analyses, and describe how it can be accessed and queried.

URL : https://arxiv.org/abs/1809.06532

Réflexions sur le fragment dans les pratiques scientifiques en ligne : entre matérialité documentaire et péricope

Authors : Gérald Kembellec, Thomas Bottini

This paper offers a multidisciplinary reflection (information and communication sciences, documentary engineering and digital document theory, computer science, “digital humanities,” history of scholarly practices) on the uses of the fragment in online scholarly documentation practices.

Building on these theoretical elements, the authors propose a theoretical model of the segmentation of content into units of meaning (pericopes), along with directions for implementation.
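The pericope idea—addressing a unit of meaning inside a document in a stable, checkable way—can be illustrated with a small sketch in the spirit of W3C Web Annotation selectors. All names here are hypothetical, not the authors' model:

```python
# Sketch: addressing a fragment (pericope) of a document by character
# offsets plus a quoted-text check, so drift in the source is detected.
from dataclasses import dataclass

@dataclass
class Pericope:
    source: str   # document URI
    start: int    # character offset where the unit of meaning begins
    end: int      # character offset where it ends
    exact: str    # quoted text, to detect drift if the source changes

def extract(text, p):
    """Resolve a pericope against the document text, verifying the quote."""
    fragment = text[p.start:p.end]
    if fragment != p.exact:
        raise ValueError("source changed: quoted fragment no longer matches")
    return fragment

doc = "In the beginning was the fragment."
p = Pericope("http://example.org/doc1", 7, 16, "beginning")
```

Combining a positional selector with a quotation selector is a common robustness trick: offsets make resolution cheap, while the quote guards against silent edits to the source.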

URL : https://hal-univ-paris10.archives-ouvertes.fr/hal-01700064

Studying Conceptual Models for Publishing Library Data to the Semantic Web

Author : Sofia Zapounidou

This thesis studies library data and the ways that linked data technologies may affect libraries. The thesis aims to contribute to research on the development and implementation of a framework for the integration of bibliographic data in the semantic web.

It seeks to make sound propositions for the interoperability of conceptual bibliographic models, as well as for future library systems and search environments that integrate bibliographic information.

URL : http://eprints.rclis.org/32108/

Biotea: semantics for Pubmed Central

Authors : Alexander Garcia, Federico Lopez, Leyla Garcia, Olga Giraldo, Victor Bucheli, Michel Dumontier

A significant portion of biomedical literature is represented in a manner that makes it difficult for consumers to find or aggregate content through a computational query. One approach to facilitate reuse of the scientific literature is to structure this information as linked data using standardized web technologies.

In this paper we present the second version of Biotea, a semantic, linked data version of the open-access subset of PubMed Central that has been enhanced with specialized annotation pipelines that use existing infrastructure from the National Center for Biomedical Ontology.

We expose our models, services, software and datasets. Our infrastructure enables manual and semi-automatic annotation; the resulting data are represented as RDF-based linked data and can be readily queried using the SPARQL query language.
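As an illustration of the kind of query this enables, the sketch below builds a SPARQL protocol GET request. The endpoint URL and the use of dcterms:title are assumptions for illustration; the project's published models define the actual vocabulary:

```python
# Sketch: issuing a SPARQL query against a (hypothetical) Biotea-style
# endpoint via the SPARQL protocol's GET binding.
from urllib.parse import urlencode

QUERY = """\
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?article ?title
WHERE {
  ?article dcterms:title ?title .
  FILTER CONTAINS(LCASE(?title), "ontology")
}
LIMIT 10
"""

def sparql_url(endpoint, query):
    """Build the GET request URL for a SPARQL protocol endpoint,
    percent-encoding the query as the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})

url = sparql_url("http://example.org/biotea/sparql", QUERY)
```

Any SPARQL-capable client can then fetch `url` and receive structured results, which is the payoff of representing the literature as linked data rather than free text.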

We illustrate the utility of our system with several use cases. Our datasets, methods and techniques are available at http://biotea.github.io.

DOI : https://doi.org/10.7717/peerj.4201