Creating Structured Linked Data to Generate Scholarly Profiles: A Pilot Project using Wikidata and Scholia

Authors : Mairelys Lemus-Rojas, Jere D. Odell

INTRODUCTION

Wikidata, a knowledge base for structured linked data, provides an open platform for curating scholarly communication data. Because all elements in a Wikidata entry are linked to defining elements and metadata, other web systems can harvest and display the data in meaningful ways.

Wikidata thus has the capacity to serve as a data source for faculty profiles. Scholia is an example of how third-party tools can leverage Wikidata to provide faculty profiles and data-driven bibliographic visualizations.
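
As a minimal sketch of the kind of query that drives such profiles, the following Python snippet asks the public Wikidata SPARQL endpoint (the same service Scholia builds on) for works attributed to an author item; the QID shown is a placeholder.

import requests

# Public Wikidata Query Service endpoint, the same service Scholia queries.
ENDPOINT = "https://query.wikidata.org/sparql"

# Placeholder author item; replace with a real QID.
AUTHOR_QID = "wd:Q12345"

# List works (wdt:P50 = author) attributed to the author item.
query = f"""
SELECT ?work ?workLabel WHERE {{
  ?work wdt:P50 {AUTHOR_QID} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT 10
"""

response = requests.get(
    ENDPOINT,
    params={"query": query, "format": "json"},
    headers={"User-Agent": "scholarly-profile-demo/0.1"},
)
for row in response.json()["results"]["bindings"]:
    print(row["work"]["value"], row.get("workLabel", {}).get("value", ""))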

DESCRIPTION OF PROGRAM

In this article, we share our methods for contributing to Wikidata and displaying the data with Scholia.

We deployed these methods as part of a pilot project in which we contributed data about a small but unique school on the Indiana University-Purdue University Indianapolis (IUPUI) campus, the IU Lilly Family School of Philanthropy.

NEXT STEPS

Following the completion of our pilot project, we aim to find additional methods for contributing large data collections to Wikidata. Specifically, we seek to contribute scholarly communication data that the library already maintains in other systems.

We are also facilitating Wikidata edit-a-thons to increase the library’s familiarity with the knowledge base and our capacity to contribute to the site.

DOI : https://doi.org/10.7710/2162-3309.2272

Automatically Annotating Articles Towards Opening and Reusing Transparent Peer Reviews

Authors : Afshin Sadeghi, Sarven Capadisli, Johannes Wilm, Christoph Lange, Philipp Mayr

An increasing number of scientific publications are created under open and transparent peer-review models: a submission is published first and reviewers are then invited; a submission is reviewed in a closed environment but the reviews are published with the final article; or some combination of these.

Reasons for open peer review include giving better credit to reviewers and enabling readers to better appraise the quality of a publication. In most cases, the full, unstructured text of an open review is published next to the full, unstructured text of the article reviewed.

This approach prevents human readers from getting a quick impression of the quality of parts of an article, and it does not easily support secondary exploitation, e.g., for scientometrics on reviews.

While document formats have been proposed for publishing structured articles including reviews, integrated tool support for entire open peer review workflows resulting in such documents is still scarce.

We present AR-Annotator, the Automatic Article and Review Annotator, which employs a semantic information model of an article and its reviews, using semantic markup and unique identifiers for all entities of interest.

The fine-grained article structure is not only exposed to authors and reviewers but also preserved in the published version. We publish articles and their reviews in a Linked Data representation and thus maximize their reusability by third-party applications.

We demonstrate this reusability by running quality-related queries against the structured representation of articles and their reviews.
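
The abstract does not specify the query interface, but a quality-related query of the kind described might look like the following Python sketch using rdflib; the ex: vocabulary and the sample data are hypothetical stand-ins for AR-Annotator's information model.

from rdflib import Graph

# Hypothetical miniature of an annotated article with one structured review.
data = """
@prefix ex: <http://example.org/vocab#> .
@prefix art: <http://example.org/article/> .

art:paper1 ex:hasSection art:sec-methods .
art:review1 ex:reviews art:sec-methods ;
            ex:rating 2 ;
            ex:comment "Method details are incomplete." .
"""

g = Graph()
g.parse(data=data, format="turtle")

# Quality-related query: sections whose review rating falls below 3.
q = """
PREFIX ex: <http://example.org/vocab#>
SELECT ?section ?rating ?comment WHERE {
  ?review ex:reviews ?section ;
          ex:rating ?rating ;
          ex:comment ?comment .
  FILTER (?rating < 3)
}
"""
for section, rating, comment in g.query(q):
    print(section, rating, comment)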

URL : https://arxiv.org/abs/1812.01027

Supporting FAIR Data Principles with Fedora

Author : David Wilcox

Making data findable, accessible, interoperable, and reusable is an important but challenging goal. From an infrastructure perspective, repository technologies play a key role in supporting FAIR data principles.

Fedora is a flexible, extensible, open source repository platform for managing, preserving, and providing access to digital content. Fedora is used in a wide variety of institutions including libraries, museums, archives, and government organizations.

Fedora provides native linked data capabilities and a modular architecture based on well-documented APIs that eases integration with existing applications. As both a project and a community, Fedora has been increasingly focused on research data management, making it well-suited to supporting FAIR data principles as a repository platform.

Fedora provides strong support for persistent identifiers, both by minting HTTP URIs for each resource and by allowing any number of additional identifiers to be associated with resources as RDF properties.

Fedora also supports rich metadata in any schema that can be indexed and disseminated using a variety of protocols and services. As a linked data server, Fedora allows resources to be semantically linked both within the repository and on the broader web.
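
As an illustration of the identifier support described above, the sketch below creates a resource over Fedora's LDP-based REST API and attaches an additional identifier as an RDF property via SPARQL Update; the endpoint URL and credentials are assumptions for a local installation.

import requests

# Assumed local Fedora REST endpoint; adjust host, path, and auth as needed.
BASE = "http://localhost:8080/rest"
AUTH = ("fedoraAdmin", "secret")

# Create a new container; Fedora mints an HTTP URI for it.
created = requests.post(BASE, auth=AUTH)
resource_uri = created.headers["Location"]
print("Minted URI:", resource_uri)

# Attach an additional identifier (a placeholder DOI) as an RDF property.
sparql = """
PREFIX dcterms: <http://purl.org/dc/terms/>
INSERT { <> dcterms:identifier "doi:10.0000/placeholder" } WHERE {}
"""
requests.patch(
    resource_uri,
    data=sparql,
    headers={"Content-Type": "application/sparql-update"},
    auth=AUTH,
)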

Along with these and other features supporting research data management, the Fedora community has been actively participating in related initiatives, most notably the Research Data Alliance.

Fedora representatives participate in a number of interest and working groups focused on requirements and interoperability for research data repository platforms.

This participation allows the Fedora project to both influence and be influenced by an international group of Research Data Alliance stakeholders. This paper will describe how Fedora supports FAIR data principles, both in terms of relevant features and community participation in related initiatives.

DOI : https://doi.org/10.18352/lq.10247

Evaluation of a novel cloud-based software platform for structured experiment design and linked data analytics

Authors : Hannes Juergens, Matthijs Niemeijer, Laura D. Jennings-Antipov, Robert Mans, Jack More, Antonius J. A. van Maris, Jack T. Pronk, Timothy S. Gardner

Open data in science requires precise definition of experimental procedures used in data generation, but traditional practices for sharing protocols and data cannot provide the required data contextualization.

Here, we explore implementation, in an academic research setting, of a novel cloud-based software system designed to address this challenge. The software supports systematic definition of experimental procedures as visual processes, acquisition and analysis of primary data, and linking of data and procedures in machine-computable form.

The software was tested on a set of quantitative microbial-physiology experiments. Although time-intensive, defining experimental procedures in the software produced much more precise and unambiguous descriptions of experiments than conventional protocols.

Once defined, processes were easily reusable and composable into more complex experimental flows. Automatic coupling of process definitions to experimental data enables immediate identification of correlations between procedural details, intended and unintended experimental perturbations, and experimental outcomes.
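
The platform's own interface is visual rather than code-based, but the underlying idea of reusable, composable process definitions linked to data can be sketched in plain Python; all names and parameters here are invented for illustration.

from dataclasses import dataclass, field

@dataclass
class Step:
    """One procedural step with explicit, machine-readable parameters."""
    name: str
    parameters: dict

@dataclass
class Process:
    """An ordered experimental procedure built from reusable steps."""
    name: str
    steps: list = field(default_factory=list)

    def compose(self, other: "Process") -> "Process":
        # Composing two defined processes yields a more complex flow.
        return Process(f"{self.name}+{other.name}", self.steps + other.steps)

inoculate = Process("inoculation", [Step("inoculate", {"volume_ml": 5})])
sample = Process("sampling", [Step("sample", {"interval_min": 30})])
flow = inoculate.compose(sample)

# Each measurement stays linked to the exact step that produced it.
measurement = {"od600": 0.42, "produced_by": flow.steps[1].name}
print(flow.name, measurement)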

Software-based experiment descriptions could ultimately replace terse and ambiguous ‘Materials and Methods’ sections in scientific journals, thus promoting reproducibility and reusability of published studies.

DOI : https://doi.org/10.1038/sdata.2018.195

Curating Scientific Information in Knowledge Infrastructures

Authors : Markus Stocker, Pauli Paasonen, Markus Fiebig, Martha A. Zaidan, Alex Hardisty

Interpreting observational data is a fundamental task in the sciences, specifically in earth and environmental science where observational data are increasingly acquired, curated, and published systematically by environmental research infrastructures.

Typically subject to substantial processing, observational data are used by research communities, their research groups and individual scientists, who interpret such primary data for their meaning in the context of research investigations.

The result of interpretation is information (meaningful secondary or derived data) about the observed environment. Research infrastructures and research communities are thus essential to evolving uninterpreted observational data into information. In digital form, the classical bearers of information are the commonly known “(elaborated) data products,” for instance maps.

In such form, meaning is generally implicit, e.g., in map colour coding, and thus largely inaccessible to machines. The systematic acquisition, curation, possible publishing, and further processing of the information gained in interpreting observational data, as machine-readable data with machine-readable meaning, is not common practice among environmental research infrastructures.

For a use case in aerosol science, we elucidate these problems and present a Jupyter-based prototype infrastructure that exploits a machine learning approach to interpretation and could support a research community in interpreting observational data and, more importantly, in curating and further using the resulting information about a studied natural phenomenon.
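
As a rough sketch of what curating machine-readable meaning could involve, the following snippet interprets a toy particle-size series and records the resulting statement as RDF; the classification rule (a stand-in for the prototype's machine learning approach) and the vocabulary are invented.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

# Invented vocabulary for illustration; a real system would reuse
# community ontologies.
OBS = Namespace("http://example.org/obs#")

# Toy interpretation rule: a steadily rising particle-size trend suggests
# a new-particle-formation event.
diameters_nm = [3, 5, 9, 16, 30]
event_detected = all(a < b for a, b in zip(diameters_nm, diameters_nm[1:]))

g = Graph()
if event_detected:
    event = URIRef("http://example.org/events/2018-06-01")
    g.add((event, RDF.type, OBS.NewParticleFormationEvent))
    g.add((event, OBS.derivedFrom, URIRef("http://example.org/data/series42")))
    g.add((event, OBS.confidence, Literal(0.9)))

print(g.serialize(format="turtle"))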

DOI : https://doi.org/10.5334/dsj-2018-021

Nanopublications: A Growing Resource of Provenance-Centric Scientific Linked Data

Authors : Tobias Kuhn, Albert Meroño-Peñuela, Alexander Malic, Jorrit H. Poelen, Allen H. Hurlbert, Emilio Centeno Ortiz, Laura I. Furlong, Núria Queralt-Rosinach, Christine Chichester, Juan M. Banda, Egon Willighagen, Friederike Ehrhart, Chris Evelo, Tareq B. Malas, Michel Dumontier

Nanopublications are a Linked Data format for scholarly data publishing that has received considerable uptake in the last few years. In contrast to the common Linked Data publishing practice, nanopublications work at the granular level of atomic information snippets and provide a consistent container format to attach provenance and metadata at this atomic level.

While the nanopublications format is domain-independent, the datasets that have become available in this format are mostly from Life Science domains, including data about diseases, genes, proteins, drugs, biological pathways, and biotic interactions.

More than 10 million such nanopublications have been published, and they now form a valuable resource for studies both within these Life Science domains and at the more technical level of provenance modeling and heterogeneous Linked Data.

We provide here an overview of this combined nanopublication dataset, show the results of some overarching analyses, and describe how it can be accessed and queried.
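
The container format mentioned above groups each atomic assertion with its provenance and publication metadata in named graphs; the sketch below builds a minimal nanopublication along those lines using rdflib, with an invented assertion.

from rdflib import Dataset, Literal, Namespace, URIRef
from rdflib.namespace import RDF

NP = Namespace("http://www.nanopub.org/nschema#")
PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("http://example.org/np1#")

ds = Dataset()

# Head graph: links the three parts of the nanopublication together.
head = ds.graph(EX.head)
head.add((EX.pub, RDF.type, NP.Nanopublication))
head.add((EX.pub, NP.hasAssertion, EX.assertion))
head.add((EX.pub, NP.hasProvenance, EX.provenance))
head.add((EX.pub, NP.hasPublicationInfo, EX.pubinfo))

# Assertion graph: one atomic, invented claim.
assertion = ds.graph(EX.assertion)
assertion.add((URIRef("http://example.org/GeneX"),
               URIRef("http://example.org/associatedWith"),
               URIRef("http://example.org/DiseaseY")))

# Provenance graph: where the assertion came from.
ds.graph(EX.provenance).add(
    (EX.assertion, PROV.wasDerivedFrom, URIRef("http://example.org/study7")))

# Publication info graph: metadata about the nanopublication itself.
ds.graph(EX.pubinfo).add(
    (EX.pub, PROV.generatedAtTime, Literal("2018-09-17")))

print(ds.serialize(format="trig"))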

URL : https://arxiv.org/abs/1809.06532

Réflexions sur le fragment dans les pratiques scientifiques en ligne : entre matérialité documentaire et péricope [Reflections on the Fragment in Online Scholarly Practices: Between Documentary Materiality and Pericope]

Authors : Gérald Kembellec, Thomas Bottini

This paper offers a multidisciplinary reflection (information and communication sciences, document engineering and digital-document theory, computer science, the “digital humanities”, and the history of scholarly practices) on the uses of the fragment in online scholarly documentation practices.

Building on these theoretical elements, it proposes a theoretical model of the segmentation of content into units of meaning (pericopes) and directions for implementation.
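
As a hedged illustration of fragment (pericope) identification, the snippet below addresses a span of text with a position selector in the style of the W3C Web Annotation model; the document URL, offsets, and text are invented.

import json

source_text = "In the beginning was the fragment, and the fragment was cited."

selector = {
    "type": "TextPositionSelector",  # character offsets into the source
    "start": 7,
    "end": 33,
}

annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "target": {
        "source": "http://example.org/document1",
        "selector": selector,
    },
}

# The selector makes the cited fragment recoverable from the source text.
print(source_text[selector["start"]:selector["end"]])
print(json.dumps(annotation, indent=2))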

URL : https://hal-univ-paris10.archives-ouvertes.fr/hal-01700064