Linked Research on the Decentralised Web

Author : Sarven Capadisli

This thesis is about research communication in the context of the Web. I analyse literature which reveals how researchers are making use of Web technologies for knowledge dissemination, as well as how individuals are disempowered by the centralisation of certain systems, such as academic publishing platforms and social media.

I share my findings on the feasibility of a decentralised and interoperable information space where researchers can control their identifiers whilst fulfilling the core functions of scientific communication: registration, awareness, certification, and archiving.

The contemporary research communication paradigm operates under a diverse set of sociotechnical constraints, which influence how units of research information and personal data are created and exchanged.

Economic forces and non-interoperable system designs mean that researcher identifiers and research contributions are largely shaped and controlled by third-party entities; participation requires the use of proprietary systems.

From a technical standpoint, this thesis takes a deep look at the semantic structure of research artifacts, and at how they can be stored, linked and shared in a way that is controlled by individual researchers or delegated to trusted parties. Further, I find that the ecosystem lacks a technical Web standard able to fulfill the awareness function of research communication.

Thus, I contribute a new communication protocol, Linked Data Notifications (published as a W3C Recommendation), which enables decentralised notifications on the Web, and provide implementations pertinent to the academic publishing use case. So far we have seen decentralised notifications applied in research dissemination and collaboration scenarios, as well as for archival activities and scientific experiments.
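
To make the protocol concrete, here is a minimal sketch of how a sender might deliver a Linked Data Notification: the receiver's Inbox is discovered via the ldp:inbox link relation, and a JSON-LD notification is POSTed to it. The target URL, actor, and payload below are illustrative placeholders, not examples taken from the thesis.

```python
# Minimal Linked Data Notifications (LDN) sender sketch. The target URL,
# actor, and notification payload are illustrative placeholders.
import json
import requests

TARGET = "https://example.org/articles/2019/web-science"  # hypothetical article

# 1. Discover the receiver's Inbox, advertised via the ldp:inbox link relation.
head = requests.head(TARGET)
inbox = head.links.get("http://www.w3.org/ns/ldp#inbox", {}).get("url")
if inbox is None:
    raise RuntimeError("Target does not advertise an ldp:inbox")

# 2. Deliver a JSON-LD notification (here: announcing an annotation) to the Inbox.
notification = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Announce",
    "actor": "https://example.org/profile#me",
    "object": "https://example.org/annotations/42",
    "target": TARGET,
}
response = requests.post(
    inbox,
    data=json.dumps(notification),
    headers={"Content-Type": "application/ld+json"},
)
print(response.status_code, response.headers.get("Location"))
```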

Another core contribution of this work is a Web standards-based implementation of a client-side tool, dokieli, for decentralised article publishing, annotations, and social interactions. dokieli can be used to fulfill the scholarly functions of registration, awareness, certification, and archiving, all in a decentralised manner, returning control of research contributions and discourse to individual researchers.

The overarching conclusion of the thesis is that Web technologies can be used to create a fully functioning ecosystem for research communication. Using the framework of Web architecture, and loosely coupling the four functions, an accessible and inclusive ecosystem can be realised whereby users are able to use and switch between interoperable applications without interfering with existing data.

Technical solutions alone do not suffice, of course, so this thesis also takes into account the need for a change in the traditional mode of thinking amongst scholars, and presents the Linked Research initiative as an ongoing effort toward researcher autonomy in a social system, and universal access to human- and machine-readable information.

Outcomes of this outreach work so far include an increase in the number of individuals self-hosting their research artifacts, workshops publishing accessible proceedings on the Web, in-the-wild experiments with open and public peer-review, and semantic graphs of contributions to conference proceedings and journals (the Linked Open Research Cloud).

Some of the future challenges include: addressing the social implications of decentralised Web publishing, as well as the design of ethically grounded interoperable mechanisms; cultivating privacy aware information spaces; personal or community-controlled on-demand archiving services; and further design of decentralised applications that are aware of the core functions of scientific communication.

URL : https://csarven.ca/linked-research-decentralised-web

Creating Structured Linked Data to Generate Scholarly Profiles: A Pilot Project using Wikidata and Scholia

Authors : Mairelys Lemus-Rojas, Jere D. Odell

INTRODUCTION

Wikidata, a knowledge base for structured linked data, provides an open platform for curating scholarly communication data. Because all elements in a Wikidata entry are linked to defining elements and metadata, other web systems can harvest and display the data in meaningful ways.

Thus, Wikidata has the capacity to serve as the data source for faculty profiles. Scholia is an example of how third-party tools can leverage the power of Wikidata to provide faculty profiles and bibliographic, data-driven visualizations.
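
As an illustration of how a tool like Scholia builds a profile from Wikidata, the following sketch queries the public Wikidata SPARQL endpoint for works authored by a given person. The author Q-identifier and the User-Agent string are placeholders; only the endpoint URL and the standard properties (P50 author, P577 publication date) are taken from Wikidata itself.

```python
# Sketch: query Wikidata for scholarly works by a given author, the kind of
# query Scholia runs to build a profile page. The author QID is a placeholder.
import requests

ENDPOINT = "https://query.wikidata.org/sparql"
AUTHOR_QID = "Q12345"  # placeholder: replace with the author's Wikidata item

query = f"""
SELECT ?work ?workLabel ?date WHERE {{
  ?work wdt:P50 wd:{AUTHOR_QID} .          # P50 = author
  OPTIONAL {{ ?work wdt:P577 ?date . }}    # P577 = publication date
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
ORDER BY DESC(?date)
"""

response = requests.get(
    ENDPOINT,
    params={"query": query, "format": "json"},
    headers={"User-Agent": "scholarly-profile-sketch/0.1"},
)
for row in response.json()["results"]["bindings"]:
    print(row["workLabel"]["value"], row.get("date", {}).get("value", ""))
```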

DESCRIPTION OF PROGRAM

In this article, we share our methods for contributing to Wikidata and displaying the data with Scholia.

We deployed these methods as part of a pilot project in which we contributed data about a small but unique school on the Indiana University-Purdue University Indianapolis (IUPUI) campus, the IU Lilly Family School of Philanthropy.

NEXT STEPS

Following the completion of our pilot project, we aim to find additional methods for contributing large data collections to Wikidata. Specifically, we seek to contribute scholarly communication data that the library already maintains in other systems.

We are also facilitating Wikidata edit-a-thons to increase the library’s familiarity with the knowledge base and our capacity to contribute to the site.

URL : Creating Structured Linked Data to Generate Scholarly Profiles: A Pilot Project using Wikidata and Scholia

DOI : https://doi.org/10.7710/2162-3309.2272

Automatically Annotating Articles Towards Opening and Reusing Transparent Peer Reviews

Authors : Afshin Sadeghi, Sarven Capadisli, Johannes Wilm, Christoph Lange, Philipp Mayr

An increasing number of scientific publications are created in open and transparent peer review models: a submission is published first, and then reviewers are invited, or a submission is reviewed in a closed environment but then these reviews are published with the final article, or combinations of these.

Reasons for open peer review include giving better credit to reviewers and enabling readers to better appraise the quality of a publication. In most cases, the full, unstructured text of an open review is published next to the full, unstructured text of the article reviewed.

This approach prevents human readers from getting a quick impression of the quality of parts of an article, and it does not easily support secondary exploitation, e.g., for scientometrics on reviews.

While document formats have been proposed for publishing structured articles including reviews, integrated tool support for entire open peer review workflows resulting in such documents is still scarce.

We present AR-Annotator, the Automatic Article and Review Annotator which employs a semantic information model of an article and its reviews, using semantic markup and unique identifiers for all entities of interest.

The fine-grained article structure is not only exposed to authors and reviewers but also preserved in the published version. We publish articles and their reviews in a Linked Data representation and thus maximize their reusability by third-party applications.

We demonstrate this reusability by running quality-related queries against the structured representation of articles and their reviews.
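
To illustrate the kind of quality-related query this enables, the sketch below loads a toy Linked Data representation of an article and one review, then asks which article sections received review comments and with what rating. The ex: vocabulary is an illustrative placeholder, not necessarily the model AR-Annotator emits.

```python
# Sketch: querying a structured article-plus-review graph with rdflib.
# The ex: vocabulary is a placeholder, not AR-Annotator's actual model.
from rdflib import Graph

data = """
@prefix ex: <http://example.org/vocab#> .
@prefix a1: <http://example.org/article/1#> .

a1:article   ex:hasSection a1:methods, a1:results .
a1:review1   ex:reviews a1:article ;
             ex:hasComment a1:comment1 .
a1:comment1  ex:about a1:methods ;
             ex:rating 2 .
"""

g = Graph()
g.parse(data=data, format="turtle")

# Which sections of the article received review comments, and with what rating?
query = """
PREFIX ex: <http://example.org/vocab#>
SELECT ?section ?rating WHERE {
  ?review  ex:reviews ?article ;
           ex:hasComment ?comment .
  ?comment ex:about ?section ;
           ex:rating ?rating .
  ?article ex:hasSection ?section .
}
"""
for section, rating in g.query(query):
    print(section, rating)
```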

URL : https://arxiv.org/abs/1812.01027

Supporting FAIR Data Principles with Fedora

Author: David Wilcox

Making data findable, accessible, interoperable, and re-usable is an important but challenging goal. From an infrastructure perspective, repository technologies play a key role in supporting FAIR data principles.

Fedora is a flexible, extensible, open source repository platform for managing, preserving, and providing access to digital content. Fedora is used in a wide variety of institutions including libraries, museums, archives, and government organizations.

Fedora provides native linked data capabilities and a modular architecture based on well-documented APIs, easing integration with existing applications. As both a project and a community, Fedora has been increasingly focused on research data management, making it well-suited to supporting FAIR data principles as a repository platform.

Fedora provides strong support for persistent identifiers, both by minting HTTP URIs for each resource and by allowing any number of additional identifiers to be associated with resources as RDF properties.

Fedora also supports rich metadata in any schema that can be indexed and disseminated using a variety of protocols and services. As a linked data server, Fedora allows resources to be semantically linked both within the repository and on the broader web.
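
A hedged sketch of what this can look like over Fedora's LDP-based HTTP API: create a resource (Fedora mints an HTTP URI for it), then PATCH it with a SPARQL Update that attaches an additional identifier as an RDF property. The base URL, credentials, example DOI value, and the choice of dcterms:identifier are assumptions for illustration.

```python
# Sketch: minting a Fedora resource and attaching an extra identifier as RDF.
# Base URL, credentials, DOI value, and the predicate choice are illustrative.
import requests

FEDORA = "http://localhost:8080/rest"   # hypothetical Fedora base URL
AUTH = ("fedoraAdmin", "secret")        # placeholder credentials

# 1. Create a new container resource; Fedora mints an HTTP URI for it.
created = requests.post(FEDORA, auth=AUTH, headers={"Slug": "dataset-42"})
resource_uri = created.headers["Location"]

# 2. Attach an additional identifier (e.g. a DOI) as an RDF property.
sparql_update = """
PREFIX dcterms: <http://purl.org/dc/terms/>
INSERT DATA {
  <> dcterms:identifier "doi:10.1234/example.5678" .
}
"""
patched = requests.patch(
    resource_uri,
    data=sparql_update,
    auth=AUTH,
    headers={"Content-Type": "application/sparql-update"},
)
print(resource_uri, patched.status_code)
```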

Along with these and other features supporting research data management, the Fedora community has been actively participating in related initiatives, most notably the Research Data Alliance.

Fedora representatives participate in a number of interest and working groups focused on requirements and interoperability for research data repository platforms.

This participation allows the Fedora project to both influence and be influenced by an international group of Research Data Alliance stakeholders. This paper will describe how Fedora supports FAIR data principles, both in terms of relevant features and community participation in related initiatives.

URL : Supporting FAIR Data Principles with Fedora

DOI : http://doi.org/10.18352/lq.10247

Evaluation of a novel cloud-based software platform for structured experiment design and linked data analytics

Authors : Hannes Juergens, Matthijs Niemeijer, Laura D. Jennings-Antipov, Robert Mans, Jack More, Antonius J. A. van Maris, Jack T. Pronk, Timothy S. Gardner

Open data in science requires precise definition of experimental procedures used in data generation, but traditional practices for sharing protocols and data cannot provide the required data contextualization.

Here, we explore implementation, in an academic research setting, of a novel cloud-based software system designed to address this challenge. The software supports systematic definition of experimental procedures as visual processes, acquisition and analysis of primary data, and linking of data and procedures in machine-computable form.

The software was tested on a set of quantitative microbial-physiology experiments. Though time-intensive, definition of experimental procedures in the software enabled much more precise, unambiguous definitions of experiments than conventional protocols.

Once defined, processes were easily reusable and composable into more complex experimental flows. Automatic coupling of process definitions to experimental data enables immediate identification of correlations between procedural details, intended and unintended experimental perturbations, and experimental outcomes.
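
The paper does not expose the platform's internal model, but the idea of linking machine-computable process definitions to measured data can be sketched with a simple data structure: each process step carries its parameters and references to the measurements it produced, so outcomes can be traced back to procedural details. Field names below are illustrative, not the platform's actual schema.

```python
# Toy sketch of coupling process definitions to experimental data; the field
# names are illustrative, not the evaluated platform's actual schema.
from dataclasses import dataclass, field


@dataclass
class Measurement:
    quantity: str          # e.g. "biomass_concentration"
    value: float
    unit: str


@dataclass
class ProcessStep:
    name: str              # e.g. "cultivate"
    parameters: dict       # procedural details (temperature, pH, ...)
    measurements: list = field(default_factory=list)


# Compose steps into an experimental flow and attach primary data to each step.
flow = [
    ProcessStep("prepare medium", {"glucose_g_per_l": 20.0}),
    ProcessStep("cultivate", {"temperature_c": 30.0, "pH": 5.0},
                [Measurement("biomass_concentration", 4.2, "g/L")]),
]

# Because data stay coupled to the step that produced them, procedural details
# can be correlated with outcomes across experiments.
for step in flow:
    for m in step.measurements:
        print(step.name, step.parameters, "->", m.quantity, m.value, m.unit)
```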

Software-based experiment descriptions could ultimately replace terse and ambiguous ‘Materials and Methods’ sections in scientific journals, thus promoting reproducibility and reusability of published studies.

URL : Evaluation of a novel cloud-based software platform for structured experiment design and linked data analytics

DOI : https://doi.org/10.1038/sdata.2018.195

Curating Scientific Information in Knowledge Infrastructures

Authors : Markus Stocker, Pauli Paasonen, Markus Fiebig, Martha A. Zaidan, Alex Hardisty

Interpreting observational data is a fundamental task in the sciences, specifically in earth and environmental science where observational data are increasingly acquired, curated, and published systematically by environmental research infrastructures.

Typically subject to substantial processing, observational data are used by research communities, their research groups and individual scientists, who interpret such primary data for their meaning in the context of research investigations.

The result of interpretation is information—meaningful secondary or derived data—about the observed environment. Research infrastructures and research communities are thus essential to evolving uninterpreted observational data into information. In digital form, the classical bearers of information are the commonly known “(elaborated) data products,” for instance maps.

In such form, meaning is generally implicit, e.g. in map colour coding, and thus largely inaccessible to machines. The systematic acquisition, curation, possible publishing and further processing of information gained in observational data interpretation—as machine-readable data and their machine-readable meaning—is not common practice among environmental research infrastructures.

For a use case in aerosol science, we elucidate these problems and present a Jupyter-based prototype infrastructure that exploits a machine learning approach to interpretation and could support a research community in interpreting observational data and, more importantly, in curating and further using the resulting information about a studied natural phenomenon.
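
As a sketch of what "machine-readable data and their machine-readable meaning" can amount to, the snippet below states an interpreted result explicitly as RDF, linking the detected phenomenon to the dataset it was derived from and to the interpretation method, instead of leaving that meaning implicit in a plot or map. All terms and URIs are illustrative placeholders, not the paper's vocabulary.

```python
# Sketch: making interpreted information explicit and machine-readable as RDF,
# rather than leaving it implicit in a map or plot. Terms are illustrative.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

EX = Namespace("http://example.org/curation#")
g = Graph()

event = URIRef("http://example.org/events/2017-05-03-station-A")
g.add((event, RDF.type, EX.ObservedPhenomenon))
g.add((event, EX.phenomenonType, Literal("aerosol event")))
g.add((event, EX.derivedFrom, URIRef("http://example.org/datasets/station-A-2017")))
g.add((event, EX.interpretedBy, Literal("machine-learning classifier")))
g.add((event, EX.onDate, Literal("2017-05-03", datatype=XSD.date)))

print(g.serialize(format="turtle"))
```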

URL : Curating Scientific Information in Knowledge Infrastructures

DOI : http://doi.org/10.5334/dsj-2018-021

Nanopublications: A Growing Resource of Provenance-Centric Scientific Linked Data

Authors : Tobias Kuhn, Albert Meroño-Peñuela, Alexander Malic, Jorrit H. Poelen, Allen H. Hurlbert, Emilio Centeno Ortiz, Laura I. Furlong, Núria Queralt-Rosinach, Christine Chichester, Juan M. Banda, Egon Willighagen, Friederike Ehrhart, Chris Evelo, Tareq B. Malas, Michel Dumontier

Nanopublications are a Linked Data format for scholarly data publishing that has received considerable uptake in the last few years. In contrast to the common Linked Data publishing practice, nanopublications work at the granular level of atomic information snippets and provide a consistent container format to attach provenance and metadata at this atomic level.
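
The container format can be sketched as follows: an atomic assertion is packaged together with its provenance and publication info as named graphs, bound together by a head graph. The snippet parses a minimal hand-written nanopublication (in TriG, with placeholder URIs and content) and lists which named graph carries each statement; it is illustrative, not one of the published nanopublications.

```python
# Sketch: a minimal nanopublication with assertion, provenance, and
# publication-info named graphs. URIs and content are placeholders.
from rdflib import ConjunctiveGraph

trig = """
@prefix np:   <http://www.nanopub.org/nschema#> .
@prefix ex:   <http://example.org/np1#> .
@prefix prov: <http://www.w3.org/ns/prov#> .

ex:Head {
  ex:pub a np:Nanopublication ;
      np:hasAssertion ex:Assertion ;
      np:hasProvenance ex:Provenance ;
      np:hasPublicationInfo ex:PubInfo .
}
ex:Assertion {
  ex:gene-X ex:associatedWith ex:disease-Y .
}
ex:Provenance {
  ex:Assertion prov:wasDerivedFrom ex:study-Z .
}
ex:PubInfo {
  ex:pub prov:wasAttributedTo ex:researcher-A .
}
"""

g = ConjunctiveGraph()
g.parse(data=trig, format="trig")

# Every triple is addressable together with the named graph that carries it.
for s, p, o, ctx in g.quads((None, None, None, None)):
    print(ctx.identifier, "|", s, p, o)
```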

While the nanopublications format is domain-independent, the datasets that have become available in this format are mostly from Life Science domains, including data about diseases, genes, proteins, drugs, biological pathways, and biotic interactions.

More than 10 million such nanopublications have been published, which now form a valuable resource for studies on the domain level of the given Life Science domains as well as on the more technical levels of provenance modeling and heterogeneous Linked Data.

We provide here an overview of this combined nanopublication dataset, show the results of some overarching analyses, and describe how it can be accessed and queried.

URL : https://arxiv.org/abs/1809.06532