Data Sharing: Convert Challenges into Opportunities

Author : Ana Sofia Figueiredo

Initiatives for sharing research data are opportunities to increase the pace of knowledge discovery and scientific progress. The reuse of research data has the potential to avoid the duplication of data sets and to bring new perspectives from multiple analyses of the same data set.

For example, the study of genomic variations associated with cancer profits from the universal collection of such data and helps in selecting the most appropriate therapy for a specific patient. However, data sharing poses challenges to the scientific community.

These challenges are ethical, cultural, legal, financial, or technical in nature. This article reviews the impact that data sharing has on science and society and presents guidelines to make the sharing of research data more efficient.

URL : Data Sharing: Convert Challenges into Opportunities

DOI : https://doi.org/10.3389/fpubh.2017.00327

Biotea: semantics for Pubmed Central

Authors : Alexander Garcia, Federico Lopez, Leyla Garcia, Olga Giraldo, Victor Bucheli, Michel Dumontier

A significant portion of biomedical literature is represented in a manner that makes it difficult for consumers to find or aggregate content through a computational query. One approach to facilitate reuse of the scientific literature is to structure this information as linked data using standardized web technologies.

In this paper we present the second version of Biotea, a semantic, linked data version of the open-access subset of PubMed Central that has been enhanced with specialized annotation pipelines that use existing infrastructure from the National Center for Biomedical Ontology.

We expose our models, services, software and datasets. Our infrastructure enables manual and semi-automatic annotation; the resulting data are represented as RDF-based linked data and can be readily queried using the SPARQL query language.
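To illustrate what querying linked data involves, here is a minimal sketch, written in plain Python rather than SPARQL, of matching a basic graph pattern (the core of a SPARQL SELECT query) against an in-memory list of RDF-style triples. The triples, the `ex:` prefixed names and the `match` function are hypothetical illustrations; this is not Biotea's API, data model or vocabulary.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical triples loosely modelled on annotated articles (not Biotea data).
TRIPLES: List[Triple] = [
    ("ex:article1", "ex:annotatedWith", "ex:Breast_Neoplasms"),
    ("ex:article1", "ex:title", "Article one"),
    ("ex:article2", "ex:annotatedWith", "ex:Insulin"),
    ("ex:article2", "ex:title", "Article two"),
]


def is_var(term: str) -> bool:
    """SPARQL-style variables start with '?'."""
    return term.startswith("?")


def match(patterns: List[Triple], triples: List[Triple]) -> List[Dict[str, str]]:
    """Return every variable binding that satisfies all triple patterns."""
    solutions: List[Dict[str, str]] = [{}]
    for pattern in patterns:
        extended: List[Dict[str, str]] = []
        for binding in solutions:
            for triple in triples:
                candidate = dict(binding)
                ok = True
                for term, value in zip(pattern, triple):
                    if is_var(term):
                        # A variable must agree with any value it is already bound to.
                        if candidate.setdefault(term, value) != value:
                            ok = False
                            break
                    elif term != value:
                        ok = False
                        break
                if ok:
                    extended.append(candidate)
        solutions = extended
    return solutions


# Roughly: SELECT ?title WHERE { ?a ex:annotatedWith ex:Insulin . ?a ex:title ?title }
query = [("?a", "ex:annotatedWith", "ex:Insulin"), ("?a", "ex:title", "?title")]
for row in match(query, TRIPLES):
    print(row["?title"])
```

Each pattern narrows the set of bindings produced by the previous one, which is how a shared variable such as `?a` joins annotation data to article metadata. A real deployment would send the equivalent SPARQL query to a triple store instead.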

We illustrate the utility of our system with several use cases. Our datasets, methods and techniques are available at http://biotea.github.io.

URL : Biotea: semantics for Pubmed Central

DOI : https://doi.org/10.7717/peerj.4201

Completeness and overlap in open access systems: Search engines, aggregate institutional repositories and physics-related open sources

Authors : Ming-yueh Tsay, Tai-luan Wu, Ling-li Tseng

This study examines the completeness and overlap of coverage in physics of six open access scholarly communication systems, including two search engines (Google Scholar and Microsoft Academic), two aggregate institutional repositories (OAIster and OpenDOAR), and two physics-related open sources (arXiv.org and Astrophysics Data System).

The 2001–2013 Nobel Laureates in Physics served as the sample. Bibliographic records of their publications were retrieved and downloaded from each system, and a computer program was developed to perform the analytical tasks of sorting, comparison, elimination, aggregation and statistical calculations.

Quantitative analyses and cross-referencing were performed to determine the completeness and overlap of the system coverage of the six open access systems.
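Once each system's retrieved records are normalised to comparable keys, completeness and overlap reduce to simple set operations. The following minimal sketch assumes that completeness is the share of the pooled, deduplicated records that a system retrieves and that pairwise overlap is the number of records two systems share. These definitions, the `normalise` helper and the sample records are illustrative assumptions, not the study's actual program or data.

```python
from itertools import combinations
from typing import Dict, List, Set


def normalise(title: str) -> str:
    """Hypothetical matching key: lower-case title without punctuation or extra spaces."""
    cleaned = "".join(ch for ch in title.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


# Hypothetical retrieval results per system (not data from the study).
raw: Dict[str, List[str]] = {
    "SystemA": ["Paper One", "Paper Two", "Paper Three"],
    "SystemB": ["paper one.", "Paper Four"],
    "SystemC": ["Paper Two", "Paper Three", "Paper Four"],
}

# Sorting/elimination step: each system becomes a set of unique keys.
records: Dict[str, Set[str]] = {
    name: {normalise(t) for t in titles} for name, titles in raw.items()
}

# Aggregation step: the union of all systems serves as the reference set.
pooled: Set[str] = set().union(*records.values())


def completeness(system: str) -> float:
    """Share of the pooled records that a system retrieved."""
    return len(records[system]) / len(pooled)


def overlap(a: str, b: str) -> int:
    """Number of records two systems have in common."""
    return len(records[a] & records[b])


for name in records:
    print(f"{name}: completeness {completeness(name):.2f}")
for a, b in combinations(records, 2):
    print(f"{a} & {b}: {overlap(a, b)} shared records")
```

Normalising before comparison matters because the same publication is often formatted differently across systems; here "Paper One" and "paper one." collapse to one record, so the pooled set contains four records rather than five.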

The results may enable scholars to select an appropriate open access system as an efficient scholarly communication channel, and academic institutions may build institutional repositories or independently create citation index systems in the future. Suggestions on indicators and tools for academic assessment are presented based on the comprehensiveness assessment of each system.

URL : Completeness and overlap in open access systems: Search engines, aggregate institutional repositories and physics-related open sources

DOI : https://doi.org/10.1371/journal.pone.0189751

Documentation and Visualisation of Workflows for Effective Communication, Collaboration and Publication @ Source

Authors : Cerys Willoughby, Jeremy G. Frey

Workflows processing data from research activities and driving in silico experiments are becoming an increasingly important method for conducting scientific research. Workflows have the advantage that not only can they be automated and used to process data repeatedly, but they can also be reused – in part or whole – enabling them to be evolved for use in new experiments.

A number of studies have investigated strategies for storing and sharing workflows for the benefit of reuse. These have revealed that simply storing workflows in repositories without additional context does not enable workflows to be successfully reused.

These studies have investigated what additional resources are needed to support users of workflows, in particular adding provenance traces and making workflows and their resources machine-readable.

Such additions also include metadata for curation, annotations for comprehension, and data sets that provide additional context to the workflow. Ultimately, though, these mechanisms still rely on researchers having access to the software to view and run the workflows.

We argue that there are situations where researchers may want an understanding of a workflow that goes beyond what provenance traces provide, without having to run the workflow directly; in many situations it can be difficult or impossible to run the original workflow.

To that end, we have investigated the creation of an interactive workflow visualisation that captures the flow-chart element of the workflow together with additional context, including annotations, descriptions, parameters, metadata, and input, intermediate and results data. This visualisation can be added to the record of a workflow experiment both to enhance curation and to add value that enables reuse.

We have created interactive workflow visualisations for the popular workflow creation tool KNIME, which does not provide users with an in-built function to extract provenance information that can otherwise only be viewed through the tool itself.

Making use of the strengths of KNIME for adding documentation and user-defined metadata, we can extract and create a visualisation and curation package that encourages and enhances curation@source, facilitating effective communication, collaboration, and reuse of workflows.

URL : Documentation and Visualisation of Workflows for Effective Communication, Collaboration and Publication @ Source

DOI : https://doi.org/10.2218/ijdc.v12i1.532

Connecting Data Publication to the Research Workflow: A Preliminary Analysis

Authors : Sünje Dallmeier-Tiessen, Varsha Khodiyar, Fiona Murphy, Amy Nurnberger, Lisa Raymond, Angus Whyte

The data curation community has long encouraged researchers to document collected research data during active stages of the research workflow, to provide robust metadata earlier, and to support research data publication and preservation.

Data documentation with robust metadata is one of a number of steps in effective data publication. Data publication is the process of making digital research objects ‘FAIR’, i.e. findable, accessible, interoperable, and reusable; attributes increasingly expected by research communities, funders and society.

Research data publishing workflows are the means to that end. Currently, however, much published research data remains inconsistently and inadequately documented by researchers.

Documentation of data closer in time to data collection would help mitigate the high cost that repositories associate with the ingest process. More effective data publication and sharing should in principle result from early interactions between researchers and their selected data repository.

This paper describes a short study undertaken by members of the Research Data Alliance (RDA) and World Data System (WDS) working group on Publishing Data Workflows. We present a collection of recent examples of data publication workflows that connect data repositories and publishing platforms with research activity ‘upstream’ of the ingest process.

We re-articulate previous recommendations of the working group, to account for the varied upstream service components and platforms that support the flow of contextual and provenance information downstream.

These workflows should be open and loosely coupled to support interoperability, including with preservation and publication environments. Our recommendations aim to stimulate further work on researchers’ views of data publishing and the extent to which available services and infrastructure facilitate the publication of FAIR data.

We also aim to stimulate further dialogue about, and definition of, the roles and responsibilities of research data services and platform providers for the ‘FAIRness’ of research data publication workflows themselves.

URL : Connecting Data Publication to the Research Workflow: A Preliminary Analysis

DOI : https://doi.org/10.2218/ijdc.v12i1.533

Survey on open peer review: Attitudes and experience amongst editors, authors and reviewers

Authors : Tony Ross-Hellauer, Arvid Deppe, Birgit Schmidt

Open peer review (OPR) is a cornerstone of the emergent Open Science agenda. Yet to date no large-scale survey of attitudes towards OPR amongst academic editors, authors, reviewers and publishers has been undertaken.

This paper presents the findings of an online survey, conducted for the OpenAIRE2020 project during September and October 2016, that sought to bridge this information gap in order to aid the development of appropriate OPR approaches by providing evidence about attitudes towards and levels of experience with OPR.

The results of this cross-disciplinary survey, which received 3,062 full responses, show that the majority (60.3%) of respondents believe that OPR as a general concept should be mainstream scholarly practice (although attitudes to individual traits varied, and open identities peer review was not generally favoured). Respondents were also in favour of other areas of Open Science, such as Open Access (88.2%) and Open Data (80.3%).

Among respondents we observed high levels of experience with OPR, with three out of four (76.2%) reporting having taken part in an OPR process as author, reviewer or editor.

There were also high levels of support for most of the traits of OPR, particularly open interaction, open reports and final-version commenting. Respondents were against opening reviewer identities to authors, however, with more than half believing it would make peer review worse.

Overall satisfaction with the peer review system used by scholarly journals seems to vary strongly across disciplines. Taken together, these findings are very encouraging for OPR’s prospects of moving into the mainstream but indicate that due care must be taken to avoid a “one-size-fits-all” solution and to tailor such systems to differing (especially disciplinary) contexts.

OPR is an evolving phenomenon and hence future studies are to be encouraged, especially to further explore differences between disciplines and monitor the evolution of attitudes.

URL : Survey on open peer review: Attitudes and experience amongst editors, authors and reviewers

DOI : https://doi.org/10.1371/journal.pone.0189311

Monitoring the transition to open access: December 2017

The studies on which this report is based were undertaken by a team led by Michael Jubb and comprising Andrew Plume, Stephanie Oeben and Lydia Brammer, Elsevier; Rob Johnson and Cihan Bütün, Research Consulting; Stephen Pinfield, University of Sheffield.

Following the Finch Report in 2012, Universities UK established an Open Access Coordination Group to support the transition to open access (OA) for articles in scholarly journals. The Group commissioned an initial report published in 2015 to gather evidence on key features of that transition.

This second report aims to build on those findings, and to examine trends over the period since the major funders of research in the UK established new policies to promote OA.

URL : Monitoring the transition to open access: December 2017