On the value of preprints: an early career researcher perspective

Authors : Sarvenaz Sarabipour, Humberto J Debat, Edward Emmott, Steven Burgess, Benjamin Schwessinger, Zach Hensel

Peer-reviewed journal publication is the main means for academic researchers in the life sciences to create a permanent, public record of their work. These publications are also the de facto currency for career progress, with a strong link between journal brand recognition and perceived value.

The current peer-review process can lead to long delays between submission and publication, with cycles of rejection, revision and resubmission causing redundant peer review.

This situation creates unique challenges for early career researchers (ECRs), who rely heavily on timely publication of their work to gain recognition for their efforts. ECRs face changes in the academic landscape including the increased interdisciplinarity of life sciences research, expansion of the researcher population and consequent shifts in employer and funding demands.

The publication of preprints, publicly available scientific manuscripts posted on dedicated preprint servers prior to journal-managed peer review, can play a key role in addressing these ECR challenges.

Preprinting benefits include rapid dissemination of academic work, open access, establishing priority or concurrence, receiving feedback and facilitating collaborations. While there is a growing appreciation for and adoption of preprints, only a minority of all articles in the life sciences and medicine are preprinted.

The current low rate of preprint submissions in the life sciences, and ECR concerns regarding preprinting, need to be addressed.

We provide a perspective from an interdisciplinary group of early career researchers on the value of preprints and advocate the wide adoption of preprints to advance knowledge and facilitate career development.

URL : On the value of preprints: an early career researcher perspective

DOI : https://doi.org/10.7287/peerj.preprints.27400v1

A guideline for reporting experimental protocols in life sciences

Authors : Olga Giraldo, Alexander Garcia, Oscar Corcho

Experimental protocols are key when planning, performing and publishing research in many disciplines, especially in relation to the reporting of materials and methods. However, they vary in their content, structure and associated data elements.

This article presents a guideline describing the key content for reporting experimental protocols in the domain of life sciences, together with the methodology followed to develop such a guideline.

As part of our work, we propose a checklist that contains 17 data elements that we consider fundamental to facilitate the execution of the protocol. These data elements are formally described in the SMART Protocols ontology.

By providing guidance for the key content to be reported, we aim (1) to make it easier for authors to report experimental protocols with the necessary and sufficient information to allow others to reproduce an experiment, (2) to promote consistency across laboratories by delivering an adaptable set of data elements, and (3) to make it easier for reviewers and editors to measure the quality of submitted manuscripts against established criteria.

Our checklist focuses on content: what should be included. Rather than advocating a specific format for protocols in the life sciences, the checklist includes a full description of the key data elements that facilitate the execution of the protocol.
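The checklist idea can be sketched in a few lines of code: a protocol record is checked against a set of required data elements, and any that are absent or empty are flagged. The element names below are an illustrative subset invented for this sketch, not the actual 17 data elements formalised in the SMART Protocols ontology.

```python
# Hypothetical subset of checklist elements (not the paper's actual 17).
REQUIRED_ELEMENTS = {
    "title",
    "purpose",
    "materials",
    "equipment",
    "procedure_steps",
    "safety_warnings",
}

def missing_elements(protocol: dict) -> set:
    """Return checklist elements that are absent or empty in a protocol record."""
    return {
        element
        for element in REQUIRED_ELEMENTS
        if not protocol.get(element)
    }

# Toy protocol record for illustration.
protocol = {
    "title": "RNA extraction from leaf tissue",
    "purpose": "Isolate total RNA for downstream sequencing",
    "materials": ["TRIzol", "chloroform", "isopropanol"],
    "procedure_steps": ["Grind tissue in liquid nitrogen", "..."],
}

print(sorted(missing_elements(protocol)))
```

A reviewer or submission system could run such a check automatically, surfacing the elements still to be reported before peer review begins.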

URL : A guideline for reporting experimental protocols in life sciences

DOI : https://doi.org/10.7717/peerj.4795

Interoperability and FAIRness through a novel combination of Web technologies

Authors : Mark D. Wilkinson, Ruben Verborgh, Luiz Olavo Bonino da Silva Santos, Tim Clark, Morris A. Swertz, Fleur D.L. Kelpin, Alasdair J.G. Gray, Erik A. Schultes, Erik M. van Mulligen, Paolo Ciccarese, Arnold Kuzniar, Anand Gavai, Mark Thompson, Rajaram Kaliyaperumal, Jerven T. Bolleman, Michel Dumontier

Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT).

These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not.

The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task that does not scale.

Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings.

We show that, by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and can therefore represent an exemplar implementation of those principles.

The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.
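One way to picture "interoperability at the level of an individual spreadsheet cell" is that every cell becomes a subject-predicate-object statement, the basic unit of RDF. The sketch below is only an illustration of that idea with invented column names and data; the paper's actual architecture maps such statements to ontology terms and exposes them through standard Web protocols.

```python
def table_to_triples(row_ids, columns, rows):
    """Flatten a small table so each cell becomes one
    (row identifier, column name, value) triple."""
    triples = []
    for row_id, row in zip(row_ids, rows):
        for column, value in zip(columns, row):
            triples.append((row_id, column, value))
    return triples

# Invented example data: two samples, two columns.
columns = ["organism", "sample_type"]
rows = [["Homo sapiens", "blood"], ["Mus musculus", "liver"]]
triples = table_to_triples(["sample:001", "sample:002"], columns, rows)

for t in triples:
    print(t)
```

Once data are decomposed this way, any consumer can select, merge, or transform individual statements without parsing a repository-specific file format.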

URL : Interoperability and FAIRness through a novel combination of Web technologies

DOI : https://doi.org/10.7717/peerj-cs.110

Amplifying Data Curation Efforts to Improve the Quality of Life Science Data

Authors : Mariam Alqasab, Suzanne M. Embury, Sandra de F. Mendes Sampaio

In the era of data science, datasets are shared widely and used for many purposes unforeseen by the original creators of the data. In this context, defects in datasets can have far reaching consequences, spreading from dataset to dataset, and affecting the consumers of data in ways that are hard to predict or quantify.

Some form of waste is often the result. For example, scientists using defective data to propose hypotheses for experimentation may waste their limited wet-lab resources chasing the wrong experimental targets. Scarce drug trial resources may be used to test drugs that in fact have little chance of providing a cure.

Because of the potential real world costs, database owners care about providing high quality data. Automated curation tools can be used to an extent to discover and correct some forms of defect.

However, in some areas human curation, performed by highly-trained domain experts, is needed to ensure that the data represents our current interpretation of reality accurately.

Human curators are expensive, and there is far more curation work to be done than there are curators available to perform it. Tools and techniques are needed to enable the full value to be obtained from the curation effort currently available.

In this paper, we explore one possible approach to maximising the value obtained from human curators: automatically extracting information about data defects and corrections from the work that the curators do.

This information is packaged in a source independent form, to allow it to be used by the owners of other databases (for which human curation effort is not available or is insufficient).

This amplifies the efforts of the human curators, allowing their work to be applied to other sources, without requiring any additional effort or change in their processes or tool sets. We show that this approach can discover significant numbers of defects, which can also be found in other sources.
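A minimal sketch of this amplification idea, under invented field names and toy data (the paper's actual extraction method may differ): diff records before and after a curation pass to harvest field-level corrections in a source-independent form, then replay those corrections wherever the same defective values recur in another database.

```python
def extract_corrections(before, after):
    """Collect (field, wrong_value, corrected_value) facts from one curation pass."""
    corrections = set()
    for record_id, old in before.items():
        new = after.get(record_id, old)
        for field, old_value in old.items():
            new_value = new.get(field)
            if new_value is not None and new_value != old_value:
                corrections.add((field, old_value, new_value))
    return corrections

def apply_corrections(records, corrections):
    """Replace known defective values wherever they recur in other data."""
    lookup = {(field, wrong): right for field, wrong, right in corrections}
    for record in records.values():
        for field, value in list(record.items()):
            if (field, value) in lookup:
                record[field] = lookup[(field, value)]
    return records

# Toy curation pass on one database...
before = {"r1": {"gene": "TP53", "organism": "H. sapien"}}
after = {"r1": {"gene": "TP53", "organism": "Homo sapiens"}}
corrections = extract_corrections(before, after)

# ...applied to records from a different source with the same defect.
other = {"x9": {"organism": "H. sapien", "gene": "BRCA1"}}
fixed = apply_corrections(other, corrections)
print(fixed)
```

The corrections are packaged as plain (field, wrong, right) facts, so the second database's owners need no access to the original curators or their tools.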

URL : Amplifying Data Curation Efforts to Improve the Quality of Life Science Data

DOI : https://doi.org/10.2218/ijdc.v12i1.495

Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data

Authors : Julie A. McMurry, Nick Juty, Niklas Blomberg, Tony Burdett, Tom Conlin, Nathalie Conte, Mélanie Courtot, John Deck, Michel Dumontier, Donal K. Fellows, Alejandra Gonzalez-Beltran, Philipp Gormanns, Jeffrey Grethe, Janna Hastings, Jean-Karim Hériché, Henning Hermjakob, Jon C. Ison, Rafael C. Jimenez, Simon Jupp, John Kunze, Camille Laibe, Nicolas Le Novère, James Malone, Maria Jesus Martin, Johanna R. McEntyre, Chris Morris, Juha Muilu, Wolfgang Müller, Philippe Rocca-Serra, Susanna-Assunta Sansone, Murat Sariyar, Jacky L. Snoep, Stian Soiland-Reyes, Natalie J. Stanford, Neil Swainston, Nicole Washington, Alan R. Williams, Sarala M. Wimalaratne, Lilly M. Winfree, Katherine Wolstencroft, Carole Goble, Christopher J. Mungall, Melissa A. Haendel, Helen Parkinson

In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure.

Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers.

We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability.

We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.
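A common practice the paper's lessons support is keeping compact identifiers (prefix:accession pairs) in data and expanding them to resolvable URLs only at the edge. The sketch below assumes an identifiers.org-style resolution pattern; the prefix map is a small illustrative stand-in, not an authoritative registry.

```python
# Illustrative prefix map, modelled on identifiers.org-style resolution.
PREFIX_MAP = {
    "uniprot": "https://identifiers.org/uniprot:",
    "kegg.pathway": "https://identifiers.org/kegg.pathway:",
}

def expand_curie(curie: str) -> str:
    """Expand a compact identifier (prefix:accession) into a resolvable URL.

    Raises ValueError for unknown prefixes or malformed input, so defects
    surface at expansion time rather than as silent dead links.
    """
    prefix, _, accession = curie.partition(":")
    if prefix not in PREFIX_MAP or not accession:
        raise ValueError(f"unknown or malformed identifier: {curie!r}")
    return PREFIX_MAP[prefix] + accession

print(expand_curie("uniprot:P12345"))
```

Storing the compact form keeps datasets stable even if a resolver's URL scheme changes: only the prefix map needs updating, not every record.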

URL : Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data

DOI : https://doi.org/10.1371/journal.pbio.2001414

Risk of Bias in Reports of In Vivo Research: A Focus for Improvement

The reliability of experimental findings depends on the rigour of experimental design. Here we show limited reporting of measures to reduce the risk of bias in a random sample of life sciences publications, significantly lower reporting of randomisation in work published in journals of high impact, and very limited reporting of measures to reduce the risk of bias in publications from leading United Kingdom institutions. Ascertainment of differences between institutions might serve both as a measure of research quality and as a tool for institutional efforts to improve research quality.

URL : Risk of Bias in Reports of In Vivo Research: A Focus for Improvement

DOI : https://doi.org/10.1371/journal.pbio.1002273