Technical and social issues influencing the adoption of preprints in the life sciences

Authors : Naomi C Penfold, Jessica K Polka

Preprints are gaining visibility in many fields. Thanks to the explosion of bioRxiv, an online server for preprints in biology, versions of manuscripts prior to the completion of journal-organized peer review are poised to become a standard component of the publishing experience in the life sciences.

Here we provide an overview of current challenges facing preprints, both technical and social, and a vision for their future development, from unbundling the functions of publication to exploring different communication formats.

DOI : https://doi.org/10.7287/peerj.preprints.27954v1

Governance of a global genetic resource commons for non-commercial research: A case-study of the DNA barcode commons

Authors : Janis Geary, Tania Bubela

Life sciences research that uses genetic resources is increasingly collaborative and global, yet collective action remains a significant barrier to the creation and management of shared research resources. These resources, which include sequence data, associated metadata, and biological samples, can be understood as a type of knowledge commons.

Collective action by stakeholders to create and use knowledge commons for research has potential benefits for all involved, including minimizing costs and sharing risks, but there are gaps in our understanding of how institutional arrangements may promote such collective action in the context of global genetic resources.

We address this research gap by examining the attributes of an exemplar global knowledge commons: The DNA barcode commons. DNA barcodes are short, standardized gene regions that can be used to inexpensively identify unknown specimens, and proponents have led international efforts to make DNA barcodes a standard species identification tool.

Our research examined if and how attributes of the DNA barcode commons, including governance of DNA barcode resources and management of infrastructure, facilitate global participation in DNA barcoding efforts. Our data sources included key informant interviews, organizational documents, scientific outputs of the DNA barcoding community, and DNA barcode record submissions.

Our research suggested that the goal of creating a globally inclusive DNA barcode commons is partially impeded by the assumption that scientific norms and expectations held by researchers in high income countries are universal. We found scientific norms are informed by a complex history of resource misappropriation and mistrust between stakeholders.

DNA barcode organizations can mitigate the challenges caused by their global membership by creating more inclusive governance structures, developing community norms that are specific to the context of DNA barcoding, and increasing awareness and knowledge of pertinent legal frameworks.


Alternative location : https://www.thecommonsjournal.org/articles/10.18352/ijc.859/

On the value of preprints: an early career researcher perspective

Authors : Sarvenaz Sarabipour, Humberto J Debat, Edward Emmott, Steven Burgess, Benjamin Schwessinger, Zach Hensel

Peer-reviewed journal publication is the main means for academic researchers in the life sciences to create a permanent, public record of their work. These publications are also the de facto currency for career progress, with a strong link between journal brand recognition and perceived value.

The current peer-review process can lead to long delays between submission and publication, with cycles of rejection, revision and resubmission causing redundant peer review.

This situation creates unique challenges for early career researchers (ECRs), who rely heavily on timely publication of their work to gain recognition for their efforts. ECRs face changes in the academic landscape including the increased interdisciplinarity of life sciences research, expansion of the researcher population and consequent shifts in employer and funding demands.

The publication of preprints, publicly available scientific manuscripts posted on dedicated preprint servers prior to journal-managed peer review, can play a key role in addressing these ECR challenges.

Preprinting benefits include rapid dissemination of academic work, open access, establishing priority or concurrence, receiving feedback and facilitating collaborations. While there is a growing appreciation for and adoption of preprints, only a minority of all articles in the life sciences and medicine are preprinted.

The current low rate of preprint submissions in the life sciences, and ECR concerns regarding preprinting, need to be addressed.

We provide a perspective from an interdisciplinary group of early career researchers on the value of preprints and advocate the wide adoption of preprints to advance knowledge and facilitate career development.


DOI : https://doi.org/10.7287/peerj.preprints.27400v1

A guideline for reporting experimental protocols in life sciences

Authors : Olga Giraldo, Alexander Garcia, Oscar Corcho

Experimental protocols are key when planning, performing and publishing research in many disciplines, especially in relation to the reporting of materials and methods. However, they vary in their content, structure and associated data elements.

This article presents a guideline for describing the key content needed to report experimental protocols in the life sciences, together with the methodology followed to develop the guideline.

As part of our work, we propose a checklist that contains 17 data elements that we consider fundamental to facilitate the execution of the protocol. These data elements are formally described in the SMART Protocols ontology.

By providing guidance for the key content to be reported, we aim (1) to make it easier for authors to report experimental protocols with necessary and sufficient information that allows others to reproduce an experiment, (2) to promote consistency across laboratories by delivering an adaptable set of data elements, and (3) to make it easier for reviewers and editors to measure the quality of submitted manuscripts against established criteria.

Our checklist focuses on content: what should be included. Rather than advocating a specific format for protocols in the life sciences, the checklist fully describes the key data elements that facilitate the execution of a protocol.
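To make the idea of a machine-checkable reporting checklist concrete, here is a minimal sketch. The element names below are illustrative placeholders, not the actual 17 data elements of the SMART Protocols ontology:

```python
# Hypothetical machine-readable reporting checklist in the spirit of the
# SMART Protocols work. Element names are illustrative only.
CHECKLIST = [
    {"element": "protocol_title", "required": True},
    {"element": "reagents", "required": True},
    {"element": "equipment", "required": True},
    {"element": "storage_conditions", "required": False},
]

def missing_elements(protocol):
    """Return the required checklist elements absent from a protocol description."""
    return [item["element"] for item in CHECKLIST
            if item["required"] and not protocol.get(item["element"])]
```

A reviewer-facing tool could run such a check on submission and report, for instance, that a protocol describing only its title and reagents still lacks an equipment section.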


DOI : https://doi.org/10.7717/peerj.4795

Interoperability and FAIRness through a novel combination of Web technologies

Authors : Mark D. Wilkinson, Ruben Verborgh, Luiz Olavo Bonino da Silva Santos, Tim Clark, Morris A. Swertz, Fleur D.L. Kelpin, Alasdair J.G. Gray, Erik A. Schultes, Erik M. van Mulligen, Paolo Ciccarese, Arnold Kuzniar, Anand Gavai, Mark Thompson, Rajaram Kaliyaperumal, Jerven T. Bolleman, Michel Dumontier

Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT).

These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not.

The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task that does not scale.

Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings.

We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and it can therefore represent an exemplar implementation of those principles.

The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.
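One off-the-shelf technique central to such resource-oriented patterns is HTTP content negotiation, where a repository serves the same record as HTML for humans or as RDF for machines depending on the client's Accept header. The sketch below is a minimal, illustrative server-side selector and is not drawn from the paper's implementation:

```python
# Minimal server-side content negotiation: choose the best available
# representation of a record based on the client's Accept header.
def negotiate(accept_header, available):
    """Return the highest-preference available media type, or None."""
    prefs = []
    for part in accept_header.split(","):
        fields = part.strip().split(";")
        mtype = fields[0].strip()
        q = 1.0  # default quality value per HTTP semantics
        for f in fields[1:]:
            f = f.strip()
            if f.startswith("q="):
                q = float(f[2:])
        prefs.append((q, mtype))
    prefs.sort(key=lambda p: -p[0])  # highest q first
    for q, mtype in prefs:
        if mtype in available:
            return mtype
        if mtype == "*/*" and available:
            return available[0]
    return None
```

A repository exposing both `text/html` and `text/turtle` representations would, for a request preferring Turtle, return the RDF serialization that machine agents can integrate directly.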


DOI : https://doi.org/10.7717/peerj-cs.110

Amplifying Data Curation Efforts to Improve the Quality of Life Science Data

Authors : Mariam Alqasab, Suzanne M. Embury, Sandra de F. Mendes Sampaio

In the era of data science, datasets are shared widely and used for many purposes unforeseen by the original creators of the data. In this context, defects in datasets can have far reaching consequences, spreading from dataset to dataset, and affecting the consumers of data in ways that are hard to predict or quantify.

Some form of waste is often the result. For example, scientists using defective data to propose hypotheses may waste their limited wet-lab resources chasing the wrong experimental targets, and scarce drug-trial resources may be spent testing drugs that have little real chance of providing a cure.

Because of the potential real world costs, database owners care about providing high quality data. Automated curation tools can be used to an extent to discover and correct some forms of defect.

However, in some areas human curation, performed by highly-trained domain experts, is needed to ensure that the data represents our current interpretation of reality accurately.

Human curators are expensive, and there is far more curation work to be done than there are curators available to perform it. Tools and techniques are needed to enable the full value to be obtained from the curation effort currently available.

In this paper, we explore one possible approach to maximising the value obtained from human curators: automatically extracting information about data defects and corrections from the work that the curators already do.

This information is packaged in a source independent form, to allow it to be used by the owners of other databases (for which human curation effort is not available or is insufficient).

This amplifies the efforts of the human curators, allowing their work to be applied to other sources without requiring any additional effort or change in their processes or tool sets. We show that this approach can discover significant numbers of defects that can also be found in other sources.
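As a rough illustration of this amplification idea (a toy sketch, not the authors' actual method), a curator's edit to one record can be captured as a source-independent (field, defective value, corrected value) pattern and then replayed against records from another database:

```python
# Toy sketch: capture a curator's edit as a source-independent correction
# pattern, then apply matching patterns to records from another database.
def extract_patterns(before, after):
    """Derive (field, wrong, right) patterns from one curated record pair."""
    return [(field, before[field], after[field])
            for field in before
            if field in after and before[field] != after[field]]

def apply_patterns(record, patterns):
    """Apply every pattern whose defective value appears in the record."""
    fixed = dict(record)
    for field, wrong, right in patterns:
        if fixed.get(field) == wrong:
            fixed[field] = right
    return fixed
```

Here a single species-name correction made in one database propagates, without any extra curator effort, to any other record that exhibits the same defect.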


DOI : https://doi.org/10.2218/ijdc.v12i1.495

Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data

Authors : Julie A. McMurry, Nick Juty, Niklas Blomberg, Tony Burdett, Tom Conlin, Nathalie Conte, Mélanie Courtot, John Deck, Michel Dumontier, Donal K. Fellows, Alejandra Gonzalez-Beltran, Philipp Gormanns, Jeffrey Grethe, Janna Hastings, Jean-Karim Hériché, Henning Hermjakob, Jon C. Ison, Rafael C. Jimenez, Simon Jupp, John Kunze, Camille Laibe, Nicolas Le Novère, James Malone, Maria Jesus Martin, Johanna R. McEntyre, Chris Morris, Juha Muilu, Wolfgang Müller, Philippe Rocca-Serra, Susanna-Assunta Sansone, Murat Sariyar, Jacky L. Snoep, Stian Soiland-Reyes, Natalie J. Stanford, Neil Swainston, Nicole Washington, Alan R. Williams, Sarala M. Wimalaratne, Lilly M. Winfree, Katherine Wolstencroft, Carole Goble, Christopher J. Mungall, Melissa A. Haendel, Helen Parkinson

In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure.

Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers.

We also outline important considerations for those referencing identifiers in various circumstances, including authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web accessibility/resolvability.

We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.
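Many of these identifier practices revolve around compact identifiers (CURIEs) such as `doi:10.1371/journal.pbio.2001414`, which resolve to URLs through a prefix registry. The sketch below is illustrative only; the prefix map is a hypothetical stand-in for a real registry such as Identifiers.org:

```python
# Expanding compact identifiers (prefix:accession) into resolvable URLs via
# a prefix registry. This map is an illustrative stand-in, not a real registry.
PREFIX_MAP = {
    "uniprot": "https://www.uniprot.org/uniprot/",
    "doi": "https://doi.org/",
}

def expand_curie(curie, prefix_map=PREFIX_MAP):
    """Split a CURIE at its first colon and expand it against the registry."""
    prefix, _, accession = curie.partition(":")
    if not accession or prefix not in prefix_map:
        raise ValueError(f"unknown or malformed compact identifier: {curie!r}")
    return prefix_map[prefix] + accession
```

Keeping the prefix-to-URL mapping in one shared registry, rather than hard-coding full URLs, is what lets identifiers stay stable even when a provider's web infrastructure changes.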


DOI : https://doi.org/10.1371/journal.pbio.2001414