Toward collaborative open data science in metabolomics using Jupyter Notebooks and cloud computing

Authors : Kevin M. Mendez, Leighton Pritchard, Stacey N. Reinke, David I. Broadhurst


A lack of transparency and reporting standards in the scientific community has led to increasing and widespread concerns relating to reproduction and integrity of results. As an omics science, which generates vast amounts of data and relies heavily on data science for deriving biological meaning, metabolomics is highly vulnerable to irreproducibility.

The metabolomics community has made substantial efforts to align with FAIR data standards by promoting open data formats, data repositories, online spectral libraries, and metabolite databases.

Open data analysis platforms also exist; however, they tend to be inflexible and rely on the user to adequately report their methods and results. To enable FAIR data science in metabolomics, methods and results need to be transparently disseminated in a manner that is rapid, reusable, and fully integrated with the published work.

To ensure broad use within the community such a framework also needs to be inclusive and intuitive for both computational novices and experts alike.

Aim of Review

To encourage metabolomics researchers from all backgrounds to take control of their own data science, mould it to their personal requirements, and enthusiastically share resources through open science.

Key Scientific Concepts of Review

This tutorial introduces the concept of interactive web-based computational laboratory notebooks. The reader is guided through a set of experiential tutorials specifically targeted at metabolomics researchers, based around the Jupyter Notebook web application, GitHub data repository, and Binder cloud computing platform.

URL : Toward collaborative open data science in metabolomics using Jupyter Notebooks and cloud computing


Survey on Scientific Shared Resource Rigor and Reproducibility

Authors : Kevin L. Knudtson, Robert H. Carnahan, Rebecca L. Hegstad-Davies, Nancy C. Fisher, Belynda Hicks, Peter A. Lopez, Susan M. Meyn, Sheenah M. Mische, Frances Weis-Garcia, Lisa D. White, Katia Sol-Church

Shared scientific resources, also known as core facilities, support a significant portion of the research conducted at biomolecular research institutions.

The Association of Biomolecular Resource Facilities (ABRF) established the Committee on Core Rigor and Reproducibility (CCoRRe) to further its mission of integrating advanced technologies, education, and communication in the operations of shared scientific resources in support of reproducible research.

In order to first assess the needs of the scientific shared resource community, the CCoRRe solicited feedback from ABRF members via a survey. The purpose of the survey was to gain information on how U.S. National Institutes of Health (NIH) initiatives on advancing scientific rigor and reproducibility influenced current services and new technology development.

In addition, the survey aimed to identify the challenges and opportunities related to implementation of new reporting requirements and to identify new practices and resources needed to ensure rigorous research.

The results revealed a surprising unfamiliarity with the NIH guidelines. Many of the perceived challenges to the effective implementation of best practices (i.e., those designed to ensure rigor and reproducibility) were similarly noted as a challenge to effective provision of support services in a core setting. Further, most cores routinely use best practices and offer services that support rigor and reproducibility.

These services include access to well-maintained instrumentation and training on experimental design and data analysis as well as data management. Feedback from this survey will enable the ABRF to build better educational resources and share critical best-practice guidelines.

These resources will become important tools to the core community and the researchers they serve to impact rigor and transparency across the range of science and technology.


Open science and modified funding lotteries can impede the natural selection of bad science

Authors : Paul E. Smaldino, Matthew A. Turner, Pablo A. Contreras Kallens

Assessing scientists using exploitable metrics can lead to the degradation of research methods even without any strategic behaviour on the part of individuals, via ‘the natural selection of bad science.’

Institutional incentives to maximize metrics like publication quantity and impact drive this dynamic. Removing these incentives is necessary, but institutional change is slow.

However, recent developments suggest possible solutions with more rapid onsets. These include what we call open science improvements, which can reduce publication bias and improve the efficacy of peer review. In addition, there have been increasing calls for funders to move away from prestige- or innovation-based approaches in favour of lotteries.

We investigated whether such changes are likely to improve the reproducibility of science even in the presence of persistent incentives for publication quantity through computational modelling.

We found that modified lotteries, which allocate funding randomly among proposals that pass a threshold for methodological rigour, effectively reduce the rate of false discoveries, particularly when paired with open science improvements that increase the publication of negative results and improve the quality of peer review.

In the absence of funding that targets rigour, open science improvements can still reduce false discoveries in the published literature but are less likely to improve the overall culture of research practices that underlie those publications.

URL : Open science and modified funding lotteries can impede the natural selection of bad science


Replicable Services for Reproducible Research: A Model for Academic Libraries

Authors : Franklin Sayre, Amy Riegelman

Over the past decade, evidence from disciplines ranging from biology to economics has suggested that many scientific studies may not be reproducible. This has led to declarations in both the scientific and lay press that science is experiencing a “reproducibility crisis” and that this crisis has consequences for the extent to which students, faculty, and the public at large can trust research.

Faculty build on these results with their own research, and students and the public use these results for everything from patient care to public policy. To build a model for how academic libraries can support reproducible research, the authors conducted a review of major guidelines from funders, publishers, and professional societies. Specific recommendations were extracted from guidelines and compared with existing academic library services and librarian expertise.

The authors believe this review shows that many of the recommendations for improving reproducibility are core areas of academic librarianship, including data management, scholarly communication, and methodological support for systematic reviews and data-intensive research.

By increasing our knowledge of disciplinary, journal, funder, and society perspectives on reproducibility, and reframing existing librarian expertise and services, academic librarians will be well positioned to be leaders in supporting reproducible research.

URL : Replicable Services for Reproducible Research: A Model for Academic Libraries


Open science, reproducibility, and transparency in ecology

Authors : Stephen M. Powers, Stephanie E. Hampton

Reproducibility is a key tenet of the scientific process that dictates the reliability and generality of results and methods. The complexities of ecological observations and data present novel challenges in satisfying needs for reproducibility and also transparency.

Ecological systems are dynamic and heterogeneous, interacting with numerous factors that sculpt natural history and that investigators cannot completely control. Observations may be highly dependent on spatial and temporal context, making them very difficult to reproduce, but computational reproducibility can still be achieved.

Computational reproducibility often refers to the ability to produce equivalent analytical outcomes from the same data set using the same code and software as the original study.

When coded workflows are shared, authors and editors provide transparency for readers and allow other researchers to build directly and efficiently on primary work. These qualities may be especially important in ecological applications that have important or controversial implications for science, management, and policy.

Expectations for computational reproducibility and transparency are shifting rapidly in the sciences.

In this work, we highlight many of the unique challenges for ecology along with practical guidelines for reproducibility and transparency, as ecologists continue to participate in the stewardship of critical environmental information and ensure that research methods demonstrate integrity.

URL : Open science, reproducibility, and transparency in ecology


The principles of tomorrow’s university

Authors : Daniel S. Katz, Gabrielle Allen, Lorena A. Barba, Devin R. Berg, Holly Bik, Carl Boettiger, Christine L. Borgman, C. Titus Brown, Stuart Buck, Randy Burd, Anita de Waard, Martin Paul Eve, Brian E. Granger, Josh Greenberg, Adina Howe, Bill Howe, May Khanna, Timothy L. Killeen, Matthew Mayernik, Erin McKiernan, Chris Mentzel, Nirav Merchant, Kyle E. Niemeyer, Laura Noren, Sarah M. Nusser, Daniel A. Reed, Edward Seidel, MacKenzie Smith, Jeffrey R. Spies, Matt Turk, John D. Van Horn, Jay Walsh

In the 21st Century, research is increasingly data- and computation-driven. Researchers, funders, and the larger community today emphasize the traits of openness and reproducibility.

In March 2017, 13 mostly early-career research leaders who are building their careers around these traits came together with ten university leaders (presidents, vice presidents, and vice provosts), representatives from four funding agencies, and eleven organizers and other stakeholders in an NIH- and NSF-funded one-day, invitation-only workshop titled “Imagining Tomorrow’s University.”

Workshop attendees were charged with launching a new dialog around open research – the current status, opportunities for advancement, and challenges that limit sharing.

The workshop examined how the internet-enabled research world has changed, and how universities need to change to adapt commensurately, aiming to understand how universities can and should make themselves competitive and attract the best students, staff, and faculty in this new world.

During the workshop, the participants re-imagined scholarship, education, and institutions for an open, networked era, to uncover new opportunities for universities to create value and serve society.

They expressed the results of these deliberations as a set of 22 principles of tomorrow’s university across six areas: credit and attribution, communities, outreach and engagement, education, preservation and reproducibility, and technologies.

Activities that follow on from workshop results take one of three forms. First, since the workshop, a number of workshop authors have further developed and published their white papers to make their reflections and recommendations more concrete.

These authors are also conducting efforts to implement these ideas, and to make changes in the university system.

Second, we plan to organise a follow-up workshop that focuses on how these principles could be implemented.

Third, we believe that the outcomes of this workshop support and are connected with recent theoretical work on the position and future of open knowledge institutions.

URL : The principles of tomorrow’s university


Reproducible research practices, transparency, and open access data in the biomedical literature, 2015–2017

Authors : Joshua D. Wallach, Kevin W. Boyack, John P. A. Ioannidis

Currently, there is a growing interest in ensuring the transparency and reproducibility of the published scientific literature. According to a previous evaluation of 441 biomedical journals articles published in 2000–2014, the biomedical literature largely lacked transparency in important dimensions.

Here, we surveyed a random sample of 149 biomedical articles published between 2015 and 2017 and determined the proportion reporting sources of public and/or private funding and conflicts of interests, sharing protocols and raw data, and undergoing rigorous independent replication and reproducibility checks.

We also investigated what can be learned about reproducibility and transparency indicators from open access data provided on PubMed. The majority of the 149 studies disclosed some information regarding funding (103, 69.1% [95% confidence interval, 61.0% to 76.3%]) or conflicts of interest (97, 65.1% [56.8% to 72.6%]).

Among the 104 articles with empirical data in which protocols or data sharing would be pertinent, 19 (18.3% [11.6% to 27.3%]) discussed publicly available data; only one (1.0% [0.1% to 6.0%]) included a link to a full study protocol. Among the 97 articles in which replication in studies with different data would be pertinent, there were five replication efforts (5.2% [1.9% to 12.2%]).

Although clinical trial identification numbers and funding details were often provided on PubMed, only two of the articles without a full text article in PubMed Central that discussed publicly available data at the full text level also contained information related to data sharing on PubMed; none had a conflicts of interest statement on PubMed.

Our evaluation suggests that although there have been improvements over the last few years in certain key indicators of reproducibility and transparency, opportunities exist to improve reproducible research practices across the biomedical literature and to make features related to reproducibility more readily visible in PubMed.

URL : Reproducible research practices, transparency, and open access data in the biomedical literature, 2015–2017