Toward collaborative open data science in metabolomics using Jupyter Notebooks and cloud computing

Authors : Kevin M. Mendez, Leighton Pritchard, Stacey N. Reinke, David I. Broadhurst

Background

A lack of transparency and reporting standards in the scientific community has led to increasing and widespread concerns relating to reproduction and integrity of results. As an omics science, which generates vast amounts of data and relies heavily on data science for deriving biological meaning, metabolomics is highly vulnerable to irreproducibility.

The metabolomics community has made substantial efforts to align with FAIR data standards by promoting open data formats, data repositories, online spectral libraries, and metabolite databases.

Open data analysis platforms also exist; however, they tend to be inflexible and rely on the user to adequately report their methods and results. To enable FAIR data science in metabolomics, methods and results need to be transparently disseminated in a manner that is rapid, reusable, and fully integrated with the published work.

To ensure broad use within the community, such a framework also needs to be inclusive and intuitive for computational novices and experts alike.

Aim of Review

To encourage metabolomics researchers from all backgrounds to take control of their own data science, mould it to their personal requirements, and enthusiastically share resources through open science.

Key Scientific Concepts of Review

This tutorial introduces the concept of interactive web-based computational laboratory notebooks. The reader is guided through a set of experiential tutorials specifically targeted at metabolomics researchers, based around the Jupyter Notebook web application, GitHub data repository, and Binder cloud computing platform.

URL : Toward collaborative open data science in metabolomics using Jupyter Notebooks and cloud computing

DOI : https://doi.org/10.1007/s11306-019-1588-0

The principles of tomorrow’s university

Authors : Daniel S. Katz, Gabrielle Allen, Lorena A. Barba, Devin R. Berg, Holly Bik, Carl Boettiger, Christine L. Borgman, C. Titus Brown, Stuart Buck, Randy Burd, Anita de Waard, Martin Paul Eve, Brian E. Granger, Josh Greenberg, Adina Howe, Bill Howe, May Khanna, Timothy L. Killeen, Matthew Mayernik, Erin McKiernan, Chris Mentzel, Nirav Merchant, Kyle E. Niemeyer, Laura Noren, Sarah M. Nusser, Daniel A. Reed, Edward Seidel, MacKenzie Smith, Jeffrey R. Spies, Matt Turk, John D. Van Horn, Jay Walsh

In the 21st Century, research is increasingly data- and computation-driven. Researchers, funders, and the larger community today emphasize the traits of openness and reproducibility.

In March 2017, 13 mostly early-career research leaders who are building their careers around these traits came together with ten university leaders (presidents, vice presidents, and vice provosts), representatives from four funding agencies, and eleven organizers and other stakeholders in an NIH- and NSF-funded one-day, invitation-only workshop titled “Imagining Tomorrow’s University.”

Workshop attendees were charged with launching a new dialog around open research – the current status, opportunities for advancement, and challenges that limit sharing.

The workshop examined how the internet-enabled research world has changed, and how universities need to change to adapt commensurately, aiming to understand how universities can and should make themselves competitive and attract the best students, staff, and faculty in this new world.

During the workshop, the participants re-imagined scholarship, education, and institutions for an open, networked era, to uncover new opportunities for universities to create value and serve society.

They expressed the results of these deliberations as a set of 22 principles of tomorrow’s university across six areas: credit and attribution, communities, outreach and engagement, education, preservation and reproducibility, and technologies.

Activities that follow on from workshop results take one of three forms. First, since the workshop, a number of workshop authors have further developed and published their white papers to make their reflections and recommendations more concrete.

These authors are also conducting efforts to implement these ideas, and to make changes in the university system.

Second, we plan to organise a follow-up workshop that focuses on how these principles could be implemented.

Third, we believe that the outcomes of this workshop support and are connected with recent theoretical work on the position and future of open knowledge institutions.

URL : The principles of tomorrow’s university

DOI : https://doi.org/10.12688/f1000research.17425.1

Curating Scientific Information in Knowledge Infrastructures

Authors : Markus Stocker, Pauli Paasonen, Markus Fiebig, Martha A. Zaidan, Alex Hardisty

Interpreting observational data is a fundamental task in the sciences, specifically in earth and environmental science where observational data are increasingly acquired, curated, and published systematically by environmental research infrastructures.

Typically subject to substantial processing, observational data are used by research communities, their research groups and individual scientists, who interpret such primary data for their meaning in the context of research investigations.

The result of interpretation is information (meaningful secondary or derived data) about the observed environment. Research infrastructures and research communities are thus essential to evolving uninterpreted observational data into information. In digital form, the classical bearers of information are the commonly known “(elaborated) data products,” for instance maps.

In such form, meaning is generally implicit, e.g., in map colour coding, and thus largely inaccessible to machines. The systematic acquisition, curation, possible publishing and further processing of information gained in observational data interpretation, as machine readable data and their machine readable meaning, is not common practice among environmental research infrastructures.

For a use case in aerosol science, we elucidate these problems and present a Jupyter based prototype infrastructure that exploits a machine learning approach to interpretation and could support a research community in interpreting observational data and, more importantly, in curating and further using resulting information about a studied natural phenomenon.

URL : Curating Scientific Information in Knowledge Infrastructures

DOI : http://doi.org/10.5334/dsj-2018-021

Big Data and Data Science: Opportunities and Challenges of iSchools

Authors : Il-Yeol Song, Yongjun Zhu

Due to the recent explosion of big data, our society has been rapidly going through digital transformation and entering a new world with numerous eye-opening developments. These new trends impact society and future jobs, and thus student careers.

At the heart of this digital transformation is data science, the discipline that makes sense of big data. With many rapidly emerging digital challenges ahead of us, this article discusses perspectives on iSchools’ opportunities and suggestions in data science education.

We argue that iSchools should empower their students with “information computing” disciplines, which we define as the ability to solve problems and create values, information, and knowledge using tools in application domains.

As specific approaches to enforcing information computing disciplines in data science education, we suggest the three foci of user-based, tool-based, and application-based. These three foci will serve to differentiate the data science education of iSchools from that of computer science or business schools.

We present a layered Data Science Education Framework (DSEF) with building blocks that include the three pillars of data science (people, technology, and data), computational thinking, data-driven paradigms, and data science lifecycles.

Data science courses built on top of this framework should thus be executed with user-based, tool-based, and application-based approaches.

This framework will help our students think about data science problems from the big picture perspective and foster appropriate problem-solving skills in conjunction with broad perspectives of data science lifecycles. We hope the DSEF discussed in this article will help fellow iSchools in their design of new data science curricula.

URL : Big Data and Data Science: Opportunities and Challenges of iSchools

DOI : https://doi.org/10.1515/jdis-2017-0011

Towards an Ethical Framework for Publishing Twitter Data in Social Research: Taking into Account Users’ Views, Online Context and Algorithmic Estimation

Authors : Matthew L Williams, Pete Burnap, Luke Sloan

New and emerging forms of data, including posts harvested from social media sites such as Twitter, have become part of the sociologist’s data diet. In particular, some researchers see an advantage in the perceived ‘public’ nature of Twitter posts, representing them in publications without seeking informed consent.

While such practice may not be at odds with Twitter’s terms of service, we argue there is a need to interpret these through the lens of social science research methods that imply a more reflexive ethical approach than provided in ‘legal’ accounts of the permissible use of these data in research publications.

To challenge some existing practice in Twitter-based research, this article brings to the fore: (1) views of Twitter users through analysis of online survey data; (2) the effect of context collapse and online disinhibition on the behaviours of users; and (3) the publication of identifiable sensitive classifications derived from algorithms.

URL : Towards an Ethical Framework for Publishing Twitter Data in Social Research: Taking into Account Users’ Views, Online Context and Algorithmic Estimation

DOI : http://dx.doi.org/10.1177%2F0038038517708140

Dataverse 4.0: Defining Data Publishing

The research community needs reliable, standard ways to make the data produced by scientific research available to the community, while getting credit as data authors. As a result, a new form of scholarly publication is emerging: data publishing. Data publishing – or making data long-term accessible, reusable and citable – is more involved than simply providing a link to a data file or posting the data to the researcher's web site.

In this paper, we define what is needed for proper data publishing and describe how the open-source Dataverse software helps define, enable and enhance data publishing for all.

URL : http://scholar.harvard.edu/mercecrosas/publications/dataverse-4-defining-data-publishing