Tag: data science

Toward collaborative open data science in metabolomics using Jupyter Notebooks and cloud computing

Posted by Hans Dillaerts
Date: 23 September 2019

Authors: Kevin M. Mendez, Leighton Pritchard, Stacey N. Reinke, David I. Broadhurst

Background

A lack of transparency and reporting standards in the scientific community has led to increasing and widespread concerns relating to the reproducibility and integrity of results. As an omics science, which generates vast amounts of data and relies heavily on data science for deriving biological meaning, metabolomics is highly vulnerable to irreproducibility.

The metabolomics community has made substantial efforts to align with FAIR data standards by promoting open data formats, data repositories, online spectral libraries, and metabolite databases.

Open data analysis platforms also exist; however, they tend to be inflexible and rely on the user to adequately report their methods and results. To enable FAIR data science in metabolomics, methods and results need to be transparently disseminated in a manner that is rapid, reusable, and fully integrated with the published work.

To ensure broad use within the community, such a framework also needs to be inclusive and intuitive for computational novices and experts alike.

Aim of Review

To encourage metabolomics researchers from all backgrounds to take control of their own data science, mould it to their personal requirements, and enthusiastically share resources through open science.

Key Scientific Concepts of Review

This tutorial introduces the concept of interactive web-based computational laboratory notebooks. The reader is guided through a set of experiential tutorials specifically targeted at metabolomics researchers, based around the Jupyter Notebook web application, GitHub data repository, and Binder cloud computing platform.

URL: Toward collaborative open data science in metabolomics using Jupyter Notebooks and cloud computing

DOI: https://doi.org/10.1007/s11306-019-1588-0

Tags: data science, David I. Broadhurst, FAIR Data, Kevin M. Mendez, Leighton Pritchard, open access, reproducibility of research, Stacey N. Reinke

The principles of tomorrow’s university

Posted by Hans Dillaerts
Date: 13 December 2018

Authors: Daniel S. Katz, Gabrielle Allen, Lorena A. Barba, Devin R. Berg, Holly Bik, Carl Boettiger, Christine L. Borgman, C. Titus Brown, Stuart Buck, Randy Burd, Anita de Waard, Martin Paul Eve, Brian E. Granger, Josh Greenberg, Adina Howe, Bill Howe, May Khanna, Timothy L. Killeen, Matthew Mayernik, Erin McKiernan, Chris Mentzel, Nirav Merchant, Kyle E. Niemeyer, Laura Noren, Sarah M. Nusser, Daniel A. Reed, Edward Seidel, MacKenzie Smith, Jeffrey R. Spies, Matt Turk, John D. Van Horn, Jay Walsh

In the 21st Century, research is increasingly data- and computation-driven. Researchers, funders, and the larger community today emphasize the traits of openness and reproducibility.

In March 2017, 13 mostly early-career research leaders who are building their careers around these traits came together with ten university leaders (presidents, vice presidents, and vice provosts), representatives from four funding agencies, and eleven organizers and other stakeholders in an NIH- and NSF-funded one-day, invitation-only workshop titled "Imagining Tomorrow's University."

Workshop attendees were charged with launching a new dialog around open research – the current status, opportunities for advancement, and challenges that limit sharing.

The workshop examined how the internet-enabled research world has changed, and how universities need to change to adapt commensurately, aiming to understand how universities can and should make themselves competitive and attract the best students, staff, and faculty in this new world.

During the workshop, the participants re-imagined scholarship, education, and institutions for an open, networked era, to uncover new opportunities for universities to create value and serve society.

They expressed the results of these deliberations as a set of 22 principles of tomorrow’s university across six areas: credit and attribution, communities, outreach and engagement, education, preservation and reproducibility, and technologies.

Activities that follow on from workshop results take one of three forms. First, since the workshop, a number of workshop authors have further developed and published their white papers to make their reflections and recommendations more concrete.

These authors are also conducting efforts to implement these ideas, and to make changes in the university system.

Second, we plan to organise a follow-up workshop that focuses on how these principles could be implemented.

Third, we believe that the outcomes of this workshop support and are connected with recent theoretical work on the position and future of open knowledge institutions.

URL: The principles of tomorrow’s university

DOI: https://doi.org/10.12688/f1000research.17425.1

Curating Scientific Information in Knowledge Infrastructures

Posted by Hans Dillaerts
Date: 29 September 2018

Authors: Markus Stocker, Pauli Paasonen, Markus Fiebig, Martha A. Zaidan, Alex Hardisty

Interpreting observational data is a fundamental task in the sciences, specifically in earth and environmental science where observational data are increasingly acquired, curated, and published systematically by environmental research infrastructures.

Typically subject to substantial processing, observational data are used by research communities, their research groups and individual scientists, who interpret such primary data for their meaning in the context of research investigations.

The result of interpretation is information, that is, meaningful secondary or derived data about the observed environment. Research infrastructures and research communities are thus essential to turning uninterpreted observational data into information. In digital form, the classical bearers of information are the commonly known "(elaborated) data products," for instance maps.

In such form, meaning is generally implicit e.g., in map colour coding, and thus largely inaccessible to machines. The systematic acquisition, curation, possible publishing and further processing of information gained in observational data interpretation—as machine readable data and their machine readable meaning—is not common practice among environmental research infrastructures.

For a use case in aerosol science, we elucidate these problems and present a Jupyter-based prototype infrastructure that exploits a machine learning approach to interpretation and could support a research community in interpreting observational data and, more importantly, in curating and further using the resulting information about a studied natural phenomenon.

URL: Curating Scientific Information in Knowledge Infrastructures

DOI: http://doi.org/10.5334/dsj-2018-021

Tags: Alex Hardisty, data curation, data science, data use, Linked Data, Markus Fiebig, Markus Stocker, Martha A. Zaidan, Pauli Paasonen

Big Data and Data Science: Opportunities and Challenges of iSchools

Posted by Hans Dillaerts
Date: 2 December 2017

Authors: Il-Yeol Song, Yongjun Zhu

Due to the recent explosion of big data, our society has been rapidly going through a digital transformation and entering a new world with numerous eye-opening developments. These new trends affect society and future jobs, and thus student careers.

At the heart of this digital transformation is data science, the discipline that makes sense of big data. With many rapidly emerging digital challenges ahead of us, this article discusses perspectives on iSchools’ opportunities and suggestions in data science education.

We argue that iSchools should empower their students with "information computing" disciplines, which we define as the ability to solve problems and create value, information, and knowledge using tools in application domains.

As specific approaches to enforcing information computing disciplines in data science education, we suggest the three foci of user-based, tool-based, and application-based. These three foci will serve to differentiate the data science education of iSchools from that of computer science or business schools.

We present a layered Data Science Education Framework (DSEF) with building blocks that include the three pillars of data science (people, technology, and data), computational thinking, data-driven paradigms, and data science lifecycles.

Data science courses built on top of this framework should thus be delivered with user-based, tool-based, and application-based approaches.

This framework will help our students think about data science problems from the big picture perspective and foster appropriate problem-solving skills in conjunction with broad perspectives of data science lifecycles. We hope the DSEF discussed in this article will help fellow iSchools in their design of new data science curricula.

URL: Big Data and Data Science: Opportunities and Challenges of iSchools

DOI: https://doi.org/10.1515/jdis-2017-0011

Tags: big data, data science, Il-Yeol Song, Yongjun Zhu

Big Metadata, Smart Metadata, and Metadata Capital: Toward Greater Synergy Between Data Science and Metadata

Posted by Hans Dillaerts
Date: 2 December 2017

Author: Jane Greenberg

Purpose

The purpose of the paper is to provide a framework for addressing the disconnect between metadata and data science. Data science cannot progress without metadata research. This paper takes steps toward advancing the synergy between metadata and data science, and identifies pathways for developing a more cohesive metadata research agenda in data science.

Design/methodology/approach

This paper identifies factors that challenge metadata research in the digital ecosystem, defines metadata and data science, and presents the concepts big metadata, smart metadata, and metadata capital as part of a metadata lingua franca connecting to data science.

Findings

The “utilitarian nature” and “historical and traditional views” of metadata are identified as two intersecting factors that have inhibited metadata research. Big metadata, smart metadata, and metadata capital are presented as part of a metadata lingua franca to help frame research in the data science research space.

Research limitations

There are additional, intersecting factors to consider that likely inhibit metadata research, and other significant metadata concepts to explore.

Practical implications

The immediate contribution of this work is that it may elicit response, critique, revision, or, more significantly, motivate research. The work presented can encourage more researchers to consider the significance of metadata as a research worthy topic within data science and the larger digital ecosystem.

Originality/value

Metadata research has not kept pace with other data science topics, yet little attention has been directed to this problem. This is surprising, given that metadata is essential for data science endeavors. This examination synthesizes original and prior scholarship to provide new grounding for metadata research in data science.

URL: Big Metadata, Smart Metadata, and Metadata Capital: Toward Greater Synergy Between Data Science and Metadata

DOI: https://doi.org/10.1515/jdis-2017-0012

Tags: data science, Jane Greenberg, Metadata

Dataverse 4.0: Defining Data Publishing

Posted by Hans Dillaerts
Date: 27 August 2015

The research community needs reliable, standard ways to make the data produced by scientific research available to the community, while getting credit as data authors. As a result, a new form of scholarly publication is emerging: data publishing. Data publishing, that is, making data accessible, reusable and citable over the long term, is more involved than simply providing a link to a data file or posting the data to the researcher's website.

In this paper, we define what is needed for proper data publishing and describe how the open-source Dataverse software helps define, enable and enhance data publishing for all.

URL: http://scholar.harvard.edu/mercecrosas/publications/dataverse-4-defining-data-publishing

Tags: Data Publishing, data science, scientific communication