Data Sharing: Convert Challenges into Opportunities

Author : Ana Sofia Figueiredo

Initiatives for sharing research data are opportunities to increase the pace of knowledge discovery and scientific progress. The reuse of research data has the potential to avoid the duplication of data sets and to bring new views from multiple analysis of the same data set.

For example, the study of genomic variations associated with cancer profits from the universal collection of such data and helps in selecting the most appropriate therapy for a specific patient. However, data sharing poses challenges to the scientific community.

These challenges are of ethical, cultural, legal, financial, or technical nature. This article reviews the impact that data sharing has in science and society and presents guidelines to improve the efficient sharing of research data.

URL : Data Sharing: Convert Challenges into Opportunities

DOI : https://doi.org/10.3389/fpubh.2017.00327

Amplifying Data Curation Efforts to Improve the Quality of Life Science Data

Authors : Mariam Alqasab, Suzanne M. Embury, Sandra de F. Mendes Sampaio

In the era of data science, datasets are shared widely and used for many purposes unforeseen by the original creators of the data. In this context, defects in datasets can have far reaching consequences, spreading from dataset to dataset, and affecting the consumers of data in ways that are hard to predict or quantify.

Some form of waste is often the result. For example, scientists using defective data to propose hypotheses for experimentation may waste their limited wet lab resources chasing the wrong experimental targets. Scarce drug trial resources may be used to test drugs that actually have little chance of giving a cure.

Because of the potential real world costs, database owners care about providing high quality data. Automated curation tools can be used to an extent to discover and correct some forms of defect.

However, in some areas human curation, performed by highly-trained domain experts, is needed to ensure that the data represents our current interpretation of reality accurately.

Human curators are expensive, and there is far more curation work to be done than there are curators available to perform it. Tools and techniques are needed to enable the full value to be obtained from the curation effort currently available.

In this paper,we explore one possible approach to maximising the value obtained from human curators, by automatically extracting information about data defects and corrections from the work that the curators do.

This information is packaged in a source independent form, to allow it to be used by the owners of other databases (for which human curation effort is not available or is insufficient).

This amplifies the efforts of the human curators, allowing their work to be applied to other sources, without requiring any additional effort or change in their processes or tool sets. We show that this approach can discover significant numbers of defects, which can also be found in other sources.

URL : Amplifying Data Curation Efforts to Improve the Quality of Life Science Data

DOI : https://doi.org/10.2218/ijdc.v12i1.495

A review of data sharing statements in observational studies published in the BMJ: A cross-sectional study

Authors : Laura McDonald, Anna Schultze, Alex Simpson, Sophie Graham, Radek Wasiak, Sreeram V. Ramagopalan

In order to understand the current state of data sharing in observational research studies, we reviewed data sharing statements of observational studies published in a general medical journal, the British Medical Journal.

We found that the majority (63%) of observational studies published between 2015 and 2017 included a statement that implied that data used in the study could not be shared. If the findings of our exploratory study are confirmed, room for improvement in the sharing of real-world or observational research data exists.

URL : A review of data sharing statements in observational studies published in the BMJ: A cross-sectional study

DOI : http://dx.doi.org/10.12688/f1000research.12673.2

Versioned data: why it is needed and how it can be achieved (easily and cheaply)

Authors : Daniel S. Falster, Richard G. FitzJohn, Matthew W. Pennell, William K. Cornwell

The sharing and re-use of data has become a cornerstone of modern science. Multiple platforms now allow quick and easy data sharing. So far, however, data publishing models have not accommodated on-going scientific improvements in data: for many problems, datasets continue to grow with time — more records are added, errors fixed, and new data structures are created. In other words, datasets, like scientific knowledge, advance with time.

We therefore suggest that many datasets would be usefully published as a series of versions, with a simple naming system to allow users to perceive the type of change between versions. In this article, we argue for adopting the paradigm and processes for versioned data, analogous to software versioning.

We also introduce a system called Versioned Data Delivery and present tools for creating, archiving, and distributing versioned data easily, quickly, and cheaply. These new tools allow for individual research groups to shift from a static model of data curation to a dynamic and versioned model that more naturally matches the scientific process.

URL : Versioned data: why it is needed and how it can be achieved (easily and cheaply)

DOI : https://doi.org/10.7287/peerj.preprints.3401v1

 

The Evolution, Approval and Implementation of the U.S. Geological Survey Science Data Lifecycle Model

Authors : John L. Faundeen, Vivian B. Hutchison

This paper details how the U.S. Geological Survey (USGS) Community for Data Integration (CDI) Data Management Working Group developed a Science Data Lifecycle Model, and the role the Model plays in shaping agency-wide policies and data management applications.

Starting with an extensive literature review of existing data lifecycle models, representatives from various backgrounds in USGS attended a two-day meeting where the basic elements for the Science Data Lifecycle Model were determined.

Refinements and reviews spanned two years, leading to finalization of the model and documentation in a formal agency publication1.

The Model serves as a critical framework for data management policy, instructional resources, and tools. The Model helps the USGS address both the Office of Science and Technology Policy (OSTP)2 for increased public access to federally funded research, and the Office of Management and Budget (OMB)3 2013 Open Data directives, as the foundation for a series of agency policies related to data management planning, metadata development, data release procedures, and the long-term preservation of data.

Additionally, the agency website devoted to data management instruction and best practices (www2.usgs.gov/datamanagement) is designed around the Model’s structure and concepts. This paper also illustrates how the Model is being used to develop tools for supporting USGS research and data management processes.

URL : http://escholarship.umassmed.edu/jeslib/vol6/iss2/4/

 

Building a Disciplinary, World‐Wide Data Infrastructure

Authors: Françoise Genova, Christophe Arviset, Bridget M. Almas, Laura Bartolo, Daan Broeder, Emily Law, Brian McMahon

Sharing scientific data with the objective of making it discoverable, accessible, reusable, and interoperable requires work and presents challenges being faced at the disciplinary level to define in particular how the data should be formatted and described.

This paper represents the Proceedings of a session held at SciDataCon 2016 (Denver, 12–13 September 2016). It explores the way a range of disciplines, namely materials science, crystallography, astronomy, earth sciences, humanities and linguistics, get organized at the international level to address those challenges. T

he disciplinary culture with respect to data sharing, science drivers, organization, lessons learnt and the elements of the data infrastructure which are or could be shared with others are briefly described. Commonalities and differences are assessed.

Common key elements for success are identified: data sharing should be science driven; defining the disciplinary part of the interdisciplinary standards is mandatory but challenging; sharing of applications should accompany data sharing. Incentives such as journal and funding agency requirements are also similar.

For all, social aspects are more challenging than technological ones. Governance is more diverse, often specific to the discipline organization. Being problem‐driven is also a key factor of success for building bridges to enable interdisciplinary research.

Several international data organizations such as CODATA, RDA and WDS can facilitate the establishment of disciplinary interoperability frameworks. As a spin‐off of the session, a RDA Disciplinary Interoperability Interest Group is proposed to bring together representatives across disciplines to better organize and drive the discussion for prioritizing, harmonizing and efficiently articulating disciplinary needs.

URL : Building a Disciplinary, World‐Wide Data Infrastructure

DOI : http://doi.org/10.5334/dsj-2017-016

 

How to responsibly acknowledge research work in the era of big data and biobanks: ethical aspects of the Bioresource Research Impact Factor (BRIF)

Authors : Heidi Carmen Howard, Deborah Mascalzoni, Laurence Mabile, Gry Houeland, Emmanuelle Rial-Sebbag, Anne Cambon-Thomsen

Currently, a great deal of biomedical research in fields such as epidemiology, clinical trials and genetics is reliant on vast amounts of biological and phenotypic information collected and assembled in biobanks.

While many resources are being invested to ensure that comprehensive and well-organised biobanks are able to provide increased access to, and sharing of biomedical samples and information, many barriers and challenges remain to such responsible and extensive sharing.

Germane to the discussion herein is the barrier to collecting and sharing bioresources related to the lack of proper recognition of researchers and clinicians who developed the bioresource. Indeed, the efforts and resources invested to set up and sustain a bioresource can be enormous and such work should be easily traced and properly recognised.

However, there is currently no such system that systematically and accurately traces and attributes recognition to those doing this work or the bioresource institution itself. As a beginning of a solution to the “recognition problem”, the Bioresource Research Impact Factor/Framework (BRIF) initiative was proposed almost a decade and a half ago and is currently under further development.

With the ultimate aim of increasing awareness and understanding of the BRIF, in this article, we contribute the following: (1) a review of the objectives and functions of the BRIF including the description of two tools that will help in the deployment of the BRIF, the CoBRA (Citation of BioResources in journal Articles) guideline, and the Open Journal of Bioresources (OJB); (2) the results of a small empirical study on stakeholder awareness of the BRIF and (3) a brief analysis of the ethical dimensions of the BRIF which allow it to be a positive contribution to responsible biobanking.

URL : How to responsibly acknowledge research work in the era of big data and biobanks: ethical aspects of the Bioresource Research Impact Factor (BRIF)

Alternative locaton : https://link.springer.com/article/10.1007/s12687-017-0332-6