Versioned data: why it is needed and how it can be achieved (easily and cheaply)

Authors : Daniel S. Falster, Richard G. FitzJohn, Matthew W. Pennell, William K. Cornwell

The sharing and re-use of data has become a cornerstone of modern science. Multiple platforms now allow quick and easy data sharing. So far, however, data publishing models have not accommodated ongoing scientific improvements in data. For many problems, datasets continue to grow with time: more records are added, errors are fixed, and new data structures are created. In other words, datasets, like scientific knowledge, advance with time.

We therefore suggest that many datasets would be usefully published as a series of versions, with a simple naming system to allow users to perceive the type of change between versions. In this article, we argue for adopting the paradigm and processes for versioned data, analogous to software versioning.

We also introduce a system called Versioned Data Delivery and present tools for creating, archiving, and distributing versioned data easily, quickly, and cheaply. These new tools allow for individual research groups to shift from a static model of data curation to a dynamic and versioned model that more naturally matches the scientific process.
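
As a rough illustration of such a naming system, the sketch below borrows the three-part major.minor.patch convention from software versioning. The abstract does not specify a scheme, so the mapping of change types to version parts and the names `Change` and `bump` are assumptions made for this example, not the authors' tool.

```python
from enum import Enum


class Change(Enum):
    """Kinds of dataset change named in the abstract (labels are hypothetical)."""
    STRUCTURE = "structure"  # new or reorganised data structures
    RECORDS = "records"      # more records added
    FIX = "fix"              # errors corrected


def bump(version: str, change: Change) -> str:
    """Return the next version string for a dataset given the kind of change.

    Assumed mapping: structure change -> major, new records -> minor,
    error fix -> patch. Lower parts reset to zero when a higher part moves.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change is Change.STRUCTURE:
        return f"{major + 1}.0.0"
    if change is Change.RECORDS:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

Under this assumed mapping, a user who sees a dataset move from 1.2.3 to 2.0.0 can tell that its structure changed, whereas a move to 1.2.4 signals only corrections.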

DOI : https://doi.org/10.7287/peerj.preprints.3401v1

 

The Evolution, Approval and Implementation of the U.S. Geological Survey Science Data Lifecycle Model

Authors : John L. Faundeen, Vivian B. Hutchison

This paper details how the U.S. Geological Survey (USGS) Community for Data Integration (CDI) Data Management Working Group developed a Science Data Lifecycle Model, and the role the Model plays in shaping agency-wide policies and data management applications.

Starting with an extensive literature review of existing data lifecycle models, representatives from various backgrounds in USGS attended a two-day meeting where the basic elements for the Science Data Lifecycle Model were determined.

Refinements and reviews spanned two years, leading to finalization of the model and documentation in a formal agency publication.

The Model serves as a critical framework for data management policy, instructional resources, and tools. The Model helps the USGS address both the Office of Science and Technology Policy (OSTP) directive for increased public access to federally funded research and the Office of Management and Budget (OMB) 2013 Open Data directive, and it serves as the foundation for a series of agency policies related to data management planning, metadata development, data release procedures, and the long-term preservation of data.

Additionally, the agency website devoted to data management instruction and best practices (www2.usgs.gov/datamanagement) is designed around the Model’s structure and concepts. This paper also illustrates how the Model is being used to develop tools for supporting USGS research and data management processes.

URL : http://escholarship.umassmed.edu/jeslib/vol6/iss2/4/

 

Building a Disciplinary, World‐Wide Data Infrastructure

Authors : Françoise Genova, Christophe Arviset, Bridget M. Almas, Laura Bartolo, Daan Broeder, Emily Law, Brian McMahon

Sharing scientific data with the objective of making it discoverable, accessible, reusable, and interoperable requires work and presents challenges that must be faced at the disciplinary level, in particular to define how the data should be formatted and described.

This paper represents the Proceedings of a session held at SciDataCon 2016 (Denver, 12–13 September 2016). It explores how a range of disciplines, namely materials science, crystallography, astronomy, earth sciences, humanities and linguistics, organize themselves at the international level to address those challenges.

The disciplinary culture with respect to data sharing, science drivers, organization, lessons learnt and the elements of the data infrastructure which are or could be shared with others are briefly described. Commonalities and differences are assessed.

Common key elements for success are identified: data sharing should be science driven; defining the disciplinary part of the interdisciplinary standards is mandatory but challenging; sharing of applications should accompany data sharing. Incentives such as journal and funding agency requirements are also similar.

For all, social aspects are more challenging than technological ones. Governance is more diverse, often specific to the discipline organization. Being problem‐driven is also a key factor of success for building bridges to enable interdisciplinary research.

Several international data organizations such as CODATA, RDA and WDS can facilitate the establishment of disciplinary interoperability frameworks. As a spin‐off of the session, an RDA Disciplinary Interoperability Interest Group is proposed to bring together representatives across disciplines to better organize and drive the discussion for prioritizing, harmonizing and efficiently articulating disciplinary needs.

DOI : http://doi.org/10.5334/dsj-2017-016

 

How to responsibly acknowledge research work in the era of big data and biobanks: ethical aspects of the Bioresource Research Impact Factor (BRIF)

Authors : Heidi Carmen Howard, Deborah Mascalzoni, Laurence Mabile, Gry Houeland, Emmanuelle Rial-Sebbag, Anne Cambon-Thomsen

Currently, a great deal of biomedical research in fields such as epidemiology, clinical trials and genetics is reliant on vast amounts of biological and phenotypic information collected and assembled in biobanks.

While many resources are being invested to ensure that comprehensive and well-organised biobanks are able to provide increased access to, and sharing of, biomedical samples and information, many barriers and challenges to such responsible and extensive sharing remain.

Germane to the discussion herein is the barrier to collecting and sharing bioresources related to the lack of proper recognition of researchers and clinicians who developed the bioresource. Indeed, the efforts and resources invested to set up and sustain a bioresource can be enormous and such work should be easily traced and properly recognised.

However, there is currently no system that systematically and accurately traces and attributes recognition to those doing this work or to the bioresource institution itself. As the beginning of a solution to the “recognition problem”, the Bioresource Research Impact Factor/Framework (BRIF) initiative was proposed almost a decade and a half ago and is currently under further development.

With the ultimate aim of increasing awareness and understanding of the BRIF, in this article, we contribute the following: (1) a review of the objectives and functions of the BRIF including the description of two tools that will help in the deployment of the BRIF, the CoBRA (Citation of BioResources in journal Articles) guideline, and the Open Journal of Bioresources (OJB); (2) the results of a small empirical study on stakeholder awareness of the BRIF and (3) a brief analysis of the ethical dimensions of the BRIF which allow it to be a positive contribution to responsible biobanking.

URL : How to responsibly acknowledge research work in the era of big data and biobanks: ethical aspects of the Bioresource Research Impact Factor (BRIF)

Alternative location : https://link.springer.com/article/10.1007/s12687-017-0332-6

De l’open data à l’open science : retour réflexif sur les méthodes et pratiques d’une recherche sur les données géographiques

Authors : Nathalie Pinède, Matthieu Noucher, Françoise Gourmelon, Karel Soumagnac-Colin

We draw here on the experience of an ongoing research project to analyse how new fields of experimentation on the web change the conditions of scientific practice, from objects to methods, from open data to open science.

The massification of geographic data available on the web reconfigures research dynamics along three axes of transformation: research objects, methods and practices. First, we highlight how the power issues surrounding cartography have shifted with the advent of the web and open data.

We then discuss the impacts on research methodology in the context of an interdisciplinary approach. Finally, we show how this research project is part of an open science approach.

URL : https://rfsic.revues.org/3200

Recommended versus Certified Repositories: Mind the Gap

Authors : Sean Edward Husen, Zoë G. de Wilde, Anita de Waard, Helena Cousijn

Researchers are increasingly required to make research data publicly available in data repositories. Although several organisations propose criteria to recommend and evaluate the quality of data repositories, there is no consensus on what constitutes a good data repository.

In this paper, we investigate, first, which data repositories are recommended by various stakeholders (publishers, funders, and community organizations) and second, which repositories are certified by a number of organisations.

We then compare these two lists of repositories, and the criteria for recommendation and certification. We find that criteria used by organisations recommending and certifying repositories are similar, although the certification criteria are generally more detailed.

We distil the lists of criteria into seven main categories: “Mission”, “Community/Recognition”, “Legal and Contractual Compliance”, “Access/Accessibility”, “Technical Structure/Interface”, “Retrievability” and “Preservation”.

Although the criteria are similar, the lists of repositories that are recommended by the various agencies are very different. Of all the recommended repositories, fewer than 6% obtained certification.

As certification is becoming more important, steps should be taken to decrease this gap between recommended and certified repositories, and ensure that certification standards become applicable, and applied, to the repositories which researchers are currently using.

DOI : https://doi.org/10.5334/dsj-2017-042

How to share data for collaboration

Authors : Shannon E Ellis, Jeffrey T Leek

Within the statistics community, a number of guiding principles for sharing data have emerged; however, these principles are not always made clear to collaborators generating the data. To bridge this divide, we have established a set of guidelines for sharing data.

In these, we highlight the need to provide raw data to the statistician, the importance of consistent formatting, and the necessity of giving the statistician all essential experimental information along with a record of any pre-processing steps carried out. With these guidelines we hope to avoid errors and delays in data analysis.

DOI : https://doi.org/10.7287/peerj.preprints.3139v1