Versioned data: why it is needed and how it can be achieved (easily and cheaply)

Authors : Daniel S. Falster, Richard G. FitzJohn, Matthew W. Pennell, William K. Cornwell

The sharing and re-use of data has become a cornerstone of modern science. Multiple platforms now allow quick and easy data sharing. So far, however, data publishing models have not accommodated ongoing scientific improvements in data: for many problems, datasets continue to grow with time as more records are added, errors are fixed, and new data structures are created. In other words, datasets, like scientific knowledge, advance with time.

We therefore suggest that many datasets would be usefully published as a series of versions, with a simple naming system to allow users to perceive the type of change between versions. In this article, we argue for adopting the paradigm and processes for versioned data, analogous to software versioning.

We also introduce a system called Versioned Data Delivery and present tools for creating, archiving, and distributing versioned data easily, quickly, and cheaply. These new tools allow for individual research groups to shift from a static model of data curation to a dynamic and versioned model that more naturally matches the scientific process.
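
To make the idea of a "simple naming system" concrete, here is a minimal sketch in Python. It assumes a three-part `major.minor.patch` number borrowed from software versioning, and it assumes one possible mapping from the kinds of change mentioned above (new data structures, added records, fixed errors) to those three parts. The names `Change` and `bump` are hypothetical and are not part of the tools the authors describe.

```python
from enum import Enum


class Change(Enum):
    """Kinds of dataset change; the mapping to version parts is an assumption."""
    STRUCTURE = "structure"  # new data structures -> major
    RECORDS = "records"      # more records added  -> minor
    FIX = "fix"              # errors fixed        -> patch


def bump(version: str, change: Change) -> str:
    """Return the next 'major.minor.patch' label for the given kind of change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change is Change.STRUCTURE:
        return f"{major + 1}.0.0"
    if change is Change.RECORDS:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


if __name__ == "__main__":
    version = "1.2.3"
    for change in (Change.FIX, Change.RECORDS, Change.STRUCTURE):
        version = bump(version, change)
        print(change.value, "->", version)
```

Under this assumed scheme, a user comparing two labels can tell from the first differing part whether existing analyses are likely to need restructuring, re-running, or only a check.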

URL : Versioned data: why it is needed and how it can be achieved (easily and cheaply)



Are Scientific Data Repositories Coping with Research Data Publishing?

Research data publishing is understood here as the release of research data so that practitioners can (re)use them according to “open science” dynamics. Three main actors are called upon to deal with research data publishing practices: researchers, publishers, and data repositories.

This study analyses the solutions offered by generalist scientific data repositories, i.e., repositories supporting the deposition of any type of research data. These repositories cannot make any assumptions about the application domain.

They are in fact called upon to deal with the almost open-ended typologies of data used in science. The current practices promoted by such repositories are analysed with respect to eight key aspects of data publishing, i.e., dataset formatting, documentation, licensing, publication costs, validation, availability, discovery and access, and citation.

From this analysis, it emerges that these repositories implement well-consolidated practices and pragmatic solutions for literature repositories.

These practices and solutions cannot fully meet the needs of managing and using dataset resources, especially in a context where rapid technological change continuously opens new prospects for exploitation.

URL : Are Scientific Data Repositories Coping with Research Data Publishing?