Versioning Data Is About More than Revisions: A Conceptual Framework and Proposed Principles

Authors : Jens Klump, Lesley Wyborn, Mingfang Wu, Julia Martin, Robert R. Downs, Ari Asmi

A dataset, small or big, is often changed to correct errors, to apply new algorithms, or to add new data (e.g., as part of a time series).

In addition, datasets might be bundled into collections, distributed in different encodings or mirrored onto different platforms. All these differences between versions of datasets need to be understood by researchers who want to cite the exact version of the dataset that was used to underpin their research.

Failing to do so reduces the reproducibility of research results. Ambiguous identification of datasets also impacts researchers and data centres who are unable to gain recognition and credit for their contributions to the collection, creation, curation and publication of individual datasets.

Although the means to identify datasets using persistent identifiers have been in place for more than a decade, systematic data versioning practices are currently not available. In this work, we analysed 39 use cases and current practices of data versioning across 33 organisations.

We noticed that the term ‘version’ was used in a very general sense, extending beyond the more common understanding of ‘version’ as referring primarily to revisions and replacements. Using concepts developed in software versioning and in the Functional Requirements for Bibliographic Records (FRBR) as a conceptual framework, we developed six foundational principles for the versioning of datasets: Revision, Release, Granularity, Manifestation, Provenance and Citation.

These six principles provide a high-level framework for guiding the consistent practice of data versioning and can also serve as guidance for data centres or data providers when setting up their own data revision and version protocols and procedures.

URL : Versioning Data Is About More than Revisions: A Conceptual Framework and Proposed Principles


Analytic reproducibility in articles receiving open data badges at the journal Psychological Science: an observational study

Authors : Tom E. Hardwicke, Manuel Bohn, Kyle MacDonald, Emily Hembacher, Michèle B. Nuijten, Benjamin N. Peloquin, Benjamin E. deMayo, Bria Long, Erica J. Yoon, Michael C. Frank

For any scientific report, repeating the original analyses upon the original data should yield the original outcomes. We evaluated analytic reproducibility in 25 Psychological Science articles awarded open data badges between 2014 and 2015.

Initially, 16 (64%, 95% confidence interval [43,81]) articles contained at least one ‘major numerical discrepancy’ (>10% difference), prompting us to request input from the original authors.

Ultimately, target values were reproducible without author involvement for 9 (36% [20,59]) articles; reproducible with author involvement for 6 (24% [8,47]) articles; not fully reproducible with no substantive author response for 3 (12% [0,35]) articles; and not fully reproducible despite author involvement for 7 (28% [12,51]) articles.

Overall, 37 major numerical discrepancies remained out of 789 checked values (5% [3,6]), but original conclusions did not appear affected.

Non-reproducibility was primarily caused by unclear reporting of analytic procedures. These results highlight that open data alone is not sufficient to ensure analytic reproducibility.

URL : Analytic reproducibility in articles receiving open data badges at the journal Psychological Science: an observational study


Questionable Research Practices and Open Science in Quantitative Criminology

Authors : Jason Chin, Justin Pickett, Simine Vazire, Alex Holcombe

Questionable research practices (QRPs) lead to incorrect research results and contribute to irreproducibility in science. Researchers and institutions have proposed open science practices (OSPs) to improve the detectability of QRPs and the credibility of science. We examine the prevalence of QRPs and OSPs in criminology, and researchers’ opinions of those practices.

We administered an anonymous survey to authors of articles published in criminology journals. Respondents self-reported their own use of 10 QRPs and 5 OSPs. They also estimated the prevalence of use by others, and reported their attitudes toward the practices.

QRPs and OSPs are both common in quantitative criminology, about as common as they are in other fields. Criminologists who responded to our survey support using QRPs in some circumstances, but are even more supportive of using OSPs.

We did not detect a significant relationship between methodological training and either QRP or OSP use.

Support for QRPs is negatively and significantly associated with support for OSPs. Perceived prevalence estimates for some practices resembled a uniform distribution, suggesting criminologists have little knowledge of the proportion of researchers who engage in certain questionable practices.

Most quantitative criminologists in our sample use QRPs, and many use multiple QRPs. The substantial prevalence of QRPs raises questions about the validity and reproducibility of published criminological research.

We found promising levels of OSP use, albeit lagging behind what researchers endorse. The findings thus suggest that additional reforms are needed to decrease QRP use and increase the use of OSPs.

URL : Questionable Research Practices and Open Science in Quantitative Criminology


Reading the fine print: A review and analysis of business journals’ data sharing policies

Authors : Brianne Dosch, Tyler Martindale

Business librarians offer many data services to their researchers. These services are often focused more on discovery, visualization, and analysis than on general data management. But with the replication crisis facing many business disciplines, there is a need for business librarians to offer more data sharing and general data management support to their researchers.

To find evidence of this need, the data sharing policies of 146 business journals were reviewed and analyzed to uncover meaningful trends in business research. Results of the study indicate that data sharing is not mandated by business journals.

However, data sharing is often encouraged and recommended. This journal policy content analysis provides evidence that business researchers have opportunities to share their research data, and with the right data management support, business librarians can play a significant role in improving the data sharing behaviors of business researchers.


Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015

Authors : Colin F. Camerer, Anna Dreber, Felix Holzmeister, Teck-Hua Ho, Jürgen Huber, Magnus Johannesson, Michael Kirchler, Gideon Nave, Brian Nosek, Thomas Pfeiffer, Adam Altmejd, Nick Buttrick, Taizan Chan, Yiling Chen, Eskil Forsell, Anup Gampa, Emma Heikensten, Lily Hummer, Taisuke Imai, Siri Isaksson, Dylan Manfredi, Julia Rose, Eric-Jan Wagenmakers, Hang Wu

Being able to replicate scientific findings is crucial for scientific progress. We replicate 21 systematically selected experimental studies in the social sciences published in Nature and Science between 2010 and 2015.

The replications follow analysis plans reviewed by the original authors and pre-registered prior to the replications. The replications are high powered, with sample sizes on average about five times higher than in the original studies.

We find a significant effect in the same direction as the original study for 13 (62%) studies, and the effect size of the replications is on average about 50% of the original effect size. Replicability varies between 12 (57%) and 14 (67%) studies for complementary replicability indicators.

Consistent with these results, the estimated true positive rate is 67% in a Bayesian analysis. The relative effect size of true positives is estimated to be 71%, suggesting that both false positives and inflated effect sizes of true positives contribute to imperfect reproducibility.

Furthermore, we find that peer beliefs of replicability are strongly related to replicability, suggesting that the research community could predict which results would replicate and that failures to replicate were not the result of chance alone.

URL : Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015


Reflections on the Future of Research Curation and Research Reproducibility

Authors : John Baillieul, Gerry Grenier, Gianluca Setti

In the years since the launch of the World Wide Web in 1993, there have been profoundly transformative changes to the entire concept of publishing—exceeding all the previous combined technical advances of the centuries following the introduction of movable type in medieval Asia around the year 1000 and the subsequent large-scale commercialization of printing several centuries later by J. Gutenberg (circa 1440).

Periodicals in print—from daily newspapers to scholarly journals—are now quickly disappearing, never to return, and while no publishing sector has been unaffected, many scholarly journals are almost unrecognizable in comparison with their counterparts of two decades ago.

To say that digital delivery of the written word is fundamentally different is a huge understatement. Online publishing permits inclusion of multimedia and interactive content that add new dimensions to what had been available in print-only renderings.

As of this writing, the IEEE portfolio of journal titles comprises 59 that are online only (31%) and 132 that are published in both print and online. The migration from print to online is more stark than these numbers indicate because, of the 132 periodicals published both in print and online, the print runs are now quite small and continue to decline.

In short, most readers prefer to have their subscriptions fulfilled by digital renderings only.


Make researchers revisit past publications to improve reproducibility

Authors : Clare Fiala, Eleftherios P. Diamandis

Scientific irreproducibility is a major issue that has recently attracted increased attention from publishers, authors, funders and other players in the scientific arena. Published literature suggests that 50-80% of all science performed is irreproducible. While various solutions to this problem have been proposed, none of them are quick and/or cheap.

Here, we propose one way of reducing scientific irreproducibility: asking authors to revisit their previous publications and provide a commentary after five years. We believe that this measure will alert authors not to oversell their results and will help with better planning and execution of their experiments.

We invite scientific journals to adopt this proposal immediately as a prerequisite for publishing.

URL : Make researchers revisit past publications to improve reproducibility