Measuring and Mapping Data Reuse: Findings From an Interactive Workshop on Data Citation and Metrics for Data Reuse

Author : Lisa Federer

Widely adopted standards for data citation are foundational to efforts to track and quantify data reuse. Without the means to track data reuse and metrics to measure its impact, it is difficult to reward researchers who share high-value data with meaningful credit for their contribution.

Despite initial work on developing guidelines for data citation and metrics, standards have not yet been universally adopted. This article reports on the recommendations collected from a workshop held at the Future of Research Communications and e-Scholarship (FORCE11) 2018 meeting titled Measuring and Mapping Data Reuse: An Interactive Workshop on Metrics for Data.

A range of stakeholders were represented among the participants, including publishers, researchers, funders, repository administrators, librarians, and others.

Collectively, they generated a set of 68 recommendations for specific actions that could be taken by standards and metrics creators; publishers; repositories; funders and institutions; creators of reference management software and citation styles; and researchers, students, and librarians.

These specific, concrete, and actionable recommendations would help facilitate broader adoption of standard citation mechanisms and easier measurement of data reuse.

URL : Measuring and Mapping Data Reuse: Findings From an Interactive Workshop on Data Citation and Metrics for Data Reuse

DOI : https://doi.org/10.1162/99608f92.ccd17b00

The History and Future of Data Citation in Practice

Authors : Mark A. Parsons, Ruth E. Duerr, Matthew B. Jones

In this review, we adopt the definition that ‘Data citation is a reference to data for the purpose of credit attribution and facilitation of access to the data’ (TGDCSP 2013: CIDCR6). Furthermore, access should be enabled for both humans and machines (DCSG 2014).

We use this to discuss how data citation has evolved over the last couple of decades and to highlight issues that need more research and attention.

Data citation is not a new concept, but it has changed and evolved considerably since the beginning of the digital age. Basic practice is now established and slowly but increasingly being implemented.

Nonetheless, critical issues remain. These issues are primarily because we try to address multiple human and computational concerns with a system originally designed in a non-digital world for more limited use cases.

The community is beginning to challenge past assumptions, separate the multiple concerns (credit, access, reference, provenance, impact, etc.), and apply different approaches for different use cases.

URL : The History and Future of Data Citation in Practice

DOI : http://doi.org/10.5334/dsj-2019-052

Reproducible data citations for computational research

Author : Christian Schulz

The general purpose of a scientific publication is the exchange and spread of knowledge. A publication usually reports a scientific result and tries to convince the reader that it is valid.

With an ever-growing number of papers relying on computational methods that make use of large quantities of data and sophisticated statistical modeling techniques, a textual description of the result is often not enough for a publication to be transparent and reproducible.

While there are efforts to encourage sharing of code and data, we currently lack conventions for linking data sources to a computational result that is stated in the main publication text or used to generate a figure or table.

Thus, here I propose a data citation format that allows for an automatic reproduction of all computations. A data citation consists of a descriptor that refers to the functional program code and the input that generated the result.

The input itself may be a set of other data citations, such that all data transformations, from the original data sources to the final result, are transparently expressed by a directed graph.

Functions can be implemented in a variety of programming languages since data sources are expected to be stored in open and standardized text-based file formats.

A publication is then an online file repository consisting of a Hypertext Markup Language (HTML) document and additional data and code source files, together with a summarization of all data sources, similar to a list of references in a bibliography.

URL : https://arxiv.org/abs/1808.07541

A Data Citation Roadmap for Scholarly Data Repositories

Authors : Martin Fenner, Mercè Crosas, Jeffrey S. Grethe, David Kennedy, Henning Hermjakob, Phillippe Rocca-Serra, Gustavo Durand, Robin Berjon, Sebastian Karcher, Maryann Martone, Tim Clark

This article presents a practical roadmap for scholarly data repositories to implement data citation in accordance with the Joint Declaration of Data Citation Principles, a synopsis and harmonization of the recommendations of major science policy bodies.

The roadmap was developed by the Repositories Expert Group, as part of the Data Citation Implementation Pilot (DCIP) project, an initiative of FORCE11.org and the NIH BioCADDIE (https://biocaddie.org) program.

The roadmap makes 11 specific recommendations, grouped into three phases of implementation: a) required steps needed to support the Joint Declaration of Data Citation Principles, b) recommended steps that facilitate article/data publication workflows, and c) optional steps that further improve data citation support provided by data repositories.

URL : A Data Citation Roadmap for Scholarly Data Repositories

DOI : https://doi.org/10.1101/097196

 

Theory and Practice of Data Citation

Author : Gianmaria Silvello

Citations are the cornerstone of knowledge propagation and the primary means of assessing the quality of research, as well as directing investments in science. Science is increasingly becoming “data-intensive”, where large volumes of data are collected and analyzed to discover complex patterns through simulations and experiments, and most scientific reference works have been replaced by online curated datasets.

Yet, given a dataset, there is no quantitative, consistent and established way of knowing how it has been used over time, who contributed to its curation, what results have been yielded or what value it has.

The development of a theory and practice of data citation is fundamental for considering data as first-class research objects with the same relevance and centrality of traditional scientific products. Many works in recent years have discussed data citation from different viewpoints: illustrating why data citation is needed, defining the principles and outlining recommendations for data citation systems, and providing computational methods for addressing specific issues of data citation.

The current panorama is many-faceted and an overall view that brings together diverse aspects of this topic is still missing. Therefore, this paper aims to describe the lay of the land for data citation, both from the theoretical (the why and what) and the practical (the how) angle.

URL : https://arxiv.org/abs/1706.07976