On the Reuse of Scientific Data

Authors : Irene V. Pasquetto, Bernadette M. Randles, Christine L. Borgman

While science policy promotes data sharing and open data, these are not ends in themselves. Arguments for data sharing are to reproduce research, to make public assets available to the public, to leverage investments in research, and to advance research and innovation.

To achieve these expected benefits of data sharing, data must actually be reused by others. Data sharing practices, especially motivations and incentives, have received far more study than has data reuse, perhaps because of the array of contested concepts on which reuse rests and the disparate contexts in which it occurs.

Here we explicate concepts of data, sharing, and open data as a means to examine data reuse. We explore distinctions between use and reuse of data.

Lastly, we propose six research questions on data reuse worthy of pursuit by the community: How can uses of data be distinguished from reuses? When is reproducibility an essential goal? When is data integration an essential goal? What are the tradeoffs between collecting new data and reusing existing data? How do motivations for data collection influence the ability to reuse data? How do standards and formats for data release influence reuse opportunities?

We conclude by summarizing the implications of these questions for science policy and for investments in data reuse.

URL : On the Reuse of Scientific Data

DOI : http://doi.org/10.5334/dsj-2017-008

Data Reuse as a Prisoner’s Dilemma: the social capital of open science

Author : Bradly Alicea

Participation in Open Data initiatives requires two semi-independent actions: the sharing of data produced by a researcher or group, and the consumption of shared data by others. Consumers of shared data range from people interested in validating the results of a given study to transformers of the data.

These transformers can add value to the dataset by extracting new relationships and information. The relationship between producers and consumers can be modeled in a game-theoretic context, namely by using a Prisoner’s Dilemma (PD) model to better understand potential barriers and benefits of sharing.

In this paper, we will introduce the problem of data sharing, consider assumptions about economic versus social payoffs, and provide simplistic payoff matrices of data sharing.

Several variations on the payoff matrix are given for different institutional scenarios, ranging from the ubiquitous acceptance of Open Science principles to a context where the standard is entirely non-cooperative. Implications for building a CC-BY economy are then discussed in context.
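To make the payoff-matrix idea concrete, here is a minimal sketch of a textbook Prisoner's Dilemma applied to two researchers who each choose whether to share data. The payoff numbers, the action names, and the helper functions are illustrative assumptions; they are not taken from the paper, which gives its own matrices for each institutional scenario.

```python
# Hypothetical payoffs (NOT from the paper), chosen so that T > R > P > S:
# T = temptation (withhold while the other shares), R = reward (both share),
# P = punishment (both withhold), S = sucker's payoff (share while the other withholds).
SHARE, WITHHOLD = "share", "withhold"

# (my_action, other_action) -> (my_payoff, other_payoff)
PAYOFFS = {
    (SHARE, SHARE): (3, 3),
    (SHARE, WITHHOLD): (0, 5),
    (WITHHOLD, SHARE): (5, 0),
    (WITHHOLD, WITHHOLD): (1, 1),
}


def best_response(other_action):
    """Return the action that maximizes my payoff against a fixed other action."""
    return max((SHARE, WITHHOLD), key=lambda mine: PAYOFFS[(mine, other_action)][0])


def is_prisoners_dilemma(payoffs):
    """Check the defining ordering T > R > P > S on the first player's payoffs."""
    r = payoffs[(SHARE, SHARE)][0]
    s = payoffs[(SHARE, WITHHOLD)][0]
    t = payoffs[(WITHHOLD, SHARE)][0]
    p = payoffs[(WITHHOLD, WITHHOLD)][0]
    return t > r > p > s
```

Under these assumed numbers, withholding is the best response to either action, even though mutual sharing pays each researcher more than mutual withholding. That tension is the barrier to sharing that a PD framing is meant to expose.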

URL : Data Reuse as a Prisoner’s Dilemma: the social capital of open science

DOI : https://doi.org/10.1101/093518

Research Data Reusability: Conceptual Foundations, Barriers and Enabling Technologies

Author : Costantino Thanos

High-throughput scientific instruments are generating massive amounts of data. Today, one of the main challenges faced by researchers is to make the best use of the world’s growing wealth of data. Data (re)usability is becoming a distinct characteristic of modern scientific practice.

By data (re)usability, we mean the ease with which data produced by some research communities (producer communities) can be used for legitimate scientific research by one or more other research communities (consumer communities).

Data (re)usability allows the reanalysis of evidence, reproduction and verification of results, minimizing duplication of effort, and building on the work of others. It has four main dimensions: policy, legal, economic and technological. The paper addresses the technological dimension of data reusability.

The conceptual foundations of data reuse as well as the barriers that hamper data reuse are presented and discussed. The data publication process is proposed as a bridge between the data author and user and the relevant technologies enabling this process are presented.

URL : Research Data Reusability: Conceptual Foundations, Barriers and Enabling Technologies

DOI : http://dx.doi.org/10.3390/publications5010002

Reproducible and reusable research: Are journal data sharing policies meeting the mark?

Authors : Nicole A Vasilevsky, Jessica Minnier, Melissa A Haendel, Robin E Champieux

Background

There is wide agreement in the biomedical research community that research data sharing is a primary ingredient for ensuring that science is more transparent and reproducible.

Publishers could play an important role in facilitating and enforcing data sharing; however, many journals have not yet implemented data sharing policies and the requirements vary widely across journals. This study set out to analyze the pervasiveness and quality of data sharing policies in the biomedical literature.

Methods

The online author instructions and editorial policies for 318 biomedical journals were manually reviewed to analyze each journal’s data sharing requirements and characteristics.

The data sharing policies were ranked using a rubric to determine if data sharing was required, recommended, required only for omics data, or not addressed at all. The data sharing method and licensing recommendations were examined, as well as any mention of reproducibility or similar concepts.

The data was analyzed for patterns relating to publishing volume, Journal Impact Factor, and the publishing model (open access or subscription) of each journal.

Results

11.9% of journals analyzed explicitly stated that data sharing was required as a condition of publication. 9.1% of journals required data sharing, but did not state that it would affect publication decisions. 23.3% of journals had a statement encouraging authors to share their data but did not require it.

There was no mention of data sharing in 31.8% of journals. Impact factors were significantly higher for journals with the strongest data sharing policies compared to all other data sharing mark categories. Open access journals were not more likely to require data sharing than subscription journals.

Discussion

Our study confirmed earlier investigations which observed that only a minority of biomedical journals require data sharing, and which found a significant association between higher Impact Factors and journals with a data sharing requirement.

Moreover, while 65.7% of the journals in our study that required data sharing addressed the concept of reproducibility, as with earlier investigations, we found that most data sharing policies did not provide specific guidance on the practices that ensure data is maximally available and reusable.

URL : Reproducible and reusable research: Are journal data sharing policies meeting the mark?

DOI : https://peerj.com/articles/3208/

Data trajectories: tracking reuse of published data for transitive credit attribution

Author : Paolo Missier

The ability to measure the use and impact of published data sets is key to the success of the open data/open science paradigm. A direct measure of impact would require tracking data (re)use in the wild, which is difficult to achieve.

This is therefore commonly replaced by simpler metrics based on data download and citation counts. In this paper we describe a scenario where it is possible to track the trajectory of a dataset after its publication, and show how this enables the design of accurate models for ascribing credit to data originators.

A Data Trajectory (DT) is a graph that encodes knowledge of how, by whom, and in which context data has been re-used, possibly after several generations. We provide a theoretical model of DTs that is grounded in the W3C PROV data model for provenance, and we show how DTs can be used to automatically propagate a fraction of the credit associated with transitively derived datasets, back to original data contributors.

We also show this model of transitive credit in action by means of a Data Reuse Simulator. In the longer term, our ultimate hope is that credit models based on direct measures of data reuse will provide further incentives to data publication.

We conclude by outlining a research agenda to address the hard questions of creating, collecting, and using DTs systematically across a large number of data reuse instances in the wild.
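The idea of passing credit back along a chain of derived datasets can be sketched as follows. This is only an illustration under stated assumptions, not the paper's model: the `derived_from` mapping, the fixed `fraction`, and the equal split among sources are all hypothetical choices, and the derivation graph is assumed to be acyclic.

```python
from collections import defaultdict


def propagate_credit(derived_from, direct_credit, fraction=0.5):
    """Propagate credit from derived datasets back to their sources.

    derived_from: dict mapping a dataset to the list of datasets it was derived from
                  (assumed acyclic; hypothetical structure, not the PROV encoding).
    direct_credit: dict mapping a dataset to the credit it earned directly.
    fraction: share of a dataset's credit passed back to its sources (assumption).
    """
    totals = defaultdict(float)

    def push(dataset, amount):
        sources = derived_from.get(dataset, [])
        if not sources:
            # An original dataset keeps everything that reaches it.
            totals[dataset] += amount
            return
        # A derived dataset keeps part of the credit and passes the rest back,
        # split equally among the datasets it was derived from.
        totals[dataset] += amount * (1 - fraction)
        share = amount * fraction / len(sources)
        for source in sources:
            push(source, share)

    for dataset, amount in direct_credit.items():
        push(dataset, amount)
    return dict(totals)


# Hypothetical chain: C was derived from B, which was derived from A.
example = propagate_credit({"B": ["A"], "C": ["B"]}, {"C": 8.0})
```

In this toy chain, credit earned by the twice-derived dataset C reaches the original dataset A after two hops, and the total credit across all datasets stays equal to what was earned directly.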

URL : Data trajectories: tracking reuse of published data for transitive credit attribution

DOI : http://dx.doi.org/10.2218/ijdc.v11i1.425

Factors Influencing Research Data Reuse in the Social Sciences : An Exploratory Study

Author : Renata Gonçalves Curty

The development of e-Research infrastructure has enabled data to be shared and accessed more openly. Policy mandates for data sharing have contributed to the increasing availability of research data through data repositories, which create favourable conditions for the re-use of data for purposes not always anticipated by original collectors.

Despite the current efforts to promote transparency and reproducibility in science, data re-use cannot be assumed, nor merely considered a ‘thrifting’ activity where scientists shop around in data repositories considering only the ease of access to data.

The lack of an integrated view of individual, social and technological influential factors to intentional and actual data re-use behaviour was the key motivator for this study. Interviews with 13 social scientists produced 25 factors that were found to influence their perceptions and experiences, including both their unsuccessful and successful attempts to re-use data.

These factors were grouped into six theoretical variables: perceived benefits, perceived risks, perceived effort, social influence, facilitating conditions, and perceived re-usability.

These research findings provide an in-depth understanding about the re-use of research data in the context of open science, which can be valuable in terms of theory and practice to help leverage data re-use and make publicly available data more actionable.

URL : Factors Influencing Research Data Reuse in the Social Sciences : An Exploratory Study

DOI : http://dx.doi.org/10.2218/ijdc.v11i1.401

Discovery and Reuse of Open Datasets: An Exploratory Study

Authors : Sara Mannheimer, Leila Belle Sterman, Susan Borda

Objective

This article analyzes twenty cited or downloaded datasets and the repositories that house them, in order to produce insights that can be used by academic libraries to encourage discovery and reuse of research data in institutional repositories.

Methods

Using Thomson Reuters’ Data Citation Index and repository download statistics, we identified twenty cited/downloaded datasets. We documented the characteristics of the cited/downloaded datasets and their corresponding repositories in a self-designed rubric.

The rubric includes six major categories: basic information; funding agency and journal information; linking and sharing; factors to encourage reuse; repository characteristics; and data description.

Results

Our small-scale study suggests that cited/downloaded datasets generally comply with basic recommendations for facilitating reuse: data are documented well; formatted for use with a variety of software; and shared in established, open access repositories.

Three significant factors also appear to contribute to dataset discovery: publishing in discipline-specific repositories; indexing in more than one location on the web; and using persistent identifiers.

The cited/downloaded datasets in our analysis came from a few specific disciplines, and tended to be funded by agencies with data publication mandates.

Conclusions

The results of this exploratory research provide insights that can inform academic librarians as they work to encourage discovery and reuse of institutional datasets.

Our analysis also suggests areas in which academic librarians can target open data advocacy in their communities in order to begin to build open data success stories that will fuel future advocacy efforts.

URL : Discovery and Reuse of Open Datasets: An Exploratory Study

DOI : http://dx.doi.org/10.7191/jeslib.2016.1091