The Definition of Reuse

Authors : Stephanie van de Sandt, Sünje Dallmeier-Tiessen, Artemis Lavasa, Vivien Petras

The ability to reuse research data is now considered a key benefit for the wider research community. Researchers in all disciplines face pressure to share their research data so that it can be reused.

The demand for data use and reuse has implications for how we document, publish and share research in the first place and, perhaps most importantly, for how we measure the impact of research, which is commonly a measurement of its use and reuse.

It is surprising that research communities, policy makers, and others have not yet clearly defined what use and reuse are.

We postulate that a clear definition of use and reuse is needed to establish better metrics for a comprehensive scholarly record of individuals, institutions, organizations, etc.

Hence, this article presents a first definition of reuse of research data. Characteristics of reuse are identified by examining the etymology of the term and analyzing the current discourse, leading to a range of reuse scenarios that show the complexity of today’s research landscape, which has been moving towards a data-driven approach.

The analysis underlines that there is no reason to distinguish between use and reuse. We discuss what that means for possible new metrics that attempt to cover Open Science practices more comprehensively.

We hope that the resulting definition will enable a better and more refined strategy for Open Science.

URL : The Definition of Reuse

DOI : http://doi.org/10.5334/dsj-2019-022

The Time Efficiency Gain in Sharing and Reuse of Research Data

Author: Tessa E. Pronk

Among the frequently stated benefits of sharing research data are time efficiency and increased productivity. The assumption is that reuse or secondary use of research data saves researchers time because they do not have to produce the data for a publication themselves.

This can make science more efficient and productive. However, if there is no reuse, the time spent making data available for reuse will have been invested with no return.

In this paper, a mathematical model is used to calculate the break-even point between the time a scientific community spends on sharing and the time it gains through reuse. This is done for several scenarios, ranging from simple to complex datasets to share and reuse, and at different sharing rates.
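The break-even idea can be illustrated with a deliberately simplified calculation. The sketch below is not the model from the paper: it assumes, hypothetically, that each sharing researcher shares one dataset at a fixed cost in hours, and that each reuse event saves the hours needed to produce the data minus the hours needed to reuse it. The function name, parameters and numbers are illustrative only.

```python
import math


def break_even_reuses(
    n_researchers: int,
    sharing_rate: float,
    hours_to_share: float,
    hours_to_produce: float,
    hours_to_reuse: float,
) -> float:
    """Reuse events needed before total time saved equals total time spent sharing.

    Hypothetical simplified model (not the paper's): each sharing researcher
    shares one dataset, and each reuse event saves the time to produce the
    data minus the time to reuse it.
    """
    if not 0.0 <= sharing_rate <= 1.0:
        raise ValueError("sharing_rate must be between 0 and 1")
    # Total community time invested in preparing datasets for sharing.
    total_sharing_cost = n_researchers * sharing_rate * hours_to_share
    # Net time saved each time a shared dataset is reused instead of produced anew.
    saving_per_reuse = hours_to_produce - hours_to_reuse
    if saving_per_reuse <= 0:
        # If reusing costs as much as producing, sharing never pays off.
        return math.inf
    return total_sharing_cost / saving_per_reuse


# Simple datasets: cheap to share and to reuse.
print(break_even_reuses(100, 0.5, 10, 200, 40))   # 3.125
# Complex datasets: costly to share and to reuse.
print(break_even_reuses(100, 0.5, 40, 200, 120))  # 25.0
```

In this toy setting, raising the time needed for sharing and reuse pushes the break-even point from about 3 reuse events to 25, and lowering the sharing rate lowers it, which loosely echoes the scenarios described in the abstract.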

The results indicate that sharing research data can indeed yield an efficiency gain for the scientific community. However, this is not a given in all modeled scenarios.

The scientific community that needs the least reuse to reach the break-even point is one with few sharing researchers and low time investments for sharing and reuse.

This suggests it would be beneficial to have a critical selection of datasets that are worth the effort to prepare for reuse in other scientific studies. In addition, stimulating reuse of datasets in itself would be beneficial to increase efficiency in scientific communities.

URL : The Time Efficiency Gain in Sharing and Reuse of Research Data

DOI : http://doi.org/10.5334/dsj-2019-010

Data Sustainability and Reuse Pathways of Natural Resources and Environmental Scientists

Author : Yi Shen

This paper presents a multifarious examination of natural resources and environmental scientists’ adventures navigating the policy change towards open access and the cultural shift in data management, sharing, and reuse.

Within the institutional context of Virginia Tech, a focus group and multiple individual interviews were conducted to explore the domain scientists’ all-around experiences, performances, and perspectives on their collection, adoption, integration, preservation, and management of data.

The results reveal the scientists’ struggles, concerns, and barriers encountered, as well as their shared values, beliefs, passions, and aspirations when working with data. Based on these findings, this study provides suggestions on data modeling and knowledge representation strategies to support the long-term viability, stewardship, accessibility, and sustainability of scientific data.

It also discusses the art of curation as creative scholarship and new opportunities for data librarians and information professionals to mobilize the data revolution.

URL : https://arxiv.org/abs/1803.01788

DataMed – an open source discovery index for finding biomedical datasets

Authors : Xiaoling Chen, Anupama E Gururaj, Burak Ozyurt, Ruiling Liu, Ergin Soysal, Trevor Cohen, Firat Tiryaki, Yueling Li, Nansu Zong, Min Jiang, Deevakar Rogith, Mandana Salimi, Hyeon-eui Kim, Philippe Rocca-Serra, Alejandra Gonzalez-Beltran, Claudiu Farcas, Todd Johnson, Ron Margolis, George Alter, Susanna-Assunta Sansone, Ian M Fore, Lucila Ohno-Machado, Jeffrey S Grethe, Hua Xu

Objective

Finding relevant datasets is important for promoting data reuse in the biomedical domain, but it is challenging given the volume and complexity of biomedical data. Here we describe the development of an open source biomedical data discovery system called DataMed, with the goal of promoting the building of additional data indexes in the biomedical domain.

Materials and Methods

DataMed, which can efficiently index and search diverse types of biomedical datasets across repositories, was developed through the National Institutes of Health–funded biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium.

It consists of two main components: (1) a data ingestion pipeline that collects original metadata and transforms it into a unified metadata model, called DatA Tag Suite (DATS), and (2) a search engine that finds relevant datasets based on user-entered queries.
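The two components can be pictured as a mapping step followed by a query step. A minimal sketch, assuming two hypothetical repositories (`repo_a`, `repo_b`) and a three-field unified record; the field names, mappings and the placeholder DOI are illustrative and are not the actual DATS schema, and the naive keyword filter merely stands in for DataMed's far richer retrieval engine.

```python
from typing import Any

# Hypothetical per-repository mappings: source field name -> unified field name.
FIELD_MAPS: dict[str, dict[str, str]] = {
    "repo_a": {"dataset_title": "title", "summary": "description", "accession": "identifier"},
    "repo_b": {"name": "title", "abstract": "description", "doi": "identifier"},
}

# Illustrative subset of unified fields; not the full DATS model.
UNIFIED_FIELDS = ("title", "description", "identifier")


def to_unified(repo: str, record: dict[str, Any]) -> dict[str, Any]:
    """Ingestion step: rename a repository's fields into the unified model."""
    mapping = FIELD_MAPS[repo]
    unified: dict[str, Any] = {field: None for field in UNIFIED_FIELDS}
    for source_field, value in record.items():
        target = mapping.get(source_field)
        if target is not None:  # fields with no mapping are dropped
            unified[target] = value
    unified["source_repository"] = repo
    return unified


def search(records: list[dict[str, Any]], query: str) -> list[dict[str, Any]]:
    """Search step (naive): keep records whose title or description contain every query term."""
    terms = query.lower().split()
    hits = []
    for rec in records:
        text = " ".join(str(rec.get(f) or "") for f in ("title", "description")).lower()
        if all(term in text for term in terms):
            hits.append(rec)
    return hits


index = [
    to_unified("repo_a", {"dataset_title": "Protein structures", "summary": "Crystal data"}),
    to_unified("repo_b", {"name": "Gene expression atlas", "doi": "10.1234/example"}),
]
print([r["title"] for r in search(index, "gene atlas")])  # ['Gene expression atlas']
```

Normalizing metadata before indexing means the search step only has to know one set of field names, whatever the source repository.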

In addition to describing its architecture and techniques, we evaluated individual components within DataMed, including the accuracy of the ingestion pipeline, the prevalence of the DATS model across repositories, and the overall performance of the dataset retrieval engine.

Results and Conclusion

Our manual review shows that the ingestion pipeline could achieve an accuracy of 90% and that core elements of DATS had varied frequency across repositories. On a manually curated benchmark dataset, the DataMed search engine, which implements advanced natural language processing and terminology services, achieved an inferred average precision of 0.2033 and a precision at 10 (P@10, the proportion of relevant results among the top 10 search results) of 0.6022.
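Precision at 10 is simple to compute once relevance judgments exist. The sketch below follows the common convention of dividing by k rather than by the number of results returned; a benchmark figure such as the one reported is typically the mean over all test queries. The document identifiers are made up, and inferred average precision, which relies on sampled judgments, is not shown.

```python
from statistics import mean


def precision_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked results that are relevant (P@k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    # Divide by k, not by the number returned, so short result lists are penalized.
    return hits / k


def mean_precision_at_k(runs: list[tuple[list[str], set[str]]], k: int = 10) -> float:
    """Average P@k over several queries, each given as (ranked results, relevant set)."""
    return mean(precision_at_k(ranked, relevant, k) for ranked, relevant in runs)


ranked = [f"d{i}" for i in range(1, 11)]
relevant = {"d1", "d3", "d4", "d7", "d9", "d10"}
print(precision_at_k(ranked, relevant))  # 0.6
```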

We have made the DataMed system publicly available as an open source package for the biomedical community.

DOI : https://doi.org/10.1093/jamia/ocx121

 

Attitudes and norms affecting scientists’ data reuse

Authors : Renata Gonçalves Curty, Kevin Crowston, Alison Specht, Bruce W. Grant, Elizabeth D. Dalton

The value of sharing scientific research data is widely appreciated, but factors that hinder or prompt the reuse of data remain poorly understood. Using the Theory of Reasoned Action, we test the relationship between the beliefs and attitudes of scientists towards data reuse, and their self-reported data reuse behaviour.

To do so, we used existing responses to selected questions from a worldwide survey of scientists developed and administered by the DataONE Usability and Assessment Working Group (thus practicing data reuse ourselves).

Results show that the perceived efficacy and efficiency of data reuse are strong predictors of reuse behaviour, and that the perceived importance of data reuse corresponds to greater reuse. Contrary to our expectations, expressed lack of trust in existing data and perceived norms against data reuse were not found to be major impediments to reuse.

We found that reported use of models and remotely-sensed data was associated with greater reuse. The results suggest that data reuse would be encouraged and normalized by demonstration of its value.

We offer some theoretical and practical suggestions that could help to legitimize investment and policies in favor of data sharing.

URL : Attitudes and norms affecting scientists’ data reuse

DOI : https://doi.org/10.1371/journal.pone.0189288

Versioned data: why it is needed and how it can be achieved (easily and cheaply)

Authors : Daniel S. Falster, Richard G. FitzJohn, Matthew W. Pennell, William K. Cornwell

The sharing and re-use of data has become a cornerstone of modern science. Multiple platforms now allow quick and easy data sharing. So far, however, data publishing models have not accommodated ongoing scientific improvements in data: for many problems, datasets continue to grow with time, as more records are added, errors are fixed, and new data structures are created. In other words, datasets, like scientific knowledge, advance with time.

We therefore suggest that many datasets would be usefully published as a series of versions, with a simple naming system to allow users to perceive the type of change between versions. In this article, we argue for adopting the paradigm and processes for versioned data, analogous to software versioning.
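A naming system in the style of semantic versioning (MAJOR.MINOR.PATCH) is one way to let users see the type of change at a glance. The sketch below assumes one plausible mapping of the kinds of change mentioned above onto version numbers; the `Change` categories and the `bump` function are hypothetical and are not taken from the Versioned Data Delivery tools.

```python
from enum import Enum


class Change(Enum):
    """Hypothetical categories of change between dataset releases."""
    STRUCTURE = "structure"          # e.g. columns renamed or removed; may break existing analyses
    RECORDS_ADDED = "records_added"  # new rows; existing analyses keep working
    ERRORS_FIXED = "errors_fixed"    # corrections to existing values


def bump(version: str, change: Change) -> str:
    """Return the next MAJOR.MINOR.PATCH version for a given kind of change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change is Change.STRUCTURE:
        return f"{major + 1}.0.0"
    if change is Change.RECORDS_ADDED:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


print(bump("1.2.3", Change.ERRORS_FIXED))   # 1.2.4
print(bump("1.2.3", Change.RECORDS_ADDED))  # 1.3.0
print(bump("1.2.3", Change.STRUCTURE))      # 2.0.0
```

Under this scheme, a user pinned to version 1.x can safely take any 1.y release, while a jump to 2.0.0 signals that analysis code may need updating.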

We also introduce a system called Versioned Data Delivery and present tools for creating, archiving, and distributing versioned data easily, quickly, and cheaply. These new tools allow individual research groups to shift from a static model of data curation to a dynamic and versioned model that more naturally matches the scientific process.

URL : Versioned data: why it is needed and how it can be achieved (easily and cheaply)

DOI : https://doi.org/10.7287/peerj.preprints.3401v1

 

What do data curators care about? Data quality, user trust, and the data reuse plan

Author : Frank Andreas Sposito

Data curation is often defined as the practice of maintaining, preserving, and enhancing research data for long-term value and reusability. The role of data reuse in the data curation lifecycle is critical: increased reuse is the core justification for the often sizable expenditures necessary to build data management infrastructures and user services.

Yet recent studies have shown that data are being shared and reused through open data repositories at much lower levels than expected. These studies underscore a fundamental and often overlooked challenge in research data management that invites deeper examination of the roles and responsibilities of data curators.

This presentation will identify key barriers to data reuse, namely data quality and user trust, and propose a framework for implementing reuser-centric strategies to increase data reuse.

Using the concept of a “data reuse plan”, it will highlight repository-based approaches to improve data quality and user trust, and address critical areas for innovation for data curators working in the absence of repository support.

URL : What do data curators care about? Data quality, user trust, and the data reuse plan

Alternative location : http://library.ifla.org/id/eprint/1797