The Heritage Data Reuse Charter: from principles to research workflows

Authors : Erzsébet Tóth-Czifra, Laurent Romary

There is a growing need to establish domain- or discipline-specific approaches to research data sharing workflows. A defining feature of data and data workflows in the arts and humanities domain is their dependence on cultural heritage sources hosted and curated in museums, libraries, galleries and archives.

A major difficulty when scholars interact with heritage data is that the nature of the cooperation between researchers and Cultural Heritage Institutions (henceforth CHIs) is often constrained by structural and legal challenges but even more by uncertainties as to the expectations of both parties.

The Heritage Data Reuse Charter aims to address these by designing a common environment that will enable all the relevant actors to work together to connect and improve access to heritage data and make transactions related to the scholarly use of cultural heritage data more visible and transparent.

As a first step, a wide range of stakeholders from the Cultural Heritage and research sectors agreed upon a set of generic principles, summarized in the Mission Statement of the Charter, that can serve as a baseline governing the interactions between CHIs, researchers and data centres.

This was followed by a long and thorough validation process related to these principles through surveys and workshops. As a second step, we now put forward a questionnaire template tool that helps researchers and CHIs to translate the 6 core principles into specific research project settings.

It contains questions about access to data, provenance information, preferred citation standards, hosting responsibilities, etc., on the basis of which the parties can arrive at mutual reuse agreements that could serve as a starting point for FAIR-by-construction data management, right from the project planning/application phase.

The questionnaire template and the resulting mutual agreements can be flexibly applied to projects of different scale and in platform-independent ways. Institutions can embed them into their own exchange protocols while researchers can add them to their Data Management Plans.

As such, they can show evidence of responsible and fair handling of cultural heritage data, and of fair (but also FAIR) research data management practices that are based on partnership with the holding institution.

URL : https://halshs.archives-ouvertes.fr/halshs-02475692

Resurfacing Historical Scientific Data: A Case Study Involving Fruit Breeding Data

Authors : Shannon L. Farrell, Lois G. Hendrickson, Kristen L. Mastel, Katherine Adina Allen, Julia A. Kelly

Objective

The objective of this paper is to illustrate the importance and complexities of working with historical analog data that exists on university campuses. Using a case study of fruit breeding data, we highlight issues and opportunities for librarians to help preserve and increase access to potentially valuable data sets.

Methods

We worked in conjunction with researchers to inventory, describe, and increase access to a large, 100-year-old data set of analog fruit breeding data. This involved creating a spreadsheet to capture metadata about each data set, identifying data sets at risk for loss, and digitizing select items for deposit in our institutional repository.

Results/Discussion

We illustrate that large amounts of data exist within biological and agricultural sciences departments and labs, and how past practices of data collection, record keeping, storage, and management have hindered data reuse.

We demonstrate that librarians have a role in collaborating with researchers and providing direction in how to preserve analog data and make it available for reuse. This work may provide guidance for other science librarians pursuing similar projects.

Conclusions

This case study demonstrates how science librarians can build or strengthen their role in managing and providing access to analog data by combining their data management skills with researchers’ needs to recover and reuse data.

URL : Resurfacing Historical Scientific Data: A Case Study Involving Fruit Breeding Data

DOI : https://doi.org/10.7191/jeslib.2019.1171

Playing Well on the Data FAIRground: Initiatives and Infrastructure in Research Data Management

Authors : Danielle Descoteaux, Chiara Farinelli, Marina Soares e Silva, Anita de Waard

Over the past five years, Elsevier has focused on implementing FAIR and best practices in data management, from data preservation through reuse. In this paper we describe a series of efforts undertaken in this time to support proper data management practices.

In particular, we discuss our journal data policies and their implementation, the current status and future goals for the research data management platform Mendeley Data, and clear and persistent linkages to individual data sets stored on external data repositories from corresponding published papers through partnership with Scholix.

Early analysis of our data policies implementation confirms significant disparities at the subject level regarding data sharing practices, with most uptake within disciplines of Physical Sciences. Future directions at Elsevier include implementing better discoverability of linked data within an article and incorporating research data usage metrics.

URL : Playing Well on the Data FAIRground: Initiatives and Infrastructure in Research Data Management

DOI : https://doi.org/10.1162/dint_a_00020

Public Views on Models for Accessing Genomic and Health Data for Research: Mixed Methods Study

Authors : Kerina H Jones, Helen Daniels, Emma Squires, David V Ford

Background

The literature abounds with increasing numbers of research studies using genomic data in combination with health data (eg, health records and phenotypic and lifestyle data), with great potential for large-scale research and precision medicine.

However, concerns have been raised about social acceptability and risks posed for individuals and their kin. Although there has been public engagement on various aspects of this topic, there is a lack of information about public views on data access models.

Objective

This study aimed to address the lack of information on the social acceptability of access models for reusing genomic data collected for research in conjunction with health data.

Models considered were open web-based access, external release to researchers, and access within a data safe haven.

Methods

Views were ascertained using a series of 8 public workshops (N=116). The workshops included an explanation of benefits and risks in using genomic data with health data, a facilitated discussion, and an exit questionnaire.

The resulting quantitative data were analyzed using descriptive and inferential statistics, and the qualitative data were analyzed for emerging themes.

Results

Respondents placed a high value on the reuse of genomic data but raised concerns including data misuse, information governance, and discrimination. They showed a preference for giving consent and use of data within a safe haven over external release or open access.

Perceived risks included data being used by unscrupulous parties (open access), data security (external release), and the need for robust safeguards (safe havens).

Conclusions

This is the first known study exploring public views of access models for reusing anonymized genomic and health data in research.

It indicated that people are generally amenable but prefer data safe havens because of perceived sensitivities. We recommend that public views be incorporated into guidance on models for the reuse of genomic and health data.

URL : Public Views on Models for Accessing Genomic and Health Data for Research: Mixed Methods Study

DOI : https://doi.org/10.2196/14384

Dataset search: a survey

Authors : Adriane Chapman, Elena Simperl, Laura Koesten, George Konstantinidis, Luis-Daniel Ibáñez, Emilia Kacprzak, Paul Groth

Generating value from data requires the ability to find, access and make sense of datasets. There are many efforts underway to encourage data sharing and reuse, from scientific publishers asking authors to submit data alongside manuscripts to data marketplaces, open data portals and data communities.

Google recently beta-released a search service for datasets, which allows users to discover data stored in various online repositories via keyword queries. These developments foreshadow an emerging research field around dataset search or retrieval that broadly encompasses frameworks, methods and tools that help match a user data need against a collection of datasets.

Here, we survey the state of the art of research and commercial systems and discuss what makes dataset search a field in its own right, with unique challenges and open questions.

We look at approaches and implementations from related areas that dataset search draws upon, including information retrieval, databases, and entity-centric and tabular search, in order to identify possible paths to tackle these questions as well as immediate next steps that will take the field forward.

URL : Dataset search: a survey

DOI : https://doi.org/10.1007/s00778-019-00564-x

The Definition of Reuse

Authors : Stephanie van de Sandt, Sünje Dallmeier-Tiessen, Artemis Lavasa, Vivien Petras

The ability to reuse research data is now considered a key benefit for the wider research community. Researchers of all disciplines are confronted with the pressure to share their research data so that it can be reused.

The demand for data use and reuse has implications for how we document, publish and share research in the first place, and, perhaps most importantly, it affects how we measure the impact of research, which is commonly a measurement of its use and reuse.

It is surprising that research communities, policy makers, etc. have not yet clearly defined what use and reuse are.

We postulate that a clear definition of use and reuse is needed to establish better metrics for a comprehensive scholarly record of individuals, institutions, organizations, etc.

Hence, this article presents a first definition of reuse of research data. Characteristics of reuse are identified by examining the etymology of the term and the analysis of the current discourse, leading to a range of reuse scenarios that show the complexity of today’s research landscape, which has been moving towards a data-driven approach.

The analysis underlines that there is no reason to distinguish use and reuse. We discuss what that means for possible new metrics that attempt to cover Open Science practices more comprehensively.

We hope that the resulting definition will enable a better and more refined strategy for Open Science.

URL : The Definition of Reuse

DOI : http://doi.org/10.5334/dsj-2019-022

The Time Efficiency Gain in Sharing and Reuse of Research Data

Author: Tessa E. Pronk

Among the frequently stated benefits of sharing research data are time efficiency or increased productivity. The assumption is that reuse or secondary use of research data saves researchers time in not having to produce data for a publication themselves.

This can make science more efficient and productive. However, if there is no reuse, the time costs of making data available for reuse will have been incurred with no return on this investment.

In this paper a mathematical model is used to calculate the break-even point for time spent sharing in a scientific community versus time gained through reuse. This is done for several scenarios, ranging from simple to complex datasets to share and reuse, and at different sharing rates.

The results indicate that sharing research data can indeed yield an efficiency gain for the scientific community. However, this is not a given in all modeled scenarios.

The scientific community with the lowest reuse needed to reach a break-even point is one that has few sharing researchers and low time investments for sharing and reuse.

This suggests it would be beneficial to have a critical selection of datasets that are worth the effort to prepare for reuse in other scientific studies. In addition, stimulating reuse of datasets in itself would be beneficial to increase efficiency in scientific communities.
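The break-even idea described above can be illustrated with a deliberately simplified sketch. This is not the model from the paper: the function name, the parameters (`n_sharers`, `t_share`, `t_produce`, `t_reuse`) and all numbers are hypothetical, and the only assumption is that the community breaks even when the total time spent preparing data for sharing equals the total time saved by reuse.

```python
def break_even_reuses(n_sharers, t_share, t_produce, t_reuse):
    """Reuse events needed before time saved matches time spent sharing.

    Toy assumption (not taken from the paper):
      time spent  = n_sharers * t_share
      time saved  = reuses * (t_produce - t_reuse)
    All times are in the same unit, e.g. hours.
    """
    saving_per_reuse = t_produce - t_reuse
    if saving_per_reuse <= 0:
        # Reusing costs at least as much as producing the data anew,
        # so no number of reuses can ever recover the sharing effort.
        raise ValueError("reuse must be cheaper than producing the data")
    return n_sharers * t_share / saving_per_reuse


# Hypothetical scenario with simple datasets: cheap to share, cheap to reuse.
simple = break_even_reuses(n_sharers=10, t_share=5.0,
                           t_produce=50.0, t_reuse=10.0)

# Hypothetical scenario with complex datasets: costly to share and to reuse.
complex_case = break_even_reuses(n_sharers=10, t_share=40.0,
                                 t_produce=200.0, t_reuse=100.0)

print(simple, complex_case)
```

Under these assumptions, fewer sharing researchers and lower time investments for sharing and reuse both lower the number of reuses needed to break even, which is in line with the qualitative finding reported in the abstract.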

URL : The Time Efficiency Gain in Sharing and Reuse of Research Data

DOI : http://doi.org/10.5334/dsj-2019-010