Dataset search: a survey

Authors : Adriane Chapman, Elena Simperl, Laura Koesten, George Konstantinidis, Luis-Daniel Ibáñez, Emilia Kacprzak, Paul Groth

Generating value from data requires the ability to find, access and make sense of datasets. There are many efforts underway to encourage data sharing and reuse, from scientific publishers asking authors to submit data alongside manuscripts to data marketplaces, open data portals and data communities.

Google recently beta-released a search service for datasets, which allows users to discover data stored in various online repositories via keyword queries. These developments foreshadow an emerging research field around dataset search or retrieval that broadly encompasses frameworks, methods and tools that help match a user data need against a collection of datasets.

Here, we survey the state of the art of research and commercial systems and discuss what makes dataset search a field in its own right, with unique challenges and open questions.

We look at approaches and implementations from the related areas that dataset search draws upon, including information retrieval, databases, and entity-centric and tabular search, in order to identify possible paths to tackle these questions as well as immediate next steps that will take the field forward.
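The idea of matching a keyword query against a collection of datasets can be illustrated with a toy example. The following is a minimal sketch, assuming that each dataset is described only by a title and a short description and that relevance is a simple count of matching terms. It is not how Google's service or any surveyed system works; the catalogue entries, field names and function names are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical catalogue: ids, titles and descriptions are invented for illustration.
CATALOGUE = [
    {"id": "ds1", "title": "City air quality measurements",
     "description": "Hourly sensor readings of air pollutants"},
    {"id": "ds2", "title": "Household energy survey",
     "description": "Annual survey of household energy use"},
    {"id": "ds3", "title": "Air traffic delays",
     "description": "Flight delays by airport and month"},
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, dataset: dict) -> int:
    """Count how often the distinct query terms occur in the dataset metadata."""
    terms = Counter(tokenize(dataset["title"] + " " + dataset["description"]))
    return sum(terms[t] for t in set(tokenize(query)))


def search(query: str, catalogue: list[dict]) -> list[str]:
    """Return ids of datasets with a non-zero score, best match first."""
    scored = [(score(query, d), d["id"]) for d in catalogue]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [ds_id for s, ds_id in ranked if s > 0]


print(search("air quality", CATALOGUE))
```

Term counting over metadata is only the simplest possible baseline; it ignores the content of the datasets themselves, which is one of the reasons the survey treats dataset search as a distinct problem.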

URL : Dataset search: a survey

DOI : https://doi.org/10.1007/s00778-019-00564-x

Data Sharing Practices among Researchers at South African Universities

Authors : Siviwe Bangani, Mathew Moyo

Research data management practices have gained momentum the world over. This is due to increased demands by governments and other funding agencies to have research data archived and shared as widely as possible.

This paper sought to establish the data sharing practices of researchers in South Africa. The study further sought to establish the level of collaboration among researchers in sharing research data at the university level.

The outcomes of the survey will help the researchers to develop appropriate data literacy awareness programmes meant to stimulate growth in data sharing practices for the benefit of research, not only in South Africa, but the world at large.

A survey research method was used to gather data from willing public universities in South Africa. A similar study was conducted in other countries such as the United Kingdom, France and Turkey, but the researchers believe that circumstances in the developed world may differ from those in the South African research environment, hence the current study.

The major finding of this study was that most researchers preferred to use data produced by others but were less keen on sharing their own data.

This study is the first of its kind in South Africa to investigate the data sharing practices of researchers from multi-disciplinary fields at the university level, and it will contribute immensely to the growing body of literature in the area of research data management.

URL : Data Sharing Practices among Researchers at South African Universities

DOI : http://doi.org/10.5334/dsj-2019-028

The Landscape of Rights and Licensing Initiatives for Data Sharing

Authors : Sam Grabus, Jane Greenberg

Over the last twenty years, a wide variety of resources have been developed to address the rights and licensing problems inherent with contemporary data sharing practices.

The landscape of developments in this area is increasingly confusing and difficult to navigate, due to the complexity of intellectual property and ethics issues associated with sharing sensitive data.

This paper seeks to address this challenge, examining the landscape and presenting a Version 1.0 directory of resources. A multi-method study was pursued, with an environmental scan examining 20 resources, resulting in three high-level categories: standards, tools, and community initiatives; and a content analysis revealing the subcategories of rights, licensing, metadata & ontologies.

A timeline confirms a shift over time in licensing standardization priorities, from open data to more nuanced and technologically robust solutions that accommodate more sensitive data types.

This paper reports on the research undertaking, and comments on the potential for using license-specific metadata supplements and developing data-centric rights and licensing ontologies.

URL : The Landscape of Rights and Licensing Initiatives for Data Sharing

DOI : http://doi.org/10.5334/dsj-2019-029

Implementing publisher policies that inform, support and encourage authors to share data: two case studies

Authors: Leila Jones, Rebecca Grant, Iain Hrynaszkiewicz

Open research data is one of the key areas in the expanding open scholarship movement. Scholarly journals and publishers find themselves at the heart of the shift towards openness, with recent years seeing an increase in the number of scholarly journals with data-sharing policies aiming to increase transparency and reproducibility of research.

In this article we present two case studies which examine the experiences that two leading academic publishers, Taylor & Francis and Springer Nature, have had in rolling out data-sharing policies.

We illustrate some of the considerations involved in providing consistent policies across journals of many disciplines, reflecting on successes and challenges.

URL : Implementing publisher policies that inform, support and encourage authors to share data: two case studies

DOI : http://doi.org/10.1629/uksg.463

Responsible data sharing in international health research: a systematic review of principles and norms

Authors : Shona Kalkman, Menno Mostert, Christoph Gerlinger, Johannes J. M. van Delden, Ghislaine J. M. W. van Thiel

Background

Large-scale linkage of international clinical datasets could lead to unique insights into disease aetiology and facilitate treatment evaluation and drug development.

To this end, multi-stakeholder consortia are currently designing several disease-specific translational research platforms to enable international health data sharing.

Despite the recent adoption of the EU General Data Protection Regulation (GDPR), the procedures for how to govern responsible data sharing in such projects are not at all spelled out yet. In search of a first, basic outline of an ethical governance framework, we set out to explore relevant ethical principles and norms.

Methods

We performed a systematic review of literature and ethical guidelines for principles and norms pertaining to data sharing for international health research.

Results

We observed an abundance of principles and norms with considerable convergence at the aggregate level of four overarching themes: societal benefits and value; distribution of risks, benefits and burdens; respect for individuals and groups; and public trust and engagement.

However, at the level of principles and norms we identified substantial variation in the phrasing and level of detail, the number and content of norms considered necessary to protect a principle, and the contextual approaches in which principles and norms are used.

Conclusions

While providing some helpful leads for further work on a coherent governance framework for data sharing, the current collection of principles and norms prompts important questions about how to streamline terminology regarding de-identification and how to harmonise the identified principles and norms into a coherent governance framework that promotes data sharing while securing public trust.

URL : Responsible data sharing in international health research: a systematic review of principles and norms

DOI : https://doi.org/10.1186/s12910-019-0359-9

The Time Efficiency Gain in Sharing and Reuse of Research Data

Author: Tessa E. Pronk

Among the frequently stated benefits of sharing research data are time efficiency and increased productivity. The assumption is that reuse or secondary use of research data saves researchers time because they do not have to produce the data for a publication themselves.

This can make science more efficient and productive. However, if there is no reuse, the time spent making data available for reuse will have been invested with no return.

In this paper a mathematical model is used to calculate the break-even point for time spent sharing in a scientific community versus time gained by reuse. This is done for several scenarios, ranging from simple to complex datasets to share and reuse, and at different sharing rates.

The results indicate that sharing research data can indeed yield an efficiency gain for the scientific community. However, this is not a given in all modeled scenarios.

The scientific community with the lowest reuse needed to reach a break-even point is one that has few sharing researchers and low time investments for sharing and reuse.

This suggests it would be beneficial to have a critical selection of datasets that are worth the effort to prepare for reuse in other scientific studies. In addition, stimulating reuse of datasets in itself would be beneficial to increase efficiency in scientific communities.
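The break-even reasoning above can be illustrated with a small calculation. The following is a minimal sketch and not the model used in the paper: it assumes that every sharing researcher spends a fixed amount of time preparing data, and that every reuse saves the difference between producing a dataset from scratch and the time needed to reuse a shared one. All parameter names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical parameters; none of these values come from the paper."""
    sharers: int              # researchers who prepare a dataset for sharing
    hours_to_share: float     # time each sharer spends making data reusable
    hours_to_produce: float   # time to produce an equivalent dataset from scratch
    hours_to_reuse: float     # time a reuser spends finding and understanding shared data


def net_time_gain(s: Scenario, reuses: int) -> float:
    """Community-wide hours saved (positive) or lost (negative) after `reuses` reuses."""
    saved_per_reuse = s.hours_to_produce - s.hours_to_reuse
    return reuses * saved_per_reuse - s.sharers * s.hours_to_share


def break_even_reuses(s: Scenario) -> float:
    """Number of reuses at which the time invested in sharing equals the time saved."""
    saved_per_reuse = s.hours_to_produce - s.hours_to_reuse
    if saved_per_reuse <= 0:
        return float("inf")  # reuse never pays off in this scenario
    return s.sharers * s.hours_to_share / saved_per_reuse


# Two invented scenarios: cheap-to-share data versus costly-to-share data.
simple = Scenario(sharers=10, hours_to_share=5.0, hours_to_produce=100.0, hours_to_reuse=20.0)
costly = Scenario(sharers=50, hours_to_share=40.0, hours_to_produce=100.0, hours_to_reuse=60.0)
print(break_even_reuses(simple), break_even_reuses(costly))
```

Under these assumptions, fewer sharing researchers and lower time investments for sharing and reuse both reduce the number of reuses needed to break even, which matches the direction of the finding reported above.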

URL : The Time Efficiency Gain in Sharing and Reuse of Research Data

DOI : http://doi.org/10.5334/dsj-2019-010

Research data management in the French National Research Center (CNRS)

Authors : Joachim Schöpfel, Coline Ferrant, Francis Andre, Renaud Fabre

Purpose

The purpose of this paper is to present empirical evidence on the opinion and behaviour of French scientists (senior management level) regarding research data management (RDM).

Design/methodology/approach

The results are part of a nationwide survey on scientific information and documentation, conducted in 2014 by the French Research Center CNRS among 432 directors of French public research laboratories.

Findings

The paper presents empirical results about data production (types), management (human resources, IT, funding, and standards), data sharing and related needs, and highlights significant disciplinary differences.

Also, it appears that RDM and data sharing are not directly correlated with the commitment to open access. Regarding the FAIR data principles, the paper reveals that 68 per cent of all laboratory directors affirm that their data production and management is compliant with at least one of the FAIR principles.

But only 26 per cent are compliant with at least three principles, and less than 7 per cent are compliant with all four FAIR criteria. Laboratories in nuclear physics, SSH, and earth sciences and astronomy are ahead of other disciplines, especially concerning the findability and the availability of their data output.

The paper concludes with comments about research data service development and recommendations for an institutional RDM policy.

Originality/value

For the first time, a nationwide survey was conducted with the senior research management level from all scientific disciplines. Surveys on RDM usually assess individual data behaviours, skills and needs. This survey is different insofar as it addresses institutional and collective data practice.

The respondents did not report on their own data behaviours and attitudes but were asked to provide information about their laboratory. The response rate was high (>30 per cent), and the results provide good insight into the real support and uptake of RDM by senior research managers who provide both models (examples for good practice) and opinion leadership.

URL : https://hal.univ-lille3.fr/hal-01728541/