Lost or found? Discovering data needed for research

Authors : Kathleen Gregory, Paul Groth, Andrea Scharnhorst, Sally Wyatt

Finding or discovering data is a necessary precursor to being able to reuse data, although relatively little large-scale empirical evidence exists about how researchers discover, make sense of and (re)use data for research.

This study presents evidence from the largest known survey investigating how researchers discover and use data that they do not create themselves.

We examine the data needs and discovery strategies of respondents, propose a typology for data (re)use and probe the role of social interactions and other research practices in data discovery, with the aim of informing the design of community-centric solutions and policies.

URL : https://arxiv.org/abs/1909.00464

Dataset search: a survey

Authors : Adriane Chapman, Elena Simperl, Laura Koesten, George Konstantinidis, Luis-Daniel Ibáñez, Emilia Kacprzak, Paul Groth

Generating value from data requires the ability to find, access and make sense of datasets. There are many efforts underway to encourage data sharing and reuse, from scientific publishers asking authors to submit data alongside manuscripts to data marketplaces, open data portals and data communities.

Google recently beta-released a search service for datasets, which allows users to discover data stored in various online repositories via keyword queries. These developments foreshadow an emerging research field around dataset search or retrieval that broadly encompasses frameworks, methods and tools that help match a user data need against a collection of datasets.

Here, we survey the state of the art of research and commercial systems and discuss what makes dataset search a field in its own right, with unique challenges and open questions.

We look at approaches and implementations from the related areas that dataset search draws upon, including information retrieval, databases, and entity-centric and tabular search, in order to identify possible paths to tackle these questions as well as immediate next steps that will take the field forward.

DOI : https://doi.org/10.1007/s00778-019-00564-x

Searching Data: A Review of Observational Data Retrieval Practices in Selected Disciplines

Authors : Kathleen Gregory, Paul Groth, Helena Cousijn, Andrea Scharnhorst, Sally Wyatt

A cross‐disciplinary examination of the user behaviors involved in seeking and evaluating data is surprisingly absent from the research data discussion. This review explores the data retrieval literature to identify commonalities in how users search for and evaluate observational research data in selected disciplines.

Two analytical frameworks, rooted in information retrieval and science and technology studies, are used to identify key similarities in practices as a first step toward developing a model describing data retrieval.

DOI : https://doi.org/10.1002/asi.24165

Understanding Data Retrieval Practices: A Social Informatics Perspective

Authors : Kathleen Gregory, Helena Cousijn, Paul Groth, Andrea Scharnhorst, Sally Wyatt

Open research data are heralded as having the potential to increase effectiveness, productivity, and reproducibility in science, but little is known about the actual practices involved in data search and retrieval.

The socio-technical problem of locating data for (re)use is often reduced to the technological dimension of designing data search systems. In this article, we explore how a social informatics perspective can help to better analyze the current academic discourse about data retrieval as well as to study user practices and behaviors.

We employ two methods in our analysis – bibliometrics and interviews with data seekers – and conclude with a discussion of the implications of our findings for designing data discovery systems.

URL : https://arxiv.org/abs/1801.04971

Searching Data: A Review of Observational Data Retrieval Practices

Authors : Kathleen Gregory, Paul Groth, Helena Cousijn, Andrea Scharnhorst, Sally Wyatt

A cross-disciplinary examination of the user behaviours involved in seeking and evaluating data is surprisingly absent from the research data discussion. This review explores the data retrieval literature to identify commonalities in how users search for and evaluate observational research data.

Two analytical frameworks, rooted in information retrieval and science and technology studies, are used to identify key similarities in practices as a first step toward developing a model describing data retrieval.

URL : https://arxiv.org/abs/1707.06937