How Important is Data Curation? Gaps and Opportunities for Academic Libraries

Authors: Lisa R Johnston, Jacob Carlson, Cynthia Hudson-Vitale, Heidi Imker, Wendy Kozlowski, Robert Olendorf, Claire Stewart

INTRODUCTION

Data curation may be an emerging service for academic libraries, but researchers actively "curate" their data in a number of ways, even if the terminology does not always align. Building on past user needs assessments performed via surveys and focus groups, the authors sought direct input from researchers on the importance and utilization of specific data curation activities.

METHODS

Between October 21, 2016, and November 18, 2016, the study team held focus groups with 91 participants at six different academic institutions to determine which data curation activities were most important to researchers, which activities were currently underway for their data, and how satisfied they were with the results.

RESULTS

Researchers are actively engaged in a variety of data curation activities, and while they considered most data curation activities to be highly important, a majority of the sample reported dissatisfaction with the current state of data curation at their institution.

DISCUSSION

Our findings demonstrate specific gaps and opportunities for academic libraries to focus their data curation services to more effectively meet researcher needs.

CONCLUSION

Research libraries stand to benefit their users by emphasizing, investing in, and/or heavily promoting the highly valued services that may not currently be in use by many researchers.

URL : How Important is Data Curation? Gaps and Opportunities for Academic Libraries

DOI : http://doi.org/10.7710/2162-3309.2198

Curating Humanities Research Data: Managing Workflows for Adjusting a Repository Framework

Author : Hagen Peukert

Handling heterogeneous data, subject to minimal costs, can be perceived as a classic management problem. The approach at hand applies established managerial theorizing to the field of data curation.

It is argued, however, that data curation cannot merely be treated as a standard case of applying management theory in the traditional sense. Rather, in the practice of curating humanities research data, the specifications and adjustments of the model suggested here reveal an intertwined process in which knowledge of both strategic management and solid information technology has to be considered.

Thus, two contributions are put forward: suggestions on the strategic positioning of research data, which can serve as an analytical tool for understanding the proposed workflow mechanisms, and a definition of workflow modules, which can be used flexibly to design new standard workflows for configuring research data repositories.

URL : Curating Humanities Research Data: Managing Workflows for Adjusting a Repository Framework

DOI : https://doi.org/10.2218/ijdc.v12i2.571

Practices of research data curation in institutional repositories: A qualitative view from repository staff

Authors : Dong Joon Lee, Besiki Stvilia

The importance of managing research data has been emphasized by the government, funding agencies, and scholarly communities. Increased access to research data increases the impact and efficiency of scientific activities and funding.

Thus, many research institutions have established or plan to establish research data curation services as part of their Institutional Repositories (IRs). However, in order to design effective research data curation services in IRs, and to build active research data providers and user communities around those IRs, it is essential to study current data curation practices and provide rich descriptions of the sociotechnical factors and relationships shaping those practices.

Based on 13 interviews with 15 IR staff members from 13 large research universities in the United States, this paper provides a rich, qualitative description of research data curation and use practices in IRs.

In particular, the paper identifies data curation and use activities in IRs, as well as their structures, roles played, skills needed, contradictions and problems present, solutions sought, and workarounds applied.

The paper can inform the development of best practice guides, infrastructure and service templates, as well as education in research data curation in Library and Information Science (LIS) schools.

URL : Practices of research data curation in institutional repositories: A qualitative view from repository staff

DOI : https://doi.org/10.1371/journal.pone.0173987

Setting up a National Research Data Curation Service for Qatar: Challenges and Opportunities

Authors : Arif Shaon, Armin Straube, Krishna Roy Chowdhury

Over the past decade, Qatar has been making considerable progress towards developing a sustainable research culture for the nation. The main driver behind Qatar’s progress in research and innovation is Qatar Foundation for Education, Science, and Community Development (QF), a private, non-profit organization that aims to utilise research as a catalyst for expanding, diversifying and improving the country’s economy, health and environment.

While this has resulted in significant growth in the number of research publications produced by Qatari researchers in recent years, a nationally co-ordinated approach is needed to address some emerging but increasingly important aspects of research data curation, such as the management and publication of research data as important outputs, and their long-term digital preservation.

Qatar National Library (QNL), launched in November 2012 under the umbrella of QF, aims to establish itself as a centre of excellence in Qatar for research data management, curation and publishing to address the research data-related needs of Qatari researchers and academics.

This paper describes QNL’s approach towards establishing a national research data curation service for Qatar, highlighting the associated opportunities and key challenges.

URL : Setting up a National Research Data Curation Service for Qatar: Challenges and Opportunities

Alternative location : http://www.ijdc.net/article/view/515

Amplifying Data Curation Efforts to Improve the Quality of Life Science Data

Authors : Mariam Alqasab, Suzanne M. Embury, Sandra de F. Mendes Sampaio

In the era of data science, datasets are shared widely and used for many purposes unforeseen by the original creators of the data. In this context, defects in datasets can have far-reaching consequences, spreading from dataset to dataset and affecting the consumers of data in ways that are hard to predict or quantify.

Some form of waste is often the result. For example, scientists using defective data to propose hypotheses for experimentation may waste their limited wet lab resources chasing the wrong experimental targets. Scarce drug trial resources may be used to test drugs that actually have little chance of giving a cure.

Because of the potential real world costs, database owners care about providing high quality data. Automated curation tools can be used to an extent to discover and correct some forms of defect.

However, in some areas human curation, performed by highly-trained domain experts, is needed to ensure that the data represents our current interpretation of reality accurately.

Human curators are expensive, and there is far more curation work to be done than there are curators available to perform it. Tools and techniques are needed to enable the full value to be obtained from the curation effort currently available.

In this paper, we explore one possible approach to maximising the value obtained from human curators: automatically extracting information about data defects and corrections from the work that the curators do.

This information is packaged in a source-independent form, so that it can be used by the owners of other databases (for which human curation effort is unavailable or insufficient).

This amplifies the efforts of the human curators, allowing their work to be applied to other sources, without requiring any additional effort or change in their processes or tool sets. We show that this approach can discover significant numbers of defects, which can also be found in other sources.
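The abstract describes this approach only at a high level. As a rough illustration, the following Python sketch assumes a hypothetical edit-log format (`CuratorEdit`), reduces each curator correction to a source-independent pattern by dropping the record identifier, and then scans another database for records holding the same defective value. All names and values here are invented for illustration; this is not the authors' implementation, and their extraction method and packaging format may differ substantially.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CuratorEdit:
    """One correction made by a human curator (hypothetical log format)."""
    record_id: str
    field: str
    old_value: str
    new_value: str


@dataclass(frozen=True, order=True)
class DefectPattern:
    """Source-independent description: this value in this field is a defect."""
    field: str
    defective_value: str
    corrected_value: str


def extract_patterns(edits):
    """Turn curator edits into deduplicated, source-independent defect patterns."""
    # Dropping record_id removes the dependence on the curated source database.
    patterns = {
        DefectPattern(e.field, e.old_value, e.new_value)
        for e in edits
        if e.old_value != e.new_value
    }
    return sorted(patterns)


def find_defects(records, patterns):
    """Flag records in another (uncurated) database that match a known defect.

    records maps a record id to a dict of field -> value.
    Returns (record_id, matching pattern) pairs.
    """
    hits = []
    for record_id, fields in records.items():
        for p in patterns:
            if fields.get(p.field) == p.defective_value:
                hits.append((record_id, p))
    return hits


# Usage with made-up example values.
edits = [
    CuratorEdit("A1", "organism", "E. colli", "E. coli"),
    CuratorEdit("A2", "organism", "E. colli", "E. coli"),
]
patterns = extract_patterns(edits)
other_db = {"B7": {"organism": "E. colli"}, "B8": {"organism": "E. coli"}}
print(find_defects(other_db, patterns))
```

Exact-value matching is the simplest possible choice here; in practice, defect patterns would likely need more context (for example, conditions on other fields) to avoid flagging correct values as defects.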

URL : Amplifying Data Curation Efforts to Improve the Quality of Life Science Data

DOI : https://doi.org/10.2218/ijdc.v12i1.495

Connecting Data Publication to the Research Workflow: A Preliminary Analysis

Authors : Sünje Dallmeier-Tiessen, Varsha Khodiyar, Fiona Murphy, Amy Nurnberger, Lisa Raymond, Angus Whyte

The data curation community has long encouraged researchers to document collected research data during active stages of the research workflow, to provide robust metadata earlier, and support research data publication and preservation.

Data documentation with robust metadata is one of a number of steps in effective data publication. Data publication is the process of making digital research objects ‘FAIR’, i.e. findable, accessible, interoperable, and reusable; attributes increasingly expected by research communities, funders and society.

Research data publishing workflows are the means to that end. Currently, however, much published research data remains inconsistently and inadequately documented by researchers.

Documentation of data closer in time to data collection would help mitigate the high cost that repositories associate with the ingest process. More effective data publication and sharing should in principle result from early interactions between researchers and their selected data repository.

This paper describes a short study undertaken by members of the Research Data Alliance (RDA) and World Data System (WDS) working group on Publishing Data Workflows. We present a collection of recent examples of data publication workflows that connect data repositories and publishing platforms with research activity ‘upstream’ of the ingest process.

We re-articulate previous recommendations of the working group, to account for the varied upstream service components and platforms that support the flow of contextual and provenance information downstream.

These workflows should be open and loosely coupled to support interoperability, including with preservation and publication environments. Our recommendations aim to stimulate further work on researchers’ views of data publishing and the extent to which available services and infrastructure facilitate the publication of FAIR data.

We also aim to stimulate further dialogue about, and definition of, the roles and responsibilities of research data services and platform providers for the ‘FAIRness’ of research data publication workflows themselves.

URL : Connecting Data Publication to the Research Workflow: A Preliminary Analysis

DOI : https://doi.org/10.2218/ijdc.v12i1.533

Rethinking Data Sharing and Human Participant Protection in Social Science Research: Applications from the Qualitative Realm

Authors : Dessi Kirilova, Sebastian Karcher

While data sharing is becoming increasingly common in quantitative social inquiry, qualitative data are rarely shared. One factor inhibiting data sharing is a concern about human participant protections and privacy.

Protecting the confidentiality and safety of research participants is a concern for both quantitative and qualitative researchers, but it raises specific concerns within the epistemic context of qualitative research.

Thus, the applicability of emerging protection models from the quantitative realm to the qualitative realm must be carefully evaluated. At the same time, qualitative scholars already employ a variety of strategies for human-participant protection, implicitly or informally, during the research process.

In this practice paper, we assess available strategies for protecting human participants and how they can be deployed. We describe a spectrum of possible data management options, such as de-identification and applying access controls, including some already employed by the Qualitative Data Repository (QDR) in tandem with its pilot depositors.

Throughout the discussion, we consider the tension between modifying data or restricting access to them, and retaining their analytic value.

We argue that developing explicit guidelines for sharing qualitative data generated through interaction with humans will allow scholars to address privacy concerns and increase the secondary use of their data.

URL : Rethinking Data Sharing and Human Participant Protection in Social Science Research: Applications from the Qualitative Realm

DOI : http://doi.org/10.5334/dsj-2017-043