Curating Scientific Information in Knowledge Infrastructures

Authors : Markus Stocker, Pauli Paasonen, Markus Fiebig, Martha A. Zaidan, Alex Hardisty

Interpreting observational data is a fundamental task in the sciences, specifically in earth and environmental science where observational data are increasingly acquired, curated, and published systematically by environmental research infrastructures.

Typically subject to substantial processing, observational data are used by research communities, their research groups and individual scientists, who interpret such primary data for their meaning in the context of research investigations.

The result of interpretation is information (meaningful secondary or derived data) about the observed environment. Research infrastructures and research communities are thus essential to evolving uninterpreted observational data into information. In digital form, the classical bearers of information are the commonly known “(elaborated) data products,” for instance maps.

In such form, meaning is generally implicit, e.g., in map colour coding, and thus largely inaccessible to machines. The systematic acquisition, curation, possible publishing and further processing of information gained in observational data interpretation, as machine-readable data and their machine-readable meaning, is not common practice among environmental research infrastructures.

For a use case in aerosol science, we elucidate these problems and present a Jupyter-based prototype infrastructure that exploits a machine learning approach to interpretation and could support a research community in interpreting observational data and, more importantly, in curating and further using the resulting information about a studied natural phenomenon.

URL : Curating Scientific Information in Knowledge Infrastructures

DOI : http://doi.org/10.5334/dsj-2018-021

Conceptualizing Data Curation Activities Within Two Academic Libraries

Authors : Sophia Lafferty-Hess, Julie Rudder, Moira Downey, Susan Ivey, Jennifer Darragh

A growing focus on sharing research data that meet certain standards, such as the FAIR guiding principles, has resulted in libraries increasingly developing and scaling up support for research data.

As libraries consider what new data curation services they would like to provide as part of their repository programs, there are various questions that arise surrounding scalability, resource allocation, requisite expertise, and how to communicate these services to the research community.

Data curation can involve a variety of tasks and activities. Some of these activities can be managed by systems, some require human intervention, and some require highly specialized domain or data type expertise.

At the 2017 Triangle Research Libraries Network Institute, staff from the University of North Carolina at Chapel Hill and Duke University used the 47 data curation activities identified by the Data Curation Network project to create conceptual groupings of data curation activities.

The results of this “thought-exercise” are discussed in this white paper. The purpose of this exercise was to provide more specificity around data curation within our individual contexts as a method to consistently discuss our current service models, identify gaps we would like to fill, and determine what is currently out of scope.

We hope to foster an open and productive discussion throughout the larger academic library community about how we prioritize data curation activities as we face growing demand and limited resources.

URL : Conceptualizing Data Curation Activities Within Two Academic Libraries

DOI : https://dx.doi.org/10.17605/OSF.IO/ZJ5PQ

How Important is Data Curation? Gaps and Opportunities for Academic Libraries

Authors : Lisa R Johnston, Jacob Carlson, Cynthia Hudson-Vitale, Heidi Imker, Wendy Kozlowski, Robert Olendorf, Claire Stewart

INTRODUCTION

Data curation may be an emerging service for academic libraries, but researchers actively “curate” their data in a number of ways, even if terminology may not always align. Building on past user-needs assessments performed via survey and focus groups, the authors sought direct input from researchers on the importance and utilization of specific data curation activities.

METHODS

Between October 21, 2016, and November 18, 2016, the study team held focus groups with 91 participants at six different academic institutions to determine which data curation activities were most important to researchers, which activities were currently underway for their data, and how satisfied they were with the results.

RESULTS

Researchers are actively engaged in a variety of data curation activities, and while they considered most data curation activities to be highly important, a majority of the sample reported dissatisfaction with the current state of data curation at their institution.

DISCUSSION

Our findings demonstrate specific gaps and opportunities for academic libraries to focus their data curation services to more effectively meet researcher needs.

CONCLUSION

Research libraries stand to benefit their users by emphasizing, investing in, and/or heavily promoting the highly valued services that may not currently be in use by many researchers.

URL : How Important is Data Curation? Gaps and Opportunities for Academic Libraries

DOI : http://doi.org/10.7710/2162-3309.2198

Curating Humanities Research Data: Managing Workflows for Adjusting a Repository Framework

Author : Hagen Peukert

Handling heterogeneous data, subject to minimal costs, can be perceived as a classic management problem. The approach at hand applies established managerial theorizing to the field of data curation.

It is argued, however, that data curation cannot merely be treated as a standard case of applying management theory in a traditional sense. Rather, the practice of curating humanities research data and the specifications and adjustments of the model suggested here reveal an intertwined process, in which knowledge of both strategic management and solid information technology has to be considered.

Thus, suggestions on the strategic positioning of research data, which can be used as an analytical tool to understand the proposed workflow mechanisms, and the definition of workflow modules, which can be flexibly used in designing new standard workflows to configure research data repositories, are put forward.

URL : Curating Humanities Research Data: Managing Workflows for Adjusting a Repository Framework

DOI : https://doi.org/10.2218/ijdc.v12i2.571

Practices of research data curation in institutional repositories: A qualitative view from repository staff

Authors : Dong Joon Lee, Besiki Stvilia

The importance of managing research data has been emphasized by the government, funding agencies, and scholarly communities. Increased access to research data increases the impact and efficiency of scientific activities and funding.

Thus, many research institutions have established or plan to establish research data curation services as part of their Institutional Repositories (IRs). However, in order to design effective research data curation services in IRs, and to build active research data providers and user communities around those IRs, it is essential to study current data curation practices and provide rich descriptions of the sociotechnical factors and relationships shaping those practices.

Based on 13 interviews with 15 IR staff members from 13 large research universities in the United States, this paper provides a rich, qualitative description of research data curation and use practices in IRs.

In particular, the paper identifies data curation and use activities in IRs, as well as their structures, roles played, skills needed, contradictions and problems present, solutions sought, and workarounds applied.

The paper can inform the development of best practice guides, infrastructure and service templates, as well as education in research data curation in Library and Information Science (LIS) schools.

URL : Practices of research data curation in institutional repositories: A qualitative view from repository staff

DOI : https://doi.org/10.1371/journal.pone.0173987

Setting up a National Research Data Curation Service for Qatar: Challenges and Opportunities

Authors : Arif Shaon, Armin Straube, Krishna Roy Chowdhury

Over the past decade, Qatar has been making considerable progress towards developing a sustainable research culture for the nation. The main driver behind Qatar’s progress in research and innovation is Qatar Foundation for Education, Science, and Community Development (QF), a private, non-profit organization that aims to utilise research as a catalyst for expanding, diversifying and improving the country’s economy, health and environment.

While this has resulted in a significant growth in the number of research publications produced by Qatari researchers in recent years, a nationally co-ordinated approach is needed to address some of the emerging but increasingly important aspects of research data curation, such as management and publication of research data as important outputs, and their long-term digital preservation.

Qatar National Library (QNL), launched in November 2012 under the umbrella of QF, aims to establish itself as a centre of excellence in Qatar for research data management, curation and publishing to address the research data-related needs of Qatari researchers and academics.

This paper describes QNL’s approach towards establishing a national research data curation service for Qatar, highlighting the associated opportunities and key challenges.

URL : Setting up a National Research Data Curation Service for Qatar: Challenges and Opportunities

Alternative location : http://www.ijdc.net/article/view/515

Amplifying Data Curation Efforts to Improve the Quality of Life Science Data

Authors : Mariam Alqasab, Suzanne M. Embury, Sandra de F. Mendes Sampaio

In the era of data science, datasets are shared widely and used for many purposes unforeseen by the original creators of the data. In this context, defects in datasets can have far-reaching consequences, spreading from dataset to dataset, and affecting the consumers of data in ways that are hard to predict or quantify.

Some form of waste is often the result. For example, scientists using defective data to propose hypotheses for experimentation may waste their limited wet lab resources chasing the wrong experimental targets. Scarce drug trial resources may be used to test drugs that actually have little chance of giving a cure.

Because of the potential real-world costs, database owners care about providing high-quality data. Automated curation tools can be used to an extent to discover and correct some forms of defect.

However, in some areas human curation, performed by highly-trained domain experts, is needed to ensure that the data represents our current interpretation of reality accurately.

Human curators are expensive, and there is far more curation work to be done than there are curators available to perform it. Tools and techniques are needed to enable the full value to be obtained from the curation effort currently available.

In this paper, we explore one possible approach to maximising the value obtained from human curators, by automatically extracting information about data defects and corrections from the work that the curators do.

This information is packaged in a source-independent form, to allow it to be used by the owners of other databases (for which human curation effort is not available or is insufficient).

This amplifies the efforts of the human curators, allowing their work to be applied to other sources, without requiring any additional effort or change in their processes or tool sets. We show that this approach can discover significant numbers of defects, which can also be found in other sources.

URL : Amplifying Data Curation Efforts to Improve the Quality of Life Science Data

DOI : https://doi.org/10.2218/ijdc.v12i1.495