The Landscape of Research Data Repositories in 2015: A re3data Analysis

Authors : Maxi Kindling, Heinz Pampel, Stephanie van de Sandt, Jessika Rücknagel, Paul Vierkant, Gabriele Kloska, Michael Witt, Peter Schirmbacher, Roland Bertelmann, Frank Scholze

This article provides a comprehensive descriptive and statistical analysis of metadata information on 1,381 research data repositories worldwide and across all research disciplines.

The analyzed metadata is derived from the re3data database, enabling search and browse functionalities for the global registry of research data repositories. The analysis focuses mainly on institutions that operate research data repositories, types and subjects of research data repositories (RDR), access conditions as well as services provided by the research data repositories.

RDR differ in terms of the service levels they offer, the languages they support, and the standards they comply with. These differences are commonly summarized by describing the RDR landscape as heterogeneous.

As expected, we found a heterogeneous RDR landscape that is mostly influenced by the disciplinary background for which the repositories offer services.

URL : http://www.dlib.org/dlib/march17/kindling/03kindling.html

Discovery and Reuse of Open Datasets: An Exploratory Study

Authors : Sara Mannheimer, Leila Belle Sterman, Susan Borda

Objective

This article analyzes twenty cited or downloaded datasets and the repositories that house them, in order to produce insights that can be used by academic libraries to encourage discovery and reuse of research data in institutional repositories.

Methods

Using Thomson Reuters’ Data Citation Index and repository download statistics, we identified twenty cited/downloaded datasets. We documented the characteristics of the cited/downloaded datasets and their corresponding repositories in a self-designed rubric.

The rubric includes six major categories: basic information; funding agency and journal information; linking and sharing; factors to encourage reuse; repository characteristics; and data description.

Results

Our small-scale study suggests that cited/downloaded datasets generally comply with basic recommendations for facilitating reuse: data are documented well; formatted for use with a variety of software; and shared in established, open access repositories.

Three significant factors also appear to contribute to dataset discovery: publishing in discipline-specific repositories; indexing in more than one location on the web; and using persistent identifiers.

The cited/downloaded datasets in our analysis came from a few specific disciplines, and tended to be funded by agencies with data publication mandates.

Conclusions

The results of this exploratory research provide insights that can inform academic librarians as they work to encourage discovery and reuse of institutional datasets.

Our analysis also suggests areas in which academic librarians can target open data advocacy in their communities in order to begin to build open data success stories that will fuel future advocacy efforts.

DOI : http://dx.doi.org/10.7191/jeslib.2016.1091

Revisiting the Data Lifecycle with Big Data Curation

Author : Line Pouchard

As science becomes more data-intensive and collaborative, researchers increasingly use larger and more complex data to answer research questions.

The capacity of storage infrastructure, the increased sophistication and deployment of sensors, the ubiquitous availability of computer clusters, the development of new analysis techniques, and larger collaborations allow researchers to address grand societal challenges in a way that is unprecedented.

In parallel, research data repositories have been built to host research data in response to the requirements of sponsors that research data be publicly available. Libraries are re-inventing themselves to respond to a growing demand to manage, store, curate and preserve the data produced in the course of publicly funded research.

As librarians and data managers are developing the tools and knowledge they need to meet these new expectations, they inevitably encounter conversations around Big Data. This paper explores definitions of Big Data that have coalesced in the last decade around four commonly mentioned characteristics: volume, variety, velocity, and veracity.

We highlight the issues associated with each characteristic, particularly their impact on data management and curation. Using the data life cycle model as a methodological framework, we assess two models developed in the context of Big Data projects and find them lacking.

We propose a Big Data life cycle model that includes activities focused on Big Data and more closely integrates curation with the research life cycle. These activities include planning, acquiring, preparing, analyzing, preserving, and discovering, with describing the data and assuring quality being an integral part of each activity.

We discuss the relationship between institutional data curation repositories and new long-term data resources associated with high performance computing centers, and reproducibility in computational science.

We apply this model by mapping the four characteristics of Big Data outlined above to each of the activities in the model. This mapping produces a set of questions that practitioners should be asking in a Big Data project.
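The mapping described in this abstract can be pictured as a grid with one cell per pairing of activity and characteristic. The following is a minimal sketch of that idea only: the activity and characteristic names come from the abstract, while the function name `build_question_grid` and the question wording are hypothetical placeholders, not the questions proposed in the paper.

```python
from itertools import product

# Names taken from the abstract above.
CHARACTERISTICS = ["volume", "variety", "velocity", "veracity"]
ACTIVITIES = [
    "planning", "acquiring", "preparing",
    "analyzing", "preserving", "discovering",
]


def build_question_grid(characteristics, activities):
    """Return one placeholder question per (activity, characteristic) pair.

    The question template is an assumption made for illustration; the
    paper's actual questions are not reproduced here.
    """
    return {
        (activity, characteristic): f"How does {characteristic} affect {activity}?"
        for activity, characteristic in product(activities, characteristics)
    }


grid = build_question_grid(CHARACTERISTICS, ACTIVITIES)

# Print the grid grouped by activity.
for activity in ACTIVITIES:
    for characteristic in CHARACTERISTICS:
        print(f"{activity:12s} {characteristic:9s} {grid[(activity, characteristic)]}")
```

Six activities crossed with four characteristics give 24 cells, each of which a project team would fill with its own question.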

URL : Revisiting the Data Lifecycle with Big Data Curation

Alternative location : http://www.ijdc.net/index.php/ijdc/article/view/10.2.176

Are Scientific Data Repositories Coping with Research Data Publishing?

Research data publishing is understood here as the release of research data so that practitioners can (re)use them according to “open science” dynamics. Three main actors are called on to deal with research data publishing practices: researchers, publishers, and data repositories.

This study analyses the solutions offered by generalist scientific data repositories, i.e., repositories supporting the deposition of any type of research data. These repositories cannot make any assumption about the application domain.

They must therefore cope with the almost open-ended typologies of data used in science. The current practices promoted by such repositories are analysed with respect to eight key aspects of data publishing: dataset formatting, documentation, licensing, publication costs, validation, availability, discovery and access, and citation.

From this analysis it emerges that these repositories implement well-consolidated practices and pragmatic solutions for literature repositories.

These practices and solutions cannot fully meet the needs of managing and using dataset resources, especially in a context where rapid technological changes continuously open new exploitation prospects.

DOI : http://doi.org/10.5334/dsj-2016-006

Making Student Research Data Discoverable: A Pilot Program Using Dataverse

Introduction

The support and curation of research data underlying theses and dissertations are an opportunity for institutions to enhance their ETD collections.

This article describes a pilot data archiving service that leverages Emory University’s existing Electronic Theses and Dissertations (ETDs) program.

Description of program

This pilot service tested the appropriateness of Dataverse, a data repository, as a data archiving and access solution for Emory University using research data identified in Emory University’s ETD repository, developed the legal documents necessary for a full implementation of Dataverse on campus, and expanded outreach efforts to meet the research data needs of graduate students.

This article also situates the pilot service within the context of Emory Libraries and explains how it relates to other library efforts currently underway.

Next steps

The pilot project team plans to seek permission from alumni whose data were included in the pilot to make them available publicly in Dataverse, and the team will revise the ETD license agreement to allow this type of use.

The team will also automate the ingest of supplemental ETD research data into the data repository where possible and create a workshop series for students who are creating research data as part of their theses or dissertations.

URL : https://pid.emory.edu/ark:/25593/q4f1g

Funding models for Open Access digital data repositories

Purpose

The purpose of this paper is to examine funding models for Open Access (OA) digital data repositories whose costs are not wholly core funded. Whilst such repositories are free to access, they are not without significant cost to build and maintain. The lack of both full core funding and a direct funding stream through payment-for-use poses a considerable financial challenge, placing their future and the digital collections they hold at risk.

Design/methodology/approach

The authors document 14 different potential funding streams for OA digital data repositories, grouped into six classes (institutional, philanthropy, research, audience, service, volunteer), drawing on the ongoing experience of seeking sustainable funding for the Digital Repository of Ireland (DRI).

Findings

There is no straightforward solution to funding OA digital data repositories that are not wholly core funded: each repository faces a number of general and specific challenges, and each funding model has strengths and weaknesses. The proposed DRI solution is the adoption of a blended approach that seeks to ameliorate cyclical effects across funding streams by generating income from a number of sources rather than overly relying on a single one, though it is still reliant on significant state core funding to be viable.

Practical implications

The detailing of potential funding streams offers practical financial solutions to other OA digital data repositories which are seeking a means to become financially sustainable in the absence of full core funding.

Originality/value

The review assesses and provides concrete advice with respect to potential funding streams in order to help repository owners address the financing conundrum they face.

URL : http://dx.doi.org/10.1108/OIR-01-2015-0031