Discovery and Reuse of Open Datasets: An Exploratory Study

Authors : Sara Mannheimer, Leila Belle Sterman, Susan Borda

Objective

This article analyzes twenty cited or downloaded datasets and the repositories that house them, in order to produce insights that can be used by academic libraries to encourage discovery and reuse of research data in institutional repositories.

Methods

Using Thomson Reuters’ Data Citation Index and repository download statistics, we identified twenty cited/downloaded datasets. We documented the characteristics of the cited/downloaded datasets and their corresponding repositories in a self-designed rubric.

The rubric includes six major categories: basic information; funding agency and journal information; linking and sharing; factors to encourage reuse; repository characteristics; and data description.

Results

Our small-scale study suggests that cited/downloaded datasets generally comply with basic recommendations for facilitating reuse: data are well documented, formatted for use with a variety of software, and shared in established, open access repositories.

Three significant factors also appear to contribute to dataset discovery: publishing in discipline-specific repositories; indexing in more than one location on the web; and using persistent identifiers.

The cited/downloaded datasets in our analysis came from a few specific disciplines, and tended to be funded by agencies with data publication mandates.

Conclusions

The results of this exploratory research provide insights that can inform academic librarians as they work to encourage discovery and reuse of institutional datasets.

Our analysis also suggests areas in which academic librarians can target open data advocacy in their communities in order to begin to build open data success stories that will fuel future advocacy efforts.


DOI : http://dx.doi.org/10.7191/jeslib.2016.1091

Revisiting the Data Lifecycle with Big Data Curation

Author : Line Pouchard

As science becomes more data-intensive and collaborative, researchers increasingly use larger and more complex data to answer research questions.

The capacity of storage infrastructure, the increased sophistication and deployment of sensors, the ubiquitous availability of computer clusters, the development of new analysis techniques, and larger collaborations allow researchers to address grand societal challenges in a way that is unprecedented.

In parallel, research data repositories have been built to host research data in response to the requirements of sponsors that research data be publicly available. Libraries are re-inventing themselves to respond to a growing demand to manage, store, curate and preserve the data produced in the course of publicly funded research.

As librarians and data managers are developing the tools and knowledge they need to meet these new expectations, they inevitably encounter conversations around Big Data. This paper explores definitions of Big Data that have coalesced in the last decade around four commonly mentioned characteristics: volume, variety, velocity, and veracity.

We highlight the issues associated with each characteristic, particularly their impact on data management and curation. Using the data life cycle model as a methodological framework, we assess two models developed in the context of Big Data projects and find them lacking.

We propose a Big Data life cycle model that includes activities focused on Big Data and more closely integrates curation with the research life cycle. These activities include planning, acquiring, preparing, analyzing, preserving, and discovering, with describing the data and assuring quality being an integral part of each activity.

We discuss the relationship between institutional data curation repositories and new long-term data resources associated with high performance computing centers, and reproducibility in computational science.

We apply this model by mapping the four characteristics of Big Data outlined above to each of the activities in the model. This mapping produces a set of questions that practitioners should be asking in a Big Data project.
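The mapping described in this abstract can be pictured as a grid with one cell per pairing of activity and characteristic. The following is a minimal sketch of that grid, assuming the six activities and four characteristics exactly as the abstract names them. The function name `question_grid` and the wording of each prompt are hypothetical placeholders; the paper's actual questions are not reproduced here.

```python
from itertools import product

# Characteristics and activities as named in the abstract above.
CHARACTERISTICS = ["volume", "variety", "velocity", "veracity"]
ACTIVITIES = [
    "planning", "acquiring", "preparing",
    "analyzing", "preserving", "discovering",
]


def question_grid():
    """Return one placeholder prompt per (activity, characteristic) pair."""
    grid = {}
    for activity, characteristic in product(ACTIVITIES, CHARACTERISTICS):
        # Hypothetical wording, used only to show the shape of the mapping.
        grid[(activity, characteristic)] = (
            f"What does {characteristic} imply when {activity} the data?"
        )
    return grid


if __name__ == "__main__":
    for (activity, characteristic), prompt in question_grid().items():
        print(f"{activity:12} {characteristic:9} {prompt}")
```

Six activities crossed with four characteristics give twenty-four cells, which is the scale of the checklist a practitioner would work through under this reading of the model.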


Alternative location : http://www.ijdc.net/index.php/ijdc/article/view/10.2.176

Are Scientific Data Repositories Coping with Research Data Publishing?

Research data publishing is understood here as the release of research data so that practitioners can (re)use them according to “open science” dynamics. Three main actors are involved in research data publishing practices: researchers, publishers, and data repositories.

This study analyses the solutions offered by generalist scientific data repositories, i.e., repositories supporting the deposition of any type of research data. These repositories cannot make any assumption on the application domain.

They must therefore cope with the almost open-ended range of data types used in science. The current practices promoted by such repositories are analysed with respect to eight key aspects of data publishing, i.e., dataset formatting, documentation, licensing, publication costs, validation, availability, discovery and access, and citation.

From this analysis it emerges that these repositories implement well-consolidated practices and pragmatic solutions for literature repositories.

These practices and solutions cannot fully meet the needs of managing and using dataset resources, especially in a context where rapid technological changes continuously open new exploitation prospects.


DOI : http://doi.org/10.5334/dsj-2016-006

Making Student Research Data Discoverable: A Pilot Program Using Dataverse

Introduction

The support and curation of research data underlying theses and dissertations are an opportunity for institutions to enhance their ETD collections.

This article describes a pilot data archiving service that leverages Emory University’s existing Electronic Theses and Dissertations (ETDs) program.

Description of program

This pilot service tested the appropriateness of Dataverse, a data repository, as a data archiving and access solution for Emory University using research data identified in Emory University’s ETD repository, developed the legal documents necessary for a full implementation of Dataverse on campus, and expanded outreach efforts to meet the research data needs of graduate students.

This article also situates the pilot service within the context of Emory Libraries and explains how it relates to other library efforts currently underway.

Next steps

The pilot project team plans to seek permission from alumni whose data were included in the pilot to make them available publicly in Dataverse, and the team will revise the ETD license agreement to allow this type of use.

The team will also automate the ingest of supplemental ETD research data into the data repository where possible and create a workshop series for students who are creating research data as part of their theses or dissertations.


URL : https://pid.emory.edu/ark:/25593/q4f1g

Funding models for Open Access digital data repositories

Purpose

The purpose of this paper is to examine funding models for Open Access (OA) digital data repositories whose costs are not wholly core funded. Whilst such repositories are free to access, they are not without significant cost to build and maintain, and the lack of both full core costs and a direct funding stream through payment-for-use poses a considerable financial challenge, placing their future and the digital collections they hold at risk.

Design/methodology/approach

The authors document 14 different potential funding streams for OA digital data repositories, grouped into six classes (institutional, philanthropy, research, audience, service, volunteer), drawing on the ongoing experience of seeking sustainable funding for the Digital Repository of Ireland (DRI).

Findings

There is no straightforward solution to funding OA digital data repositories that are not wholly core funded, with a number of general and specific challenges facing each repository, and each funding model having strengths and weaknesses. The proposed DRI solution is the adoption of a blended approach that seeks to ameliorate cyclical effects across funding streams by generating income from a number of sources rather than overly relying on a single one, though it is still reliant on significant state core funding to be viable.

Practical implications

The detailing of potential funding streams offers practical financial solutions to other OA digital data repositories which are seeking a means to become financially sustainable in the absence of full core funding.

Originality/value

The review assesses and provides concrete advice with respect to potential funding streams in order to help repository owners address the financing conundrum they face.

URL : http://dx.doi.org/10.1108/OIR-01-2015-0031

Do You Have an Institutional Data Policy? A Review of the Current Landscape of Library Data Services and Institutional Data Policies

INTRODUCTION

Many research institutions have developed research data services in their libraries, often in anticipation of or in response to funder policy. However, policies at the institution level are either not well known or nonexistent.

METHODS

This study reviewed library data services efforts and institutional data policies of 206 American universities, drawn from the July 2014 Carnegie list of universities with “Very High” or “High” research activity designation. Twenty-four different characteristics relating to university type, library data services, policy type, and policy contents were examined.

RESULTS

The study has uncovered findings surrounding library data services, institutional data policies, and content within the policies.

DISCUSSION

Overall, there is a general trend toward the development and implementation of data services within the university libraries. Interestingly, just under half of the universities examined had a policy of some sort that either specified or mentioned research data.

Many of these were standalone data policies, while others were intellectual property policies that included research data. When data policies were discoverable (that is, not behind a login), they focused on the definition of research data, data ownership, data retention, and terms surrounding the separation of a researcher from the institution.

CONCLUSION

By becoming well versed in research data policies, librarians can support researchers by helping them navigate the policies at their institutions and by facilitating the activities needed to comply with the requirements of research funders and publishers. This puts academic libraries in a unique position to provide insight and guidance in the development and revision of institutional data policies.


DOI : http://doi.org/10.7710/2162-3309.1232

Cross-Linking Between Journal Publications and Data Repositories: A Selection of Examples

“This article provides a selection of examples of the many ways that a link can be made between a journal article (whether in a data journal or otherwise) and a dataset held in a data repository. In some cases the method of linking is well established, while in others it has yet to be rolled out uniformly across the journal landscape. We explore ways in which these examples might be implemented in a data journal, such as Geoscience Data Journal, as explored by the PREPARDE project.”


Alternative URL : http://www.ijdc.net/index.php/ijdc/article/view/9.1.164