Strengthening institutional data management and promoting data sharing in the social and economic sciences

Authors : Monika Linne, Wolfgang Zenk-Möltgen

In the German social and economic sciences there is a growing awareness of flexible data distribution and research data reuse, especially as increasing numbers of research funders recommend publishing research data as the basis for scientific insight.

However, a data-sharing mentality has not yet been established in Germany attributable to researchers’ strong reservations about publishing their data.

This attitude is exacerbated by the fact that, at present, there is no trusted national data sharing repository that covers the particular requirements of institutions regarding research data.

This article discusses how this objective can be achieved with the project initiative SowiDataNet.

The development of a community-driven data repository is a logically consistent and important step towards an attitude shift concerning data sharing in the social and economic sciences.

DOI : http://doi.org/10.18352/lq.10195

The Landscape of Research Data Repositories in 2015: A re3data Analysis

Authors : Maxi Kindling, Heinz Pampel, Stephanie van de Sandt, Jessika Rücknagel, Paul Vierkant, Gabriele Kloska, Michael Witt, Peter Schirmbacher, Roland Bertelmann, Frank Scholze

This article provides a comprehensive descriptive and statistical analysis of metadata information on 1,381 research data repositories worldwide and across all research disciplines.

The analyzed metadata is derived from the re3data database, enabling search and browse functionalities for the global registry of research data repositories. The analysis focuses mainly on institutions that operate research data repositories, types and subjects of research data repositories (RDR), access conditions as well as services provided by the research data repositories.

RDR differ in terms of the service levels they offer, languages they support or standards they comply with. These statements are commonly acknowledged by saying the RDR landscape is heterogeneous.

As expected, we found a heterogeneous RDR landscape that is mostly influenced by the repositories’ disciplinary background for which they offer services.

URL : http://www.dlib.org/dlib/march17/kindling/03kindling.html

Discovery and Reuse of Open Datasets: An Exploratory Study

Authors : Sara Mannheimer, Leila Belle Sterman, Susan Borda

Objective

This article analyzes twenty cited or downloaded datasets and the repositories that house them, in order to produce insights that can be used by academic libraries to encourage discovery and reuse of research data in institutional repositories.

Methods

Using Thomson Reuters’ Data Citation Index and repository download statistics, we identified twenty cited/downloaded datasets. We documented the characteristics of the cited/downloaded datasets and their corresponding repositories in a self-designed rubric.

The rubric includes six major categories: basic information; funding agency and journal information; linking and sharing; factors to encourage reuse; repository characteristics; and data description.

Results

Our small-scale study suggests that cited/downloaded datasets generally comply with basic recommendations for facilitating reuse: data are documented well; formatted for use with a variety of software; and shared in established, open access repositories.

Three significant factors also appear to contribute to dataset discovery: publishing in discipline-specific repositories; indexing in more than one location on the web; and using persistent identifiers.

The cited/downloaded datasets in our analysis came from a few specific disciplines, and tended to be funded by agencies with data publication mandates.

Conclusions

The results of this exploratory research provide insights that can inform academic librarians as they work to encourage discovery and reuse of institutional datasets.

Our analysis also suggests areas in which academic librarians can target open data advocacy in their communities in order to begin to build open data success stories that will fuel future advocacy efforts.

URL : Discovery and Reuse of Open Datasets: An Exploratory Study

DOI : http://dx.doi.org/10.7191/jeslib.2016.1091

Revisiting the Data Lifecycle with Big Data Curation

Author : Line Pouchard

As science becomes more data-intensive and collaborative, researchers increasingly use larger and more complex data to answer research questions.

The capacity of storage infrastructure, the increased sophistication and deployment of sensors, the ubiquitous availability of computer clusters, the development of new analysis techniques, and larger collaborations allow researchers to address grand societal challenges in a way that is unprecedented.

In parallel, research data repositories have been built to host research data in response to the requirements of sponsors that research data be publicly available. Libraries are re-inventing themselves to respond to a growing demand to manage, store, curate and preserve the data produced in the course of publicly funded research.

As librarians and data managers are developing the tools and knowledge they need to meet these new expectations, they inevitably encounter conversations around Big Data. This paper explores definitions of Big Data that have coalesced in the last decade around four commonly mentioned characteristics: volume, variety, velocity, and veracity.

We highlight the issues associated with each characteristic, particularly their impact on data management and curation. We use the methodological framework of the data life cycle model, assessing two models developed in the context of Big Data projects and find them lacking.

We propose a Big Data life cycle model that includes activities focused on Big Data and more closely integrates curation with the research life cycle. These activities include planning, acquiring, preparing, analyzing, preserving, and discovering, with describing the data and assuring quality being an integral part of each activity.

We discuss the relationship between institutional data curation repositories and new long-term data resources associated with high performance computing centers, and reproducibility in computational science.

We apply this model by mapping the four characteristics of Big Data outlined above to each of the activities in the model. This mapping produces a set of questions that practitioners should be asking in a Big Data project

URL : Revisiting the Data Lifecycle with Big Data Curation

Alternative location : http://www.ijdc.net/index.php/ijdc/article/view/10.2.176

Are Scientific Data Repositories Coping with Research Data Publishing?

Research data publishing is intended as the release of research data to make it possible for practitioners to (re)use them according to “open science” dynamics. There are three main actors called to deal with research data publishing practices: researchers, publishers, and data repositories.

This study analyses the solutions offered by generalist scientific data repositories, i.e., repositories supporting the deposition of any type of research data. These repositories cannot make any assumption on the application domain.

They are actually called to face with the almost open ended typologies of data used in science. The current practices promoted by such repositories are analysed with respect to eight key aspects of data publishing, i.e., dataset formatting, documentation, licensing, publication costs, validation, availability, discovery and access, and citation.

From this analysis it emerges that these repositories implement well consolidated practices and pragmatic solutions for literature repositories.

These practices and solutions can not totally meet the needs of management and use of datasets resources, especially in a context where rapid technological changes continuously open new exploitation prospects.

URL : Are Scientific Data Repositories Coping with Research Data Publishing?

DOI : http://doi.org/10.5334/dsj-2016-006

Making Student Research Data Discoverable: A Pilot Program Using Dataverse

Introduction

The support and curation of research data underlying theses and dissertations are an opportunity for institutions to enhance their ETD collections.

This article describes a pilot data archiving service that leverages Emory University’s existing Electronic Theses and Dissertations (ETDs) program.

Description of program

This pilot service tested the appropriateness of Dataverse, a data repository, as a data archiving and access solution for Emory University using research data identified in Emory University’s ETD repository, developed the legal documents necessary for a full implementation of Dataverse on campus, and expanded outreach efforts to meet the research data needs of graduate students.

This article also situates the pilot service within the context of Emory Libraries and explains how it relates to other library efforts currently underway.

Next steps

The pilot project team plans to seek permission from alumni whose data were included in the pilot to make them available publicly in Dataverse, and the team will revise the ETD license agreement to allow this type of use.

The team will also automate the ingest of supplemental ETD research data into the data repository where possible and create a workshop series for students who are creating research data as part of their theses or dissertations.

URL : Making Student Research Data Discoverable: A Pilot Program Using Dataverse

URL : https://pid.emory.edu/ark:/25593/q4f1g