Revisiting the Data Lifecycle with Big Data Curation

Author : Line Pouchard

As science becomes more data-intensive and collaborative, researchers increasingly use larger and more complex data to answer research questions.

The capacity of storage infrastructure, the increased sophistication and deployment of sensors, the ubiquitous availability of computer clusters, the development of new analysis techniques, and larger collaborations allow researchers to address grand societal challenges in a way that is unprecedented.

In parallel, research data repositories have been built to host research data in response to the requirements of sponsors that research data be publicly available. Libraries are re-inventing themselves to respond to a growing demand to manage, store, curate and preserve the data produced in the course of publicly funded research.

As librarians and data managers are developing the tools and knowledge they need to meet these new expectations, they inevitably encounter conversations around Big Data. This paper explores definitions of Big Data that have coalesced in the last decade around four commonly mentioned characteristics: volume, variety, velocity, and veracity.

We highlight the issues associated with each characteristic, particularly their impact on data management and curation. Using the data life cycle model as a methodological framework, we assess two models developed in the context of Big Data projects and find them lacking.

We propose a Big Data life cycle model that includes activities focused on Big Data and more closely integrates curation with the research life cycle. These activities include planning, acquiring, preparing, analyzing, preserving, and discovering, with describing the data and assuring quality being an integral part of each activity.

We discuss the relationship between institutional data curation repositories and new long-term data resources associated with high performance computing centers, and reproducibility in computational science.

We apply this model by mapping the four characteristics of Big Data outlined above to each of the activities in the model. This mapping produces a set of questions that practitioners should be asking in a Big Data project.

URL : Revisiting the Data Lifecycle with Big Data Curation

Alternative location : http://www.ijdc.net/index.php/ijdc/article/view/10.2.176

Research Data Sharing and Reuse Practices of Academic Faculty Researchers: A Study of the Virginia Tech Data Landscape

Author : Yi Shen

This paper presents the results of a research data assessment and landscape study in the institutional context of Virginia Tech to determine the data sharing and reuse practices of academic faculty researchers.

Through mapping the level of user engagement in “openness of data,” “openness of methodologies and workflows,” and “reuse of existing data,” this study contributes to the current knowledge in data sharing and open access, and supports the strategic development of institutional data stewardship.

Asking faculty researchers to reflect on sharing and reuse from both data producers’ and data users’ perspectives, the study reveals a significant gap between rather limited sharing activities and the high perceived value of data for reuse or repurposing, indicating that the potential value of data for future research is lost right after the original work is done.

The localized and sporadic data management and documentation practices of researchers also contribute to the obstacles they themselves often encounter when reusing existing data.

URL : Research Data Sharing and Reuse Practices of Academic Faculty Researchers: A Study of the Virginia Tech Data Landscape

Alternative location : http://www.ijdc.net/index.php/ijdc/article/view/10.2.157

State of Data Guidance in Journal Policies: A Case Study in Oncology

Authors : Deborah H. Charbonneau, Joan E. Beaudoin

This article reports the results of a study examining the state of data guidance provided to authors by 50 oncology journals. The purpose of the study was the identification of data practices addressed in the journals’ policies.

While a number of studies have examined data sharing practices among researchers, little is known about how journals address data sharing. Thus, what was discovered through this study has practical implications for journal publishers, editors, and researchers.

The findings indicate that journal publishers should provide more meaningful and comprehensive data guidance to prospective authors. More specifically, journal policies requiring data sharing should direct researchers to relevant data repositories and offer better metadata consultation to strengthen existing journal policies.

By providing adequate guidance for authors, and helping investigators to meet data sharing mandates, scholarly journal publishers can play a vital role in advancing access to research data.

URL : State of Data Guidance in Journal Policies: A Case Study in Oncology

Alternative location : http://www.ijdc.net/index.php/ijdc/article/view/10.2.136

Are Scientific Data Repositories Coping with Research Data Publishing?

Research data publishing is understood here as the release of research data to make it possible for practitioners to (re)use them according to “open science” dynamics. Three main actors are called to deal with research data publishing practices: researchers, publishers, and data repositories.

This study analyses the solutions offered by generalist scientific data repositories, i.e., repositories supporting the deposition of any type of research data. These repositories cannot make any assumption on the application domain.

They are called to face the almost open-ended typologies of data used in science. The current practices promoted by such repositories are analysed with respect to eight key aspects of data publishing, i.e., dataset formatting, documentation, licensing, publication costs, validation, availability, discovery and access, and citation.

From this analysis, it emerges that these repositories implement well-consolidated practices and pragmatic solutions for literature repositories.

These practices and solutions cannot fully meet the needs of managing and using dataset resources, especially in a context where rapid technological changes continuously open new exploitation prospects.

URL : Are Scientific Data Repositories Coping with Research Data Publishing?

DOI : http://doi.org/10.5334/dsj-2016-006

Making Student Research Data Discoverable: A Pilot Program Using Dataverse

Introduction

The support and curation of research data underlying theses and dissertations are an opportunity for institutions to enhance their ETD collections.

This article describes a pilot data archiving service that leverages Emory University’s existing Electronic Theses and Dissertations (ETDs) program.

Description of program

This pilot service tested the appropriateness of Dataverse, a data repository, as a data archiving and access solution for Emory University using research data identified in Emory University’s ETD repository, developed the legal documents necessary for a full implementation of Dataverse on campus, and expanded outreach efforts to meet the research data needs of graduate students.

This article also situates the pilot service within the context of Emory Libraries and explains how it relates to other library efforts currently underway.

Next steps

The pilot project team plans to seek permission from alumni whose data were included in the pilot to make them available publicly in Dataverse, and the team will revise the ETD license agreement to allow this type of use.

The team will also automate the ingest of supplemental ETD research data into the data repository where possible and create a workshop series for students who are creating research data as part of their theses or dissertations.

URL : Making Student Research Data Discoverable: A Pilot Program Using Dataverse

Alternative location : https://pid.emory.edu/ark:/25593/q4f1g

Disciplinary differences in opening research data

The management and widespread sharing of publicly funded research data has gained significant momentum among governments, funders, institutions, journals and data service providers around the world.

However, there is no ‘one-size-fits-all’ approach to open research data across academic disciplines. Different disciplines produce different types of data and have various procedures for analysing, archiving and publishing it.

This briefing paper presents the current state of open research data across academic disciplines. It describes disciplinary characteristics inhibiting a larger take-up of open research data mandates.

Additionally, it presents the current strategies and policies established by funders, institutions, journals and data service providers alongside general data policies.

URL : Disciplinary differences in opening research data

Alternative location : http://www.pasteur4oa.eu/resources/209

Assessment of and Response to Data Needs of Clinical and Translational Science Researchers and Beyond

Objective and Setting

As universities and libraries grapple with data management and “big data,” the need for data management solutions across disciplines is particularly relevant in clinical and translational science (CTS) research, which is designed to traverse disciplinary and institutional boundaries.

At the University of Florida Health Science Center Library, a team of librarians undertook an assessment of the research data management needs of CTS researchers, including an online assessment and follow-up one-on-one interviews.

Design and Methods

The 20-question online assessment was distributed to all investigators affiliated with UF’s Clinical and Translational Science Institute (CTSI), and 59 investigators responded. Follow-up in-depth interviews were conducted with nine faculty and staff members.

Results

Results indicate that UF’s CTS researchers have diverse data management needs that are often specific to their discipline or current research project and span the data lifecycle. A common theme in responses was the need for consistent data management training, particularly for graduate students; this led to localized training within the Health Science Center and CTSI, as well as campus-wide training.

Another campus-wide outcome was the creation of an action-oriented Data Management/Curation Task Force, led by the libraries and with participation from Research Computing and the Office of Research.

Conclusions

Initiating conversations with affected stakeholders and campus leadership about best practices in data management and implications for institutional policy shows the library’s proactive leadership and furthers our goal to provide concrete guidance to our users in this area.

URL : Assessment of and Response to Data Needs of Clinical and Translational Science Researchers and Beyond

Alternative location : http://escholarship.umassmed.edu/jeslib/vol5/iss1/2/