A Conceptual Enterprise Framework for Managing Scientific Data Stewardship

Authors : Ge Peng, Jeffrey L. Privette, Curt Tilmes, Sky Bristol, Tom Maycock, John J. Bates, Scott Hausman, Otis Brown, Edward J. Kearns

Scientific data stewardship is an important part of long-term preservation and the use/reuse of digital research data. It is critical for ensuring trustworthiness of data, products, and services, which is important for decision-making.

Recent U.S. federal government directives and scientific organization guidelines have levied specific requirements, increasing the need for a more formal approach to ensuring that stewardship activities support compliance verification and reporting.

However, many science data centers lack an integrated, systematic, and holistic framework to support such efforts. The current business- and process-oriented stewardship frameworks are too costly and lengthy for most data centers to implement.

They often do not explicitly address the federal stewardship requirements and/or the uniqueness of geospatial data. This work proposes a data-centric conceptual enterprise framework for managing stewardship activities, based on the philosophy behind the Plan-Do-Check-Act (PDCA) cycle, a proven industrial concept.

This framework, which includes the application of maturity assessment models, allows for quantitative evaluation of how organizations manage their stewardship activities and supports informed decision-making for continual improvement towards full compliance with federal, agency, and user requirements.

URL : A Conceptual Enterprise Framework for Managing Scientific Data Stewardship

DOI : http://doi.org/10.5334/dsj-2018-015

A Research Graph dataset for connecting research data repositories using RD-Switchboard

Authors : Amir Aryani, Marta Poblet, Kathryn Unsworth, Jingbo Wang, Ben Evans, Anusuriya Devaraju, Brigitte Hausstein, Claus-Peter Klas, Benjamin Zapilko, Samuele Kaplun

This paper describes the open access graph dataset that shows the connections between Dryad, CERN, ANDS and other international data repositories to publications and grants across multiple research data infrastructures.

The graph dataset was created using the Research Graph data model and the Research Data Switchboard (RD-Switchboard), a collaborative project by the Research Data Alliance DDRI Working Group (DDRI WG) with the aim to discover and connect the related research datasets based on publication co-authorship or jointly funded grants.

The graph dataset allows researchers to trace and follow the paths to understanding a body of work. By mapping the links between research datasets and related resources, the graph dataset improves both their discovery and visibility, while avoiding duplicate efforts in data creation.

Ultimately, the linked datasets may spur novel ideas, facilitate reproducibility and re-use in new applications, stimulate combinatorial creativity, and foster collaborations across institutions.

URL : A Research Graph dataset for connecting research data repositories using RD-Switchboard

Alternative location : https://www.nature.com/articles/sdata201899

Data sharing in PLOS ONE: An analysis of Data Availability Statements

Authors : Lisa M. Federer, Christopher W. Belter, Douglas J. Joubert, Alicia Livinski, Ya-Ling Lu, Lissa N. Snyders, Holly Thompson

A number of publishers and funders, including PLOS, have recently adopted policies requiring researchers to share the data underlying their results and publications. Such policies help increase the reproducibility of the published literature, as well as make a larger body of data available for reuse and re-analysis.

In this study, we evaluate the extent to which authors have complied with this policy by analyzing Data Availability Statements from 47,593 papers published in PLOS ONE between March 2014 (when the policy went into effect) and May 2016.

Our analysis shows that compliance with the policy has increased, with a significant decline over time in papers that did not include a Data Availability Statement. However, only about 20% of statements indicate that data are deposited in a repository, which the PLOS policy states is the preferred method.

More commonly, authors state that their data are in the paper itself or in the supplemental information, though it is unclear whether these data meet the level of sharing required in the PLOS policy.

These findings suggest that additional review of Data Availability Statements or more stringent policies may be needed to increase data sharing.

URL : Data sharing in PLOS ONE: An analysis of Data Availability Statements

DOI : https://doi.org/10.1371/journal.pone.0194768

The State of Assessing Data Stewardship Maturity – An Overview

Author : Ge Peng

Data stewardship encompasses all activities that preserve and improve the information content, accessibility, and usability of data and metadata. Recent regulations, mandates, policies, and guidelines set forth by the U.S. government, federal and other funding agencies, scientific societies, and scholarly publishers have levied stewardship requirements on digital scientific data.

This elevated level of requirements has increased the need for a formal approach to stewardship activities that supports compliance verification and reporting. Meeting or verifying compliance with stewardship requirements requires assessing the current state, identifying gaps, and, if necessary, defining a roadmap for improvement.

This, however, touches on standards and best practices in multiple knowledge domains. Therefore, data stewardship practitioners, especially those at data repositories or data service centers or associated with data stewardship programs, can benefit from knowledge of existing maturity assessment models.

This article provides an overview of the current state of assessing stewardship maturity for federally funded digital scientific data. A brief description of existing maturity assessment models and related application(s) is provided.

This helps stewardship practitioners to readily obtain basic information about these models. It allows them to evaluate each model’s suitability for their unique verification and improvement needs.

URL : The State of Assessing Data Stewardship Maturity – An Overview

DOI : http://doi.org/10.5334/dsj-2018-007

Data Sustainability and Reuse Pathways of Natural Resources and Environmental Scientists

Author : Yi Shen

This paper presents a multifarious examination of natural resources and environmental scientists’ adventures navigating the policy change towards open access and cultural shift in data management, sharing, and reuse.

Situated in the institutional context of Virginia Tech, a focus group and multiple individual interviews were conducted exploring the domain scientists’ all-around experiences, performances, and perspectives on their collection, adoption, integration, preservation, and management of data.

The results reveal the scientists’ struggles, concerns, and barriers encountered, as well as their shared values, beliefs, passions, and aspirations when working with data. Based on these findings, this study provides suggestions on data modeling and knowledge representation strategies to support the long-term viability, stewardship, accessibility, and sustainability of scientific data.

It also discusses the art of curation as creative scholarship and new opportunities for data librarians and information professionals to mobilize the data revolution.

URL : https://arxiv.org/abs/1803.01788

Data sharing and reanalysis of randomized controlled trials in leading biomedical journals with a full data sharing policy: survey of studies published in The BMJ and PLOS Medicine

Authors : Florian Naudet, Charlotte Sakarovitch, Perrine Janiaud, Ioana Cristea, Daniele Fanelli, David Moher, John P A Ioannidis

Objectives

To explore the effectiveness of data sharing by randomized controlled trials (RCTs) in journals with a full data sharing policy and to describe potential difficulties encountered in the process of performing reanalyses of the primary outcomes.

Design

Survey of published RCTs.

Eligibility criteria

RCTs that had been submitted and published by The BMJ and PLOS Medicine subsequent to the adoption of data sharing policies by these journals.

Main outcome measure

The primary outcome was data availability, defined as the eventual receipt of complete data with clear labelling. Primary outcomes were reanalyzed to assess to what extent studies were reproduced. Difficulties encountered were described.

Results

37 RCTs (21 from The BMJ and 16 from PLOS Medicine) published between 2013 and 2016 met the eligibility criteria. 17/37 (46%, 95% confidence interval 30% to 62%) satisfied the definition of data availability and 14 of the 17 (82%, 59% to 94%) were fully reproduced on all their primary outcomes. Of the remaining RCTs, errors were identified in two, although similar conclusions were reached, and one paper did not provide enough information in the Methods section to reproduce the analyses. Difficulties identified included problems in contacting corresponding authors and a lack of resources on their part for preparing the datasets. In addition, there was a range of different data sharing practices across study groups.
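
As a rough illustration of the arithmetic behind the reported proportions, the sketch below computes a 95% confidence interval for each one. The abstract does not say which interval method the authors used, so the Wilson score interval here is an assumption, and `wilson_interval` is a hypothetical helper written for this example, not code from the study.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion (assumed method)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Counts taken from the abstract: 17 of 37 RCTs with available data,
# and 14 of those 17 fully reproduced.
for label, k, n in [("data available", 17, 37), ("fully reproduced", 14, 17)]:
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.0%}, 95% CI {low:.0%} to {high:.0%}")
```

Under this assumption the sketch gives about 59% to 94% for 14/17, which agrees with the reported interval, and about 31% to 62% for 17/37, which is close to the reported 30% to 62%. The small difference suggests the authors may have used a different method or rounding for that figure.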


Conclusions

Data availability was not optimal in two journals with a strong policy for data sharing. When investigators shared data, most reanalyses largely reproduced the original results. Data sharing practices need to become more widespread and streamlined to allow meaningful reanalyses and reuse of data.