Repository Approaches to Improving the Quality of Shared Data and Code

Authors : Ana Trisovic, Katherine Mika, Ceilyn Boyd, Sebastian Feger, Mercè Crosas

Sharing data and code for reuse has become increasingly important in scientific work over the past decade. However, in practice, shared data and code may be unusable, or published results obtained from them may be irreproducible.

Data repository features and services contribute significantly to the quality, longevity, and reusability of datasets.

This paper presents a combination of original and secondary data analysis studies focusing on computational reproducibility, data curation, and gamified design elements that can be employed to indicate and improve the quality of shared data and code.

The findings of these studies are sorted into three approaches that can be valuable to data repositories, archives, and other research dissemination platforms.

URL : Repository Approaches to Improving the Quality of Shared Data and Code

DOI : https://doi.org/10.3390/data6020015

From Conceptualization to Implementation: FAIR Assessment of Research Data Objects

Authors : Anusuriya Devaraju, Mustapha Mokrane, Linas Cepinskas, Robert Huber, Patricia Herterich, Jerry de Vries, Vesa Akerman, Hervé L’Hours, Joy Davidson, Michael Diepenbroek

Funders and policy makers have strongly recommended the uptake of the FAIR principles in scientific data management. Several initiatives are working on the implementation of the principles and standardized applications to systematically evaluate data FAIRness.

This paper presents practical solutions, namely metrics and tools, developed by the FAIRsFAIR project to pilot the FAIR assessment of research data objects in trustworthy data repositories. The metrics are mainly built on the indicators developed by the RDA FAIR Data Maturity Model Working Group.

The tools’ design and evaluation followed an iterative process. We present two applications of the metrics: an awareness-raising self-assessment tool and an automated FAIR data assessment tool.

Initial results of testing the tools with researchers and data repositories are discussed, and future improvements are suggested, including the next steps to enable FAIR data assessment in the broader research data ecosystem.

URL : From Conceptualization to Implementation: FAIR Assessment of Research Data Objects

DOI : http://doi.org/10.5334/dsj-2021-004

An overview of biomedical platforms for managing research data

Authors : Vivek Navale, Denis von Kaeppler, Matthew McAuliffe

Biomedical platforms provide the hardware and software to securely ingest, process, validate, curate, store, and share data. Many large-scale biomedical platforms use secure cloud computing technology for analyzing, integrating, and storing phenotypic, clinical, and genomic data. Several web-based platforms are available for researchers to access services and tools for biomedical research.

The use of bio-containers can facilitate the integration of bioinformatics software with various data analysis pipelines. Adoption of Common Data Models, Common Data Elements, and Ontologies can increase the likelihood of data reuse. Managing biomedical Big Data will require the development of strategies that can efficiently leverage public cloud computing resources.

The use of standards developed by the research community for data collection can foster the development of machine learning methods for data processing and analysis. Increasingly, platforms will need to support the integration of data from multiple disease research areas.

URL : An overview of biomedical platforms for managing research data

DOI : https://doi.org/10.1007/s42488-020-00040-0

Evaluation of Data Sharing After Implementation of the International Committee of Medical Journal Editors Data Sharing Statement Requirement

Authors : Valentin Danchev, Yan Min, John Borghi, Mike Baiocchi, John P. A. Ioann

Importance

The benefits of responsible sharing of individual-participant data (IPD) from clinical studies are well recognized, but stakeholders often disagree on how to align those benefits with privacy risks, costs, and incentives for clinical trialists and sponsors.

The International Committee of Medical Journal Editors (ICMJE) required a data sharing statement (DSS) from submissions reporting clinical trials effective July 1, 2018. The required DSSs provide a window into current data sharing rates, practices, and norms among trialists and sponsors.

Objective

To evaluate the implementation of the ICMJE DSS requirement in 3 leading medical journals: JAMA, Lancet, and New England Journal of Medicine (NEJM).

Design, Setting, and Participants

This is a cross-sectional study of clinical trial reports published as articles in JAMA, Lancet, and NEJM between July 1, 2018, and April 4, 2020. Articles not eligible for DSS, including observational studies and letters or correspondence, were excluded.

A MEDLINE/PubMed search identified 487 eligible clinical trials in JAMA (112 trials), Lancet (147 trials), and NEJM (228 trials). Two reviewers evaluated each of the 487 articles independently.

Exposure

Publication of clinical trial reports in an ICMJE medical journal requiring a DSS.

Main Outcomes and Measures

The primary outcomes of the study were declared data availability and actual data availability in repositories. Other captured outcomes were data type, access, and conditions and reasons for data availability or unavailability. Associations with funding sources were examined.

Results

A total of 334 of 487 articles (68.6%; 95% CI, 64%-73%) declared data sharing, with nonindustry NIH-funded trials exhibiting the highest rates of declared data sharing (89%; 95% CI, 80%-98%) and industry-funded trials the lowest (61%; 95% CI, 54%-68%).

However, only 2 IPD sets (0.6%; 95% CI, 0.0%-1.5%) were actually deidentified and publicly available as of April 10, 2020. The remaining were supposedly accessible via request to authors (143 of 334 articles [42.8%]), repository (89 of 334 articles [26.6%]), and company (78 of 334 articles [23.4%]).

Among the 89 articles declaring that IPD would be stored in repositories, only 17 (19.1%) deposited data, mostly because of embargo and regulatory approval. Embargo was set in 47.3% of data-sharing articles (158 of 334), and in half of them the period exceeded 1 year or was unspecified.
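The headline figures in these results can be re-derived from the reported counts. The sketch below is illustrative only: it assumes a normal-approximation (Wald) interval, which the abstract does not name as the method used, and the function and variable names are hypothetical.

```python
import math

def proportion_with_wald_ci(successes, total, z=1.96):
    """Return (proportion, lower, upper) using a normal-approximation 95% CI.

    Assumption: the abstract does not state which interval method was used;
    the Wald interval is chosen here only because it is the simplest.
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Eligible trials per journal, as reported in the abstract.
journal_counts = {"JAMA": 112, "Lancet": 147, "NEJM": 228}
total_trials = sum(journal_counts.values())  # 487

# Articles that declared data sharing: 334 of 487.
p, lower, upper = proportion_with_wald_ci(334, total_trials)
print(f"{p:.1%} (95% CI, {lower:.0%}-{upper:.0%})")  # 68.6% (95% CI, 64%-73%)

# Simple shares reported in the abstract.
deposited_share = 17 / 89    # repository-declaring articles that deposited data
embargo_share = 158 / 334    # data-sharing articles with an embargo
print(f"{deposited_share:.1%}, {embargo_share:.1%}")  # 19.1%, 47.3%
```

Under this assumption the declared-sharing rate and its interval agree with the reported 68.6% (64%-73%); intervals for very small counts may differ depending on the method actually used.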

Conclusions and Relevance

Most trials published in JAMA, Lancet, and NEJM after the implementation of the ICMJE policy declared their intent to make clinical data available. However, a wide gap between declared and actual data sharing exists.

To improve transparency and data reuse, journals should promote the use of unique pointers to data set location and standardized choices for embargo periods and access requirements.

URL : Evaluation of Data Sharing After Implementation of the International Committee of Medical Journal Editors Data Sharing Statement Requirement

DOI : https://doi.org/10.1001/jamanetworkopen.2020.33972

COVID‐19 and the generation of novel scientific knowledge: Evidence‐based decisions and data sharing

Authors : Lucie Perillat, Brian S. Baigrie

Rationale, aims and objectives

The COVID‐19 pandemic has impacted every facet of society, including medical research. This paper is the second part of a series of articles that explore the intricate relationship between the different challenges that have hindered biomedical research and the generation of novel scientific knowledge during the COVID‐19 pandemic.

In the first part of this series, we demonstrated that, in the context of COVID‐19, the scientific community has been faced with numerous challenges with respect to (1) finding and prioritizing relevant research questions and (2) choosing study designs that are appropriate for a time of emergency.

Methods

During the early stages of the pandemic, research conducted on hydroxychloroquine (HCQ) sparked several heated debates with respect to the scientific methods used and the quality of knowledge generated.

Research on HCQ is used as a case study in both papers. The authors explored biomedical databases, peer‐reviewed journals, pre‐print servers and media articles to identify relevant literature on HCQ and COVID‐19, and examined philosophical perspectives on medical research in the context of this pandemic and previous global health challenges.

Results

This second paper demonstrates that a lack of research prioritization and methodological rigour resulted in the generation of fleeting and inconsistent evidence that complicated the development of public health guidelines.

The reporting of scientific findings to the scientific community and general public highlighted the difficulty of finding a balance between accuracy and speed.

Conclusions

The COVID‐19 pandemic presented challenges in terms of (3) evaluating evidence for the purpose of making evidence‐based decisions and (4) sharing scientific findings with the rest of the scientific community.

This second paper demonstrates that the four challenges outlined in the first and second papers have often compounded each other and have contributed to slowing down the creation of novel scientific knowledge during the COVID‐19 pandemic.

DOI : https://doi.org/10.1111/jep.13548

Research data management and data sharing behaviour of university researchers

Authors : Yurdagül Ünal, Gobinda Chowdhury, Serap Kurbanoğlu, Joumana Boustany, Geoff Walton

Introduction

The aim of this study is to understand how university researchers behave in the context of using and sharing research data in open access (OA) mode.

Method

An online questionnaire survey was conducted amongst academics and researchers in three countries – UK, France and Turkey. There were 26 questions to collect data on: researcher information, e.g. discipline, gender and experience; data sharing practices, concerns; familiarity with data management practices; and policies/challenges including knowledge of metadata and training.

Analysis

SPSS was used to analyse the dataset, and Chi-Square tests, at the 0.05 significance level, were conducted to identify associations between researchers’ behaviour in data sharing and different areas of research data management (RDM).
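For readers unfamiliar with the test, a Chi-Square test of association at the 0.05 level can be sketched as follows. This is not the authors' SPSS analysis: the 2x2 counts are invented purely for illustration, the function name is hypothetical, and no continuity correction is applied.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    Returns (statistic, p_value). With 1 degree of freedom the p-value
    equals erfc(sqrt(statistic / 2)). No continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# HYPOTHETICAL counts, not taken from the study:
# rows = received RDM training (yes, no); columns = shares data (yes, no).
observed = [[30, 20],
            [40, 60]]

chi2, p_value = chi_square_2x2(observed)
alpha = 0.05
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}, significant = {p_value < alpha}")
```

With these made-up counts the association would be judged significant at the 0.05 level; real survey tables with more than two categories would need the general chi-square distribution rather than the one-degree-of-freedom shortcut used here.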

Findings

Findings show that OA is still not common amongst researchers. Data ethics and legal issues appear to be the most significant concerns for researchers. Most researchers have not received any training in RDM, such as data management planning, metadata, or file naming. However, most researchers would welcome formal training in different aspects of RDM.

Conclusion

This study indicates directions for further research to understand the disciplinary differences in researchers’ data access and management behaviour so that appropriate training and advocacy programmes can be developed to promote OA to research data.

URL : http://www.informationr.net/ir/24-1/isic2018/isic1818.html

A Review of Open Research Data Policies and Practices in China

Authors : Lili Zhang, Robert R. Downs, Jianhui Li, Liangming Wen, Chengzan Li

This paper initially conducts a literature review and content analysis of the open research data policies in China. Next, a series of exemplars describe data practices to promote and enable the use of open research data, including open data practices in research programs, data repositories, data journals, and citizen science.

Moreover, the top four driving forces are identified and analyzed along with their responsible guiding work. In addition, the “landscape of open research data ecology in China” is derived from the literature review and from observations of actual cases, where the interaction and mutual development of data policies, data programs, and data practices are recognized.

Finally, future trends of research data practices within China and internationally are discussed. We hope the analysis provides perspective on current open data practices in China along with insight into the need for additional research on scientific data sharing and management.

URL : A Review of Open Research Data Policies and Practices in China

DOI : http://doi.org/10.5334/dsj-2021-003