Data sharing as seen by researchers: a value-based approach (Le partage des données vu par les chercheurs : une approche par la valeur)

Author : Violaine Rebouillat

This article seeks to understand the logics at work in how the value of research data is defined, since these logics may influence the criteria that determine researchers' motivation to share.

The methodology rests on a qualitative survey, conducted as part of doctoral research, comprising 57 semi-structured interviews. Whereas existing work on research data focuses on the barriers to and motivations for sharing, the originality of this research lies in identifying the different lenses through which the question of data value affects the motivation and the decision to share.

The analysis of the results shows that, across all disciplines, the value of data remains crystallized around publication and the symbolic recognition of the researcher's work.

The results show that the question of sharing runs up against a blind spot: the current framework for research evaluation, which places the scientific article at its core.

This work thus helps show that the future of data sharing depends on future alternative systems of research evaluation associated with open science.

URL : https://lesenjeux.univ-grenoble-alpes.fr/2021/varia/03-le-partage-des-donnees-vu-par-les-chercheurs-une-approche-par-la-valeur/

Openness in Big Data and Data Repositories. The Application of an Ethics Framework for Big Data in Health and Research

Authors : Vicki Xafis, Markus K. Labude

There is a growing expectation, or even requirement, for researchers to deposit a variety of research data in data repositories as a condition of funding or publication. This expectation recognizes the enormous benefits of data collected and created for research purposes being made available for secondary uses, as open science gains increasing support.

This is particularly so in the context of big data, especially where health data is involved. There are, however, also challenges relating to the collection, storage, and re-use of research data.

This paper gives a brief overview of the landscape of data sharing via data repositories and discusses some of the key ethical issues raised by the sharing of health-related research data, including expectations of privacy and confidentiality, the transparency of repository governance structures, access restrictions, as well as data ownership and the fair attribution of credit.

To consider these issues and the values that are pertinent, the paper applies the deliberative balancing approach articulated in the Ethics Framework for Big Data in Health and Research (Xafis et al. 2019) to the domain of Openness in Big Data and Data Repositories.

Please refer to that article for more information on how this framework is to be used, including a full explanation of the key values involved and the balancing approach used in the case study at the end.

DOI : https://doi.org/10.1007/s41649-019-00097-z

A survey of researchers’ needs and priorities for data sharing

Authors : Iain Hrynaszkiewicz, James Harney, Lauren Cadwallader

PLOS has long supported Open Science. One of the ways in which we do so is via our stringent data availability policy established in 2014. Despite this policy, and more data sharing policies being introduced by other organizations, best practices for data sharing are adopted by a minority of researchers in their publications. Problems with effective research data sharing persist and these problems have been quantified by previous research as a lack of time, resources, incentives, and/or skills to share data.

In this study we built on this research by investigating the importance of tasks associated with data sharing, and researchers’ satisfaction with their ability to complete these tasks. By investigating these factors we aimed to better understand opportunities for new or improved solutions for sharing data.

In May-June 2020 we surveyed researchers from Europe and North America, asking them to rate tasks associated with data sharing on (i) their importance and (ii) their satisfaction with their ability to complete them. We received 728 completed and 667 partial responses. We calculated mean importance and satisfaction scores to highlight potential opportunities for new solutions and to compare different cohorts.

Tasks relating to research impact, funder compliance, and credit had the highest importance scores. 52% of respondents reuse research data but the average satisfaction score for obtaining data for reuse was relatively low. Tasks associated with sharing data were rated somewhat important and respondents were reasonably well satisfied in their ability to accomplish them. Notably, this included tasks associated with best data sharing practice, such as use of data repositories. However, the most common method for sharing data was in fact via supplemental files with articles, which is not considered to be best practice.

We presume that researchers are unlikely to seek new solutions to a problem or task that they are satisfied in their ability to accomplish, even if many do not attempt this task. This implies there are few opportunities for new solutions or tools to meet these researcher needs. Publishers can likely meet these needs for data sharing by working to seamlessly integrate existing solutions that reduce the effort or behaviour change involved in some tasks, and focusing on advocacy and education around the benefits of sharing data.

There may however be opportunities – unmet researcher needs – in relation to better supporting data reuse, which could be met in part by strengthening data sharing policies of journals and publishers, and improving the discoverability of data associated with published articles.

DOI : https://doi.org/10.31219/osf.io/njr5u

Repository Approaches to Improving the Quality of Shared Data and Code

Authors : Ana Trisovic, Katherine Mika, Ceilyn Boyd, Sebastian Feger, Mercè Crosas

Sharing data and code for reuse has become increasingly important in scientific work over the past decade. However, in practice, shared data and code may be unusable, or published results obtained from them may be irreproducible.

Data repository features and services contribute significantly to the quality, longevity, and reusability of datasets.

This paper presents a combination of original and secondary data analysis studies focusing on computational reproducibility, data curation, and gamified design elements that can be employed to indicate and improve the quality of shared data and code.

The findings of these studies are sorted into three approaches that can be valuable to data repositories, archives, and other research dissemination platforms.

DOI : https://doi.org/10.3390/data6020015

An overview of biomedical platforms for managing research data

Authors : Vivek Navale, Denis von Kaeppler, Matthew McAuliffe

Biomedical platforms provide the hardware and software to securely ingest, process, validate, curate, store, and share data. Many large-scale biomedical platforms use secure cloud computing technology for analyzing, integrating, and storing phenotypic, clinical, and genomic data. Several web-based platforms are available for researchers to access services and tools for biomedical research.

The use of bio-containers can facilitate the integration of bioinformatics software with various data analysis pipelines. Adoption of Common Data Models, Common Data Elements, and Ontologies can increase the likelihood of data reuse. Managing biomedical Big Data will require the development of strategies that can efficiently leverage public cloud computing resources.

The use of standards developed by the research community for data collection can foster the development of machine learning methods for data processing and analysis. Increasingly, platforms will need to support the integration of data from research across multiple disease areas.

DOI : https://doi.org/10.1007/s42488-020-00040-0

Evaluation of Data Sharing After Implementation of the International Committee of Medical Journal Editors Data Sharing Statement Requirement

Authors : Valentin Danchev, Yan Min, John Borghi, Mike Baiocchi, John P. A. Ioann

Importance

The benefits of responsible sharing of individual-participant data (IPD) from clinical studies are well recognized, but stakeholders often disagree on how to align those benefits with privacy risks, costs, and incentives for clinical trialists and sponsors.

The International Committee of Medical Journal Editors (ICMJE) required a data sharing statement (DSS) from submissions reporting clinical trials effective July 1, 2018. The required DSSs provide a window into current data sharing rates, practices, and norms among trialists and sponsors.

Objective

To evaluate the implementation of the ICMJE DSS requirement in 3 leading medical journals: JAMA, Lancet, and New England Journal of Medicine (NEJM).

Design, Setting, and Participants

This is a cross-sectional study of clinical trial reports published as articles in JAMA, Lancet, and NEJM between July 1, 2018, and April 4, 2020. Articles not eligible for DSS, including observational studies and letters or correspondence, were excluded.

A MEDLINE/PubMed search identified 487 eligible clinical trials in JAMA (112 trials), Lancet (147 trials), and NEJM (228 trials). Two reviewers evaluated each of the 487 articles independently.

Exposure

Publication of clinical trial reports in an ICMJE medical journal requiring a DSS.

Main Outcomes and Measures

The primary outcomes of the study were declared data availability and actual data availability in repositories. Other captured outcomes were data type, access, and conditions and reasons for data availability or unavailability. Associations with funding sources were examined.

Results

A total of 334 of 487 articles (68.6%; 95% CI, 64%-73%) declared data sharing, with nonindustry NIH-funded trials exhibiting the highest rates of declared data sharing (89%; 95% CI, 80%-98%) and industry-funded trials the lowest (61%; 95% CI, 54%-68%).

However, only 2 IPD sets (0.6%; 95% CI, 0.0%-1.5%) were actually deidentified and publicly available as of April 10, 2020. The remaining were supposedly accessible via request to authors (143 of 334 articles [42.8%]), repository (89 of 334 articles [26.6%]), and company (78 of 334 articles [23.4%]).

Among the 89 articles declaring that IPD would be stored in repositories, only 17 (19.1%) deposited data, mostly because of embargo and regulatory approval. Embargo was set in 47.3% of data-sharing articles (158 of 334), and in half of them the period exceeded 1 year or was unspecified.
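The headline proportion in these results can be checked with a short calculation. The sketch below is illustrative only: the abstract does not state which interval method the authors used, so the normal-approximation (Wald) interval is an assumption, and the function name `wald_interval` is hypothetical. Under that assumption, 334 of 487 articles reproduces the reported 68.6% with a 95% CI of roughly 64% to 73%.

```python
import math


def wald_interval(successes, total, z=1.96):
    """Return (proportion, lower, upper) using the normal-approximation
    (Wald) interval. The choice of method is an assumption; the source
    does not say how its confidence intervals were computed."""
    p = successes / total
    se = math.sqrt(p * (1 - p) / total)  # standard error of a proportion
    lower = max(0.0, p - z * se)         # clip to the valid [0, 1] range
    upper = min(1.0, p + z * se)
    return p, lower, upper


# Counts taken from the Results paragraph: 334 of 487 articles declared data sharing.
p, lower, upper = wald_interval(334, 487)
print(f"{p:.1%} (95% CI, {lower:.0%}-{upper:.0%})")
```

For very small counts, such as the 2 publicly available IPD sets, the Wald interval is known to behave poorly, so an exact or Wilson interval would be a more cautious choice there.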

Conclusions and Relevance

Most trials published in JAMA, Lancet, and NEJM after the implementation of the ICMJE policy declared their intent to make clinical data available. However, a wide gap between declared and actual data sharing exists.

To improve transparency and data reuse, journals should promote the use of unique pointers to data set location and standardized choices for embargo periods and access requirements.

DOI : https://doi.org/10.1001/jamanetworkopen.2020.33972

COVID‐19 and the generation of novel scientific knowledge: Evidence‐based decisions and data sharing

Authors : Lucie Perillat, Brian S. Baigrie

Rationale, aims and objectives

The COVID‐19 pandemic has impacted every facet of society, including medical research. This paper is the second part of a series of articles that explore the intricate relationship between the different challenges that have hindered biomedical research and the generation of novel scientific knowledge during the COVID‐19 pandemic.

In the first part of this series, we demonstrated that, in the context of COVID‐19, the scientific community has been faced with numerous challenges with respect to (1) finding and prioritizing relevant research questions and (2) choosing study designs that are appropriate for a time of emergency.

Methods

During the early stages of the pandemic, research conducted on hydroxychloroquine (HCQ) sparked several heated debates with respect to the scientific methods used and the quality of knowledge generated.

Research on HCQ is used as a case study in both papers. The authors explored biomedical databases, peer‐reviewed journals, pre‐print servers and media articles to identify relevant literature on HCQ and COVID‐19, and examined philosophical perspectives on medical research in the context of this pandemic and previous global health challenges.

Results

This second paper demonstrates that a lack of research prioritization and methodological rigour resulted in the generation of fleeting and inconsistent evidence that complicated the development of public health guidelines.

The reporting of scientific findings to the scientific community and general public highlighted the difficulty of finding a balance between accuracy and speed.

Conclusions

The COVID‐19 pandemic presented challenges in terms of (3) evaluating evidence for the purpose of making evidence‐based decisions and (4) sharing scientific findings with the rest of the scientific community.

This second paper demonstrates that the four challenges outlined in the first and second papers have often compounded each other and have contributed to slowing down the creation of novel scientific knowledge during the COVID‐19 pandemic.

DOI : https://doi.org/10.1111/jep.13548