Research data preservation: offering support services to researchers at the Uni Arve site of the University of Geneva

Author : Manuela Bezzi

This work examines how researchers at the Uni Arve site (Faculty of Science) of the University of Geneva preserve and reuse research data. Its aim is to assess researchers’ needs in order to offer them appropriate support services.

Research data preservation is part of the Open Data movement, which aims to make research data publicly accessible, intelligible and reusable, particularly when the data were produced through publicly funded research.

To this end, the SNSF (Swiss National Science Foundation) asks researchers to deposit their data in public archives that comply with the FAIR principles. Since June 2019, the University of Geneva has provided its researchers with an institutional archive, Yareta, which meets the SNSF’s criteria.

To best meet researchers’ needs, a two-stage approach was adopted: (1) an analysis of the datasets deposited in Yareta identified the issues that hinder data reuse; (2) interviews with researchers were then used to analyze their preservation practices and needs.

The information gathered through these two approaches led to the following proposals: an archiving guide covering four activities that ensure good preservation (format, context, metadata, license); additional resources (a web page or training) covering concepts that researchers understand poorly; changes to existing web pages for the sake of consistency; and additional information in the Yareta tool.

These proposals are concrete solutions built on the University of Geneva’s existing resources, so that they complement the support services and resources the university already offers.

Moreover, these proposals can benefit the entire University of Geneva community, not only researchers at the Uni Arve site.

URL : https://doc.rero.ch/record/329678

Opening Pandora’s Box: Peeking inside Psychology’s data sharing practices, and seven recommendations for change

Authors : John N. Towse, David A Ellis, Andrea S Towse

Open data-sharing is a valuable practice that ought to enhance the impact, reach, and transparency of a research project.

While widely advocated by many researchers and mandated by some journals and funding agencies, little is known about detailed practices across psychological science. In a pre-registered study, we show that overall, few research papers directly link to available data in many, though not all, journals.

Most importantly, even where open data can be identified, the majority of these lacked completeness and reusability—conclusions that closely mirror those reported outside of Psychology.

Exploring the reasons behind these findings, we offer seven specific recommendations for engineering and incentivizing improved practices, so that the potential of open data can be better realized across psychology and social science more generally.

DOI : https://doi.org/10.3758/s13428-020-01486-1

The views, perspectives, and experiences of academic researchers with data sharing and reuse: A meta-synthesis

Authors : Laure Perrier, Erik Blondal, Heather MacDonald

Background

Funding agencies and research journals are increasingly demanding that researchers share their data in public repositories. Despite these requirements, researchers still withhold data, refuse to share, and deposit data that lacks annotation.

We conducted a meta-synthesis to examine the views, perspectives, and experiences of academic researchers on data sharing and reuse of research data.

Methods

We searched the published and unpublished literature for studies on data sharing by researchers in academic institutions. Two independent reviewers screened citations and abstracts, then full-text articles.

Data abstraction was performed independently by two investigators. The abstracted data was read and reread in order to generate codes. Key concepts were identified and thematic analysis was used for data synthesis.

Results

We reviewed 2,005 records and included 45 studies along with 3 companion reports. The studies were published between 2003 and 2018 and most were conducted in North America (60%) or Europe (17%).

The four major themes that emerged were data integrity, responsible conduct of research, feasibility of sharing data, and value of sharing data. Researchers lack time, resources, and skills to effectively share their data in public repositories.

Data quality is affected by this, along with subjective decisions around what is considered to be worth sharing. Deficits in infrastructure also impede the availability of research data. Incentives for sharing data are lacking.

Conclusion

Researchers lack the skills to share data in a manner that is efficient and effective. Improved infrastructure support would allow them to make data available quickly and seamlessly. The lack of incentives for sharing research data with regard to academic appointment, promotion, recognition, and rewards needs to be addressed.

DOI : https://doi.org/10.1371/journal.pone.0234275

ODDPub – a Text-Mining Algorithm to Detect Data Sharing in Biomedical Publications

Authors : Nico Riedel, Miriam Kip, Evgeny Bobrov

Open research data are increasingly recognized as a quality indicator and an important resource to increase transparency, robustness and collaboration in science. However, no standardized way of reporting Open Data in publications exists, making it difficult to find shared datasets and assess the prevalence of Open Data in an automated fashion.

We developed ODDPub (Open Data Detection in Publications), a text-mining algorithm that screens biomedical publications and detects cases of Open Data. Using English-language original research publications from a single biomedical research institution (n = 8689) and a random sample of publications from PubMed (n = 1500), we iteratively developed a set of derived keyword categories.

ODDPub can detect data sharing through field-specific repositories, general-purpose repositories or the supplement. Additionally, it can detect shared analysis code (Open Code).
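
The idea of combining keyword categories can be illustrated with a toy detector. This is a hypothetical Python approximation, not ODDPub itself (which is distributed as an R package): the keyword lists and the rule that one sentence must mention data, a sharing verb, and either a repository or the supplement are assumptions chosen only for illustration.

```python
import re

# Hypothetical keyword categories for illustration only; these are NOT
# ODDPub's actual keyword lists or combination rules.
CATEGORIES = {
    "data": ["data", "dataset", "datasets", "raw data"],
    "sharing": ["available", "deposited", "accessible", "shared"],
    "repository": ["figshare", "zenodo", "dryad", "osf", "github", "geo"],
    "supplement": ["supplementary", "supporting information"],
}


def matches(sentence: str, category: str) -> bool:
    """True if the sentence contains any keyword of the category as a whole word or phrase."""
    text = sentence.lower()
    return any(
        re.search(r"\b" + re.escape(kw) + r"\b", text) for kw in CATEGORIES[category]
    )


def detect_open_data(text: str) -> bool:
    """Flag a text if any single sentence mentions data, a sharing verb,
    and either a repository or the supplement (an assumed co-occurrence rule)."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    for s in sentences:
        if matches(s, "data") and matches(s, "sharing") and (
            matches(s, "repository") or matches(s, "supplement")
        ):
            return True
    return False
```

Under these assumed rules, "The raw data are available on Zenodo." is flagged, while "Data are available upon reasonable request." is not, because it names neither a repository nor the supplement. Requiring co-occurrence within one sentence is a simple way to trade some sensitivity for specificity.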

To validate ODDPub, we manually screened 792 publications randomly selected from PubMed. On this validation dataset, our algorithm detected Open Data publications with a sensitivity of 0.73 and specificity of 0.97.

Open Data was detected for 11.5% (n = 91) of publications. Open Code was detected for 1.4% (n = 11) of publications with a sensitivity of 0.73 and specificity of 1.00. We compared our results to the linked datasets found in the databases PubMed and Web of Science.
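
Sensitivity and specificity are the usual confusion-matrix ratios: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The sketch below computes them together with the detection percentages. The abstract does not report the underlying counts for the validation set, so `Confusion` is a generic helper and is not filled with the authors' data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # publications with Open Data that were flagged
    fn: int  # publications with Open Data that were missed
    tn: int  # publications without Open Data that were not flagged
    fp: int  # publications without Open Data that were wrongly flagged

    def sensitivity(self) -> float:
        """Share of true Open Data publications that the detector finds."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """Share of non-Open Data publications that the detector correctly ignores."""
        return self.tn / (self.tn + self.fp)


def share(count: int, total: int) -> float:
    """Percentage of publications, rounded to one decimal place."""
    return round(100 * count / total, 1)
```

`share(91, 792)` returns 11.5 and `share(11, 792)` returns 1.4, which indicates that the reported percentages refer to the 792 manually screened validation publications.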

Our algorithm can automatically screen large numbers of publications for Open Data. It can thus be used to assess Open Data sharing rates on the level of subject areas, journals, or institutions. It can also identify individual Open Data publications in a larger publication corpus. ODDPub is published as an R package on GitHub.

DOI : http://doi.org/10.5334/dsj-2020-042

Scientific data management in the federal government: A case study of NOAA and responsibility for preserving digital data

Authors : Adam Kriesberg, Jacob Kowall

In this paper, we examine the ways in which the evolution of federal and agency‐specific data management policies has affected and continues to affect the long‐term preservation of digital scientific data produced by the United States government.

After reviewing the existing literature on the role of archival theory and practice in the preservation of scientific data, we present the case of the National Oceanic and Atmospheric Administration (NOAA) to analyze how data management activities at this agency are shaped by legislative mandates as well as both government‐wide and agency‐specific information‐management policies.

Through the connected network of law, federal policy, agency policy, and the records schedules which govern recordkeeping practice in the federal government, we propose a number of further questions on how government agencies can effectively provide for the management of scientific data as federal records.

DOI : https://doi.org/10.1002/pra2.266

Data sharing policies of journals in life, health, and physical sciences indexed in Journal Citation Reports

Authors : Jihyun Kim, Soon Kim, Hye-Min Cho, Jae Hwa Chang, Soo Young Kim

Background

Many scholarly journals have established their own data-related policies, which specify their enforcement of data sharing, the types of data to be submitted, and their procedures for making data available.

However, except for the journal impact factor and the subject area, the factors associated with the overall strength of the data sharing policies of scholarly journals remain unknown.

This study examines how factors, including impact factor, subject area, type of journal publisher, and geographical location of the publisher are related to the strength of the data sharing policy.

Methods

From each of the 178 categories of the Web of Science’s 2017 edition of Journal Citation Reports, the top journals in each quartile (Q1, Q2, Q3, and Q4) were selected in December 2018. Of the resulting 709 journals (5%), 700 in the fields of life, health, and physical sciences were selected for analysis.

Four of the authors independently reviewed the results of the journal website searches, categorized the journals’ data sharing policies, and extracted the characteristics of individual journals.

Univariable multinomial logistic regression analyses were initially conducted to determine whether there was a relationship between each factor and the strength of the data sharing policy.

Based on the univariable analyses, a multivariable model was fitted to further investigate the factors related to the presence and/or strength of the policy.

Results

Of the 700 journals, 308 (44.0%) had no data sharing policy, 125 (17.9%) had a weak policy, and 267 (38.1%) had a strong policy (expecting or mandating data sharing). The impact factor quartile was positively associated with the strength of the data sharing policies.

Physical science journals were less likely to have a strong policy relative to a weak policy than life science journals (relative risk ratio [RRR], 0.36; 95% CI [0.17–0.78]). Life science journals had a greater probability of having a weak policy relative to no policy than health science journals (RRR, 2.73; 95% CI [1.05–7.14]).

Commercial publishers were more likely to have a weak policy relative to no policy than non-commercial publishers (RRR, 7.87; 95% CI [3.98–15.57]). Journals by publishers in Europe, including the majority of those located in the United Kingdom and the Netherlands, were more likely to have a strong data sharing policy than a weak policy (RRR, 2.99; 95% CI [1.85–4.81]).
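
In multinomial logistic regression, a relative risk ratio is the exponentiated coefficient, RRR = exp(β), and a Wald 95% confidence interval is exp(β ± 1.96·SE). The sketch below assumes the authors used these standard Wald intervals (the abstract does not say so); under that assumption it also inverts a reported interval to recover the log-scale standard error.

```python
import math

Z95 = 1.96  # standard normal quantile for a two-sided 95% interval


def rrr_with_ci(beta: float, se: float, z: float = Z95) -> tuple[float, float, float]:
    """Turn a multinomial-logit coefficient (a log relative risk ratio) and its
    standard error into (RRR, lower, upper) using a Wald interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def from_ci(lower: float, upper: float, z: float = Z95) -> tuple[float, float]:
    """Invert a reported Wald interval: return the RRR implied by the
    midpoint on the log scale and the standard error on the log scale."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    beta = (log_lo + log_hi) / 2
    se = (log_hi - log_lo) / (2 * z)
    return math.exp(beta), se
```

For European publishers (RRR 2.99, 95% CI [1.85–4.81]), `from_ci(1.85, 4.81)` gives a log-scale standard error of roughly 0.24 and an implied RRR of about 2.98, close to the reported value. The small gap is what rounding the published bounds would produce.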

Conclusions

These findings may account for the increase in commercial publishers’ engagement in data sharing and indicate that European national initiatives that encourage and mandate data sharing may influence the presence of a strong policy in the associated journals.

Future research needs to explore the factors associated with varied degrees in the strength of a data sharing policy as well as more diverse characteristics of journals related to the policy strength.

DOI : https://doi.org/10.7717/peerj.9924

Research data repositories: measuring the impact of Open Science through the consultation of deposited datasets

Author : Violaine Rebouillat

The 2000s and 2010s saw the development of a growing number of research e-infrastructures, making scientific data easier to share and access. This trend was reinforced by the rise of open data policies, which led to a proliferation of data reservoirs, also called “data repositories”. Quantifying and qualifying the use of the data made public is essential for assessing the impact of open data policies.

In this article, we examine the use of the data deposited in repositories. To what extent are these data viewed and downloaded?

The article presents the first results of a quantitative survey of 20 repositories. It outlines two trends, which at this stage are specific to the sample studied: (1) an overall increase in the number of views, downloads and available datasets in the repositories over the study period (2015–2020), and (2) the concentration of downloads on a relatively small proportion of each repository’s data (on the order of 10% to 30%).
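
Download concentration can be measured in several ways, and the abstract does not describe the article’s exact method. One simple option, sketched below under that caveat, ranks datasets from most to least downloaded and returns the smallest fraction of them that accounts for a chosen share of all downloads. The 80% default threshold is an arbitrary assumption.

```python
def concentration(downloads: list[int], target: float = 0.8) -> float:
    """Smallest fraction of datasets whose downloads add up to at least
    `target` of all downloads, taking datasets from most to least downloaded.
    The 0.8 default is an illustrative assumption, not the article's metric."""
    counts = sorted(downloads, reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0  # no downloads at all: nothing to concentrate
    running = 0
    for i, c in enumerate(counts, start=1):
        running += c
        if running >= target * total:
            return i / len(counts)
    return 1.0
```

For ten datasets with download counts `[100, 50, 10, 10, 5, 5, 5, 5, 5, 5]`, `concentration(...)` returns 0.3: the three most downloaded datasets already account for 160 of the 200 downloads, i.e. 80%.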

URL : https://hal.archives-ouvertes.fr/hal-02928817/