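Du traitement des données à la création de valeur : comprendre les pratiques professionnelles des réutilisateurs des données ouvertes (From data processing to value creation: understanding the professional practices of open data reusers)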

Authors : Valentyna Dymytrova, Françoise Paquienséguy

Based on a field survey conducted in France in 2017, this article identifies different forms of open data reuse and analyses the processing chains on which they rely. By deciphering these chains and the tools used by three categories of professional reusers (developers, data scientists and data journalists), the authors discuss how they relate to the value creation chain.

Professional practices and expectations are addressed in terms of the added value generated by the data, the business model (information brokerage), and the provision of innovative services.

URL : https://hal.archives-ouvertes.fr/hal-02913346

Which aspects of the Open Science agenda are most relevant to scientometric research and publishing? An opinion paper

Authors : Lutz Bornmann, Raf Guns, Michael Thelwall, Dietmar Wolfram

Open Science is an umbrella term that encompasses many recommendations for possible changes in research practices, management, and publishing, with the objective of increasing transparency and accessibility.

This has become an important science policy issue that all disciplines should consider. Many Open Science recommendations may be valuable for the further development of research and publishing but not all are relevant to all fields.

This opinion paper considers the aspects of Open Science that are most relevant for scientometricians, discussing how they can be usefully applied.

DOI : https://doi.org/10.1162/qss_e_00121

Open Data Challenges in Climate Science

Authors : Francesca Eggleton, Kate Winfield

The purpose of this paper is to explore challenges in open climate data experienced by data scientists at the Centre for Environmental Data Analysis (CEDA). This paper explores two of the five V’s of Big Data, Volume and Variety.

These challenges are explored using the Sentinel satellite data and Coupled Model Intercomparison Project phase six (CMIP6) data held in the CEDA Archive. To address the Big Data Volume challenge, this paper describes the approach developed by CEDA to manage large volumes of data through the allocation of storage as filesets.

These filesets allow CEDA to plan and track dataset storage volumes, a flexible approach which could be adopted by any data centre. To overcome the challenge of Variety, CEDA applies the Climate and Forecast (CF) conventions and standard names within archived data wherever possible.

Collaboration from the international science community through contributions to the moderation of CF standard names ensures these data then adhere to the FAIR (Findable, Accessible, Interoperable and Reusable) data principles.

Utilising data standards such as the CF standard names is recommended because it promotes data exchange and allows data from different sources to be compared. Addressing these Open Data challenges is crucial to ensure valuable climate data are made available to the scientific community to facilitate research that addresses one of society’s most pressing issues – climate change.
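
The point about comparing data from different sources can be illustrated with a minimal sketch. It assumes each variable's metadata is available as a plain dictionary carrying a CF `standard_name` attribute; the source names, variable names, and metadata below are hypothetical and are not taken from the CEDA Archive.

```python
def index_by_standard_name(datasets):
    """Map each CF standard_name to the (source, variable) pairs that carry it."""
    index = {}
    for source, variables in datasets.items():
        for var_name, attrs in variables.items():
            std_name = attrs.get("standard_name")
            if std_name is None:
                continue  # variables without a standard name cannot be matched
            index.setdefault(std_name, []).append((source, var_name))
    return index


def comparable_variables(datasets):
    """Keep only standard names that appear in more than one source."""
    index = index_by_standard_name(datasets)
    return {
        std_name: pairs
        for std_name, pairs in index.items()
        if len({source for source, _ in pairs}) > 1
    }


# Hypothetical metadata: two sources name the same quantity differently.
datasets = {
    "source_a": {
        "tas": {"standard_name": "air_temperature", "units": "K"},
        "pr": {"standard_name": "precipitation_flux", "units": "kg m-2 s-1"},
    },
    "source_b": {
        "t2m": {"standard_name": "air_temperature", "units": "K"},
        "misc": {},  # no standard_name, so it is skipped
    },
}

matches = comparable_variables(datasets)
```

Here `tas` and `t2m` are paired because they share the standard name `air_temperature`, even though their variable names differ.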

DOI : http://doi.org/10.5334/dsj-2020-052

Enforcing public data archiving policies in academic publishing: A study of ecology journals

Authors : Dan Sholler, Karthik Ram, Carl Boettiger, Daniel S Katz

To improve the quality and efficiency of research, groups within the scientific community seek to exploit the value of data sharing. Funders, institutions, and specialist organizations are developing and implementing strategies to encourage or mandate data sharing within and across disciplines, with varying degrees of success.

Academic journals in ecology and evolution have adopted several types of public data archiving policies requiring authors to make data underlying scholarly manuscripts freely available. The effort to increase data sharing in the sciences is one part of a broader “data revolution” that has prompted discussion about a paradigm shift in scientific research.

Yet anecdotes from the community and studies evaluating data availability suggest that these policies have not achieved the desired effects in terms of either the quantity or the quality of available datasets.

We conducted a qualitative, interview-based study with journal editorial staff and other stakeholders in the academic publishing process to examine how journals enforce data archiving policies.

We specifically sought to establish who editors and other stakeholders perceive as responsible for ensuring data completeness and quality in the peer review process. Our analysis revealed little consensus with regard to how data archiving policies should be enforced and who should hold authors accountable for dataset submissions.

Themes in interviewee responses included hopefulness that reviewers would take the initiative to review datasets and trust in authors to ensure the completeness and quality of their datasets.

We highlight problematic aspects of these thematic responses and offer potential starting points for improvement of the public data archiving process.

DOI : https://doi.org/10.1177/2053951719836258

Open Data for Sustainable Development on a Knowledge-Based Economy: The Case of Botswana

Authors : Oarabile Sebubi, Irina Zlotnikova, Hlomani Hlomani

A review of sustainable economic development perspectives reveals a lack of data-driven approaches that meet the needs of knowledge-based economies.

This paper presents a conceptual design artefact, a theoretical framework that maps the open data pathway toward the achievement of a knowledge-based economy and sustainable economic development with a specific reference to Botswana.

The proposed framework models the transition from open data to open knowledge. It further establishes the potential impact of that transition on the realisation of a knowledge-based economy, sustainable economic development, and the attainment of a knowledge society.

The method adopted in the development of the framework involves three processes: 1) review of literature on key research concepts; 2) identification of relationships between research concepts; and 3) design and development of the proposed open data framework.

The proposed framework will serve as a point of reference in open data-driven economic transitions and transformations in Botswana. This design artefact can be customised to meet the economic needs of other developing countries.

DOI : http://doi.org/10.5334/dsj-2020-044

ODDPub – a Text-Mining Algorithm to Detect Data Sharing in Biomedical Publications

Authors : Nico Riedel, Miriam Kip, Evgeny Bobrov

Open research data are increasingly recognized as a quality indicator and an important resource to increase transparency, robustness and collaboration in science. However, no standardized way of reporting Open Data in publications exists, making it difficult to find shared datasets and assess the prevalence of Open Data in an automated fashion.

We developed ODDPub (Open Data Detection in Publications), a text-mining algorithm that screens biomedical publications and detects cases of Open Data. Using English-language original research publications from a single biomedical research institution (n = 8689) and publications randomly selected from PubMed (n = 1500), we iteratively developed a set of derived keyword categories.

ODDPub can detect data sharing through field-specific repositories, general-purpose repositories or the supplement. Additionally, it can detect shared analysis code (Open Code).
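
The general idea of screening text with keyword categories can be sketched as follows. This is only an illustration under stated assumptions: ODDPub itself is an R package, and the categories, patterns, and combination rule below are hypothetical and much simpler than whatever the actual algorithm uses.

```python
import re

# Hypothetical keyword categories; these are NOT ODDPub's actual patterns.
CATEGORIES = {
    "repository": re.compile(r"\b(figshare|dryad|zenodo|gene expression omnibus)\b", re.I),
    "availability": re.compile(r"\b(deposited (in|at)|available (at|from|in|on))\b", re.I),
    "supplement": re.compile(r"\bsupplementa(l|ry) (data|table|file)s?\b", re.I),
    "code": re.compile(r"\b(source code|analysis scripts?|github)\b", re.I),
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def screen(text):
    """Flag Open Data / Open Code when category keywords co-occur in one sentence."""
    open_data = False
    open_code = False
    for sentence in split_sentences(text):
        hits = {name for name, pattern in CATEGORIES.items() if pattern.search(sentence)}
        # Assumed rule: an availability phrase must co-occur with a data location.
        if "availability" in hits and ("repository" in hits or "supplement" in hits):
            open_data = True
        if "availability" in hits and "code" in hits:
            open_code = True
    return {"open_data": open_data, "open_code": open_code}


shared = screen("The raw data were deposited in Dryad. Analysis scripts are available at GitHub.")
on_request = screen("Data are available from the authors upon request.")
```

Requiring two categories in the same sentence is one simple way to avoid flagging statements such as "available from the authors upon request", which mention availability but no repository or supplement.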

To validate ODDPub, we manually screened 792 publications randomly selected from PubMed. On this validation dataset, our algorithm detected Open Data publications with a sensitivity of 0.73 and specificity of 0.97.

Open Data was detected for 11.5% (n = 91) of publications. Open Code was detected for 1.4% (n = 11) of publications with a sensitivity of 0.73 and specificity of 1.00. We compared our results to the linked datasets found in the databases PubMed and Web of Science.
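
For readers unfamiliar with these metrics, the sketch below shows how sensitivity and specificity are computed from a confusion matrix. The counts are illustrative placeholders chosen to give round values; they are not the study's actual validation counts, which are not given in this abstract.

```python
def sensitivity(true_pos, false_neg):
    """Share of truly positive items that the detector flags (true positive rate)."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no positive cases in the validation set")
    return true_pos / total


def specificity(true_neg, false_pos):
    """Share of truly negative items that the detector leaves unflagged (true negative rate)."""
    total = true_neg + false_pos
    if total == 0:
        raise ValueError("no negative cases in the validation set")
    return true_neg / total


# Illustrative counts only (hypothetical, not from the paper).
sens = sensitivity(true_pos=73, false_neg=27)
spec = specificity(true_neg=97, false_pos=3)
```

A high specificity with a lower sensitivity, as reported above, means the detector rarely flags a publication wrongly but misses some publications that do share data.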

Our algorithm can automatically screen large numbers of publications for Open Data. It can thus be used to assess Open Data sharing rates on the level of subject areas, journals, or institutions. It can also identify individual Open Data publications in a larger publication corpus. ODDPub is published as an R package on GitHub.

DOI : http://doi.org/10.5334/dsj-2020-042

Open Data and Open Access Articles: Exploring Connections in the Life Sciences

Author : Sarah C. Williams

Objectives

This small-scale study explores the current state of connections between open data and open access (OA) articles in the life sciences.

Methods

This study involved 44 openly available life sciences datasets from the Illinois Data Bank that had 45 related research articles. For each article, I gathered the OA status of the journal and the article on the publisher website and checked whether the article was openly available via Unpaywall and ResearchGate. I also examined how and where the open data was included in the HTML and PDF versions of the related articles.
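
A check of this kind could be partly automated. The sketch below classifies records shaped like Unpaywall API responses; the field names `is_oa` and `oa_status` and the status values follow my understanding of Unpaywall's data format and should be verified against its documentation. The sample records are invented, and this is not the procedure the author used.

```python
def classify_access(record):
    """Return a coarse access label for one Unpaywall-style record (a dict)."""
    if not record.get("is_oa", False):
        return "closed"
    status = record.get("oa_status", "closed")
    labels = {
        "gold": "gold (fully OA journal)",
        "hybrid": "hybrid (open licence in a subscription journal)",
        "bronze": "bronze (free to read, no open licence)",
        "green": "green (repository copy)",
    }
    return labels.get(status, "unknown")


def summarise(records):
    """Count how many records fall under each access label."""
    counts = {}
    for record in records:
        label = classify_access(record)
        counts[label] = counts.get(label, 0) + 1
    return counts


# Invented records for illustration; the DOIs are placeholders.
records = [
    {"doi": "10.0000/example.1", "is_oa": True, "oa_status": "gold"},
    {"doi": "10.0000/example.2", "is_oa": True, "oa_status": "bronze"},
    {"doi": "10.0000/example.3", "is_oa": False, "oa_status": "closed"},
]

summary = summarise(records)
```

Separating "bronze" from "hybrid" matters here because it distinguishes articles that are merely free to read on the publisher site from those carrying an open licence.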

Results

Of the 45 articles studied, fewer than half were published in Gold/Full OA journals, and while the remaining articles were published in Gold/Hybrid journals, none of them were OA. This study found that OA articles pointed to the Illinois Data Bank datasets similarly to all of the related articles, most commonly with a data availability statement containing a DOI.

Conclusions

The findings indicate that Gold OA in hybrid journals does not appear to be a popular option, even for articles connected to open data, and this study emphasizes the importance of data repositories providing DOIs, since the related articles frequently used DOIs to point to the Illinois Data Bank datasets. This study also revealed concerns about free (not licensed OA) access to articles on publisher websites, which will be a significant topic for future research.

URL : https://escholarship.umassmed.edu/jeslib/vol9/iss1/3/