Linguistic Analysis of the bioRxiv Preprint Landscape

Authors : David N. Nicholson, Vincent Rubinetti, Dongbo Hu, Marvin Thielk, Lawrence E. Hunter, Casey S. Greene

Preprints allow researchers to make their findings available to the scientific community before they have undergone peer review. Studies on preprints within bioRxiv have been largely focused on article metadata and how often these preprints are downloaded, cited, published, and discussed online.

One element that has yet to be examined is the language contained within the bioRxiv preprint repository. We sought to compare and contrast the linguistic features of bioRxiv preprints with those of published biomedical text as a whole, as this offers an excellent opportunity to examine how peer review changes these documents.

The most prevalent features that changed appear to be associated with typesetting and mentions of supplementary sections or additional files. In addition to text comparison, we created document embeddings derived from a preprint-trained word2vec model.

We found that these embeddings are able to parse out different scientific approaches and concepts, link unannotated preprint-peer reviewed article pairs, and identify journals that publish linguistically similar papers to a given preprint.

We also used these embeddings to examine factors associated with the time elapsed between the posting of a first preprint and the appearance of a peer reviewed publication. We found that preprints with more versions posted and more textual changes took longer to publish.

Lastly, we constructed a web application (https://greenelab.github.io/preprint-similarity-search/) that allows users to identify which journals and articles are most linguistically similar to a bioRxiv or medRxiv preprint, as well as to observe where the preprint would be positioned within the landscape of published articles.

DOI : https://doi.org/10.1101/2021.03.04.433874

Publication practices during the COVID-19 pandemic: Biomedical preprints and peer-reviewed literature

Authors : Yulia V. Sevryugina, Andrew J. Dicks

The coronavirus pandemic introduced many changes to our society and deeply affected established publication practices in the biomedical sciences. In this article, we present a comprehensive study of the changes in the scholarly publication landscape for the biomedical sciences during the COVID-19 pandemic, with special emphasis on preprints posted on the bioRxiv and medRxiv servers.

We observe the emergence of a new category of preprint authors working in the fields of immunology, microbiology, infectious diseases, and epidemiology, who used preprint platforms extensively during the pandemic to share their immediate findings. The majority of these findings were works in progress that were not suitable for prompt acceptance by refereed journals.

The COVID-19 preprints that became peer-reviewed journal articles were often submitted to journals concurrently with their posting on a preprint server, and the entire publication cycle, from preprint to online journal article, took 63 days on average. This included an expedited peer-review process of 43 days and a journal production stage of 15 days; however, publication delays varied widely between journals. Only one third of the COVID-19 preprints posted during the first nine months of the pandemic appeared as peer-reviewed journal articles.

These journal articles display high Altmetric Attention Scores, further emphasizing the significance of COVID-19 research during 2020. This article will be relevant to editors, publishers, open science enthusiasts, and anyone interested in the changes that the 2020 crisis brought to publication practices and to the culture of preprints in the life sciences.

DOI : https://doi.org/10.1101/2021.01.21.427563

Communicating Scientific Uncertainty in an Age of COVID-19: An Investigation into the Use of Preprints by Digital Media Outlets

Authors : Alice Fleerackers, Michelle Riedlinger, Laura Moorhead, Rukhsana Ahmed, Juan Pablo Alperin

In this article, we investigate the surge in the use of COVID-19-related preprints by media outlets. Journalists are a main source of reliable public health information during crises and, until recently, journalists had been reluctant to cover preprints because of the associated scientific uncertainty.

Yet, uploads of COVID-19 preprints and their uptake by online media have outstripped that of preprints about any other topic. Using an innovative approach combining altmetrics methods with content analysis, we identified a diversity of outlets covering COVID-19-related preprints during the early months of the pandemic, including specialist medical news outlets, traditional news media outlets, and aggregators.

We found that hyperlinks were ubiquitous as citations and that outlets used a multiplicity of framing devices to highlight the scientific uncertainty associated with COVID-19 preprints. These devices (e.g., mentioning that the study was a preprint, unreviewed, preliminary, and/or in need of verification) were rarely used consistently.

About half of the stories we analyzed contained framing devices emphasizing uncertainty. Outlets in our sample were much less likely to identify the research they mentioned as preprint research, compared to identifying it as simply “research.” This work has significant implications for public health communication within the changing media landscape.

While current best practices in public health risk communication promote identifying and promoting trustworthy sources of information, the uptake of preprint research by online media presents new challenges.

At the same time, it provides new opportunities for fostering greater awareness of the scientific uncertainty associated with health research findings.

DOI : https://doi.org/10.1080/10410236.2020.1864892

From Data Processing to Value Creation: Understanding the Professional Practices of Open Data Reusers

Authors : Valentyna Dymytrova, Françoise Paquienséguy

Based on a field survey conducted in France in 2017, this article identifies different forms of open data reuse and analyzes the processing chains on which they rely. By decoding these chains and the tools used by three categories of professional reusers (developers, data scientists, and data journalists), the authors discuss how they relate to the value creation chain.

Professional practices and expectations are examined in terms of the added value generated by the data and the business model (information brokerage), but also the provision of innovative services.

URL : https://hal.archives-ouvertes.fr/hal-02913346

Bibliodiversity at the Centre: Decolonizing Open Access

Author : Monica Berger

The promise of open access for the global South has not been fully met. Publishing is dominated by Northern publishers, which disadvantages Southern authors through platform capitalism and open access models requiring article processing charges to publish.

This article argues that through the employment of bibliodiversity — a sustainable, anticolonial ethos and practice developed in Latin America — the South can reclaim and decolonize open access and nurture scholarly communities.

Self‐determination and locality are at the core of bibliodiversity, which rejects the domination of international, English‐language journal publishing. As articulated by the Jussieu Call, wide‐ranging, scholarly‐community‐based, non‐profit and sustainable models for open access are integral to bibliodiversity, as is reform of research evaluation systems.

Predatory publishing exploits open access and perpetuates the marginalization of Southern scholars. Predatory journals are often also conflated with legitimate Southern journals. The article concludes with a discussion of Southern open access initiatives, highlighting large‐scale infrastructure in Latin America and library‐based publishing in Africa, which express the true spirit of open access as a commons for knowledge as a public good.

DOI : https://doi.org/10.1111/dech.12634

Citation needed? Wikipedia and the COVID-19 pandemic

Authors : Omer Benjakob, Rona Aviram, Jonathan Sobel

With the COVID-19 pandemic’s outbreak at the beginning of 2020, millions across the world flocked to Wikipedia to read about the virus. Our study offers an in-depth analysis of the scientific backbone supporting Wikipedia’s COVID-19 articles.

Using references as a readout, we asked which sources informed Wikipedia’s growing pool of COVID-19-related articles during the pandemic’s first wave (January-May 2020). We found that coronavirus-related articles referenced trusted media sources and cited high-quality academic research.

Moreover, despite a surge in preprints, Wikipedia’s COVID-19 articles had a clear preference for open-access studies published in respected journals and made little use of non-peer-reviewed research uploaded independently to academic servers.

Building a timeline of COVID-19 articles on Wikipedia from 2001 to 2020 revealed a nuanced trade-off between quality and timeliness, with growth in COVID-19 article creation and in citations to both academic research and popular media.

It further revealed how preexisting articles on key topics related to the virus created a framework on Wikipedia for integrating new knowledge. This “scientific infrastructure” helped provide context and regulated the influx of new information into Wikipedia.

Lastly, we constructed a network of DOI-Wikipedia articles, which showed the landscape of pandemic-related knowledge on Wikipedia and revealed how citations create a web of scientific knowledge to support coverage of scientific topics like COVID-19 vaccine development.

Understanding how scientific research interacts with the digital knowledge sphere during the pandemic provides insight into how Wikipedia can facilitate access to science. It also sheds light on how Wikipedia successfully fended off disinformation about COVID-19 and may provide insight into how its unique model could be deployed in other contexts.

PLAN S and other progress for Open Access to knowledge

Authors : Stefano Bianco, Laura Patrizii

The principle of Open Access (OA) is about breaking down every paywall that blocks access to knowledge produced by publicly funded research. After twenty years of statements, not much has changed, and the market for scientific journals is still in the hands of oligopolistic companies.

Plan S is a disruptive initiative created by research funders in Europe and the US that aims to foster the transition to Open Access by acting against hybrid journals and citation indexes.

The Italian Institute for Nuclear Physics (INFN) has signed Plan S and, in close relationship with the universities, the Conference of Rectors (CRUI), and the National Research Council (CNR), is reaching out to academic communities to discuss its strengths, weaknesses, opportunities and threats. In this work, both a description of Plan S and a brief status report on other initiatives are given.

DOI : http://dx.doi.org/10.2423/i22394303v10Sp59