Is preprint the future of science? A thirty year journey of online preprint services

Authors : Boya Xie, Zhihong Shen, Kuansan Wang

A preprint is a version of a scientific paper that is publicly distributed before formal peer review. Since the launch of arXiv in 1991, preprints have increasingly been distributed over the Internet rather than as paper copies.

Online distribution gives open access to original research within a few days, often at a very low operating cost. This work reviews how preprints have evolved and affected the research community over the past thirty years alongside the growth of the Web.

In this work, we first report that the number of preprints has grown exponentially, increasing 63-fold in 30 years, although preprints still account for only 4% of research articles. Second, we quantify the benefits that preprints bring to authors: preprints reach an audience 14 months earlier on average and are associated with five times more citations than their non-preprint counterparts. Last, to address concerns about preprint quality, we find that 41% of preprints are ultimately published at a peer-reviewed destination, and that these venues are as influential as those of papers without a preprint version.
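As a back-of-the-envelope illustration, assume smooth compound growth. The abstract reports only the overall 63-fold increase, not the yearly trajectory. Under that assumption, the implied average annual growth rate r is roughly 15%:

```latex
(1 + r)^{30} = 63
\quad\Longrightarrow\quad
r = 63^{1/30} - 1 = e^{(\ln 63)/30} - 1 \approx e^{0.138} - 1 \approx 0.148
```

Actual growth was almost certainly uneven from year to year, so this figure only summarizes the two endpoints.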

Additionally, we discuss the unprecedented role of preprints in communicating the latest research data during recent public health emergencies. In conclusion, we provide quantitative evidence of the positive impact of preprints on individual researchers and the community.

Preprints make scholarly communication more efficient by disseminating scientific discoveries more rapidly and widely with the aid of Web technologies. The measurements we present in this study can help researchers and policymakers make informed decisions about how to effectively use and responsibly embrace a preprint culture.

URL : https://arxiv.org/abs/2102.09066

Linguistic Analysis of the bioRxiv Preprint Landscape

Authors : David N. Nicholson, Vincent Rubinetti, Dongbo Hu, Marvin Thielk, Lawrence E. Hunter, Casey S. Greene

Preprints allow researchers to make their findings available to the scientific community before they have undergone peer review. Studies on preprints within bioRxiv have been largely focused on article metadata and how often these preprints are downloaded, cited, published, and discussed online.

One element that has yet to be examined is the language contained within the bioRxiv preprint repository. We sought to compare and contrast linguistic features of bioRxiv preprints with published biomedical text as a whole, as this offers an excellent opportunity to examine how peer review changes these documents.

The most prevalent features that changed appear to be associated with typesetting and mentions of supplementary sections or additional files. In addition to text comparison, we created document embeddings derived from a preprint-trained word2vec model.

We found that these embeddings are able to parse out different scientific approaches and concepts, link unannotated preprint-peer reviewed article pairs, and identify journals that publish linguistically similar papers to a given preprint.
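The abstract does not detail how document embeddings are derived from the word2vec model. A common approach is to average the word vectors of a document's tokens. The sketch below assumes that approach. It uses hypothetical toy three-dimensional vectors in place of a trained model, and it links a preprint to its most similar published article by cosine similarity. The names `WORD_VECTORS`, `document_embedding`, and `best_match` are illustrative only and are not the authors' code.

```python
import math

# Hypothetical toy word vectors standing in for a trained word2vec model.
WORD_VECTORS = {
    "gene": [0.9, 0.1, 0.0],
    "expression": [0.8, 0.2, 0.1],
    "neuron": [0.1, 0.9, 0.2],
    "cortex": [0.0, 0.8, 0.3],
    "protein": [0.7, 0.0, 0.4],
}


def document_embedding(tokens, vectors=WORD_VECTORS):
    """Average the vectors of in-vocabulary tokens; return None if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(preprint_tokens, candidates):
    """Return the id of the candidate article whose embedding is closest to the preprint's."""
    query = document_embedding(preprint_tokens)
    if query is None:
        return None
    scored = []
    for article_id, tokens in candidates.items():
        emb = document_embedding(tokens)
        if emb is not None:
            scored.append((cosine(query, emb), article_id))
    return max(scored)[1] if scored else None


candidates = {
    "neuro-paper": ["neuron", "cortex"],
    "genomics-paper": ["gene", "expression", "protein"],
}
print(best_match(["gene", "expression"], candidates))  # genomics-paper
```

Averaging discards word order but keeps the embedding size fixed regardless of document length. This makes it cheap to compare many preprints against a large corpus of published articles.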

We also used these embeddings to examine factors associated with the time elapsed between the posting of a first preprint and the appearance of a peer reviewed publication. We found that preprints with more versions posted and more textual changes took longer to publish.

Lastly, we constructed a web application (https://greenelab.github.io/preprint-similarity-search/) that allows users to identify the journals and articles that are most linguistically similar to a bioRxiv or medRxiv preprint, and to see where the preprint would be positioned within the landscape of published articles.

DOI : https://doi.org/10.1101/2021.03.04.433874

Publication practices during the COVID-19 pandemic: Biomedical preprints and peer-reviewed literature

Authors : Yulia V. Sevryugina, Andrew J. Dicks

The coronavirus pandemic introduced many changes to our society and deeply affected established publication practices in the biomedical sciences. In this article, we present a comprehensive study of the changes in the scholarly publication landscape for the biomedical sciences during the COVID-19 pandemic, with special emphasis on preprints posted on the bioRxiv and medRxiv servers.

We observe the emergence of a new category of preprint authors working in the fields of immunology, microbiology, infectious diseases, and epidemiology, who made extensive use of preprint platforms during the pandemic to share their immediate findings. Most of these findings were works in progress, unsuited to prompt acceptance by refereed journals.

The COVID-19 preprints that became peer-reviewed journal articles were often submitted to journals at the same time as they were posted on a preprint server. The entire publication cycle, from preprint to online journal article, took 63 days on average. This included an expedited peer-review process of 43 days and a journal production stage of 15 days; however, publication delays varied widely between journals. Only one third of the COVID-19 preprints posted during the first nine months of the pandemic appeared as peer-reviewed journal articles.

These journal articles display high Altmetric Attention Scores, further emphasizing the significance of COVID-19 research during 2020. This article will be relevant to editors, publishers, open science enthusiasts, and anyone interested in the changes that the 2020 crisis brought to publication practices and to the culture of preprints in the life sciences.

DOI : https://doi.org/10.1101/2021.01.21.427563

Quantifying and contextualizing the impact of bioRxiv preprints through automated social media audience segmentation

Authors : Jedidiah Carlson, Kelley Harris

Engagement with scientific manuscripts is frequently facilitated by Twitter and other social media platforms. As such, the demographics of a paper’s social media audience provide a wealth of information about how scholarly research is transmitted, consumed, and interpreted by online communities.

By paying attention to public perceptions of their publications, scientists can learn whether their research is stimulating positive scholarly and public thought. They can also become aware of potentially negative patterns of interest from groups that misinterpret their work in harmful ways, either willfully or unintentionally, and devise strategies for altering their messaging to mitigate these impacts.

In this study, we collected 331,696 Twitter posts referencing 1,800 highly tweeted bioRxiv preprints and leveraged topic modeling to infer the characteristics of various communities engaging with each preprint on Twitter.

We agnostically learned the characteristics of these audience sectors from keywords each user’s followers provide in their Twitter biographies. We estimate that 96% of the preprints analyzed are dominated by academic audiences on Twitter, suggesting that social media attention does not always correspond to greater public exposure.

We further demonstrate how our audience segmentation method can quantify the level of interest from nonspecialist audience sectors such as mental health advocates, dog lovers, video game developers, vegans, bitcoin investors, conspiracy theorists, journalists, religious groups, and political constituencies.

Surprisingly, we also found that 10% of the preprints analyzed have sizable (>5%) audience sectors that are associated with right-wing white nationalist communities. Although none of these preprints appear to intentionally espouse any right-wing extremist messages, cases exist in which extremist appropriation comprises more than 50% of the tweets referencing a given preprint.

These results present unique opportunities for improving and contextualizing the public discourse surrounding scientific research.

DOI : https://doi.org/10.1371/journal.pbio.3000860

International authorship and collaboration across bioRxiv preprints

Authors : Richard J Abdill, Elizabeth M Adamowicz, Ran Blekhman

Preprints are becoming well established in the life sciences, but relatively little is known about the demographics of the researchers who post preprints and those who do not, or about the collaborations between preprint authors.

Here, based on an analysis of 67,885 preprints posted on bioRxiv, we find that some countries, notably the United States and the United Kingdom, are overrepresented on bioRxiv relative to their overall scientific output, while other countries (including China, Russia, and Turkey) show lower levels of bioRxiv adoption.

We also describe a set of ‘contributor countries’ (including Uganda, Croatia and Thailand): researchers from these countries appear almost exclusively as non-senior authors on international collaborations.

Lastly, we find multiple journals that publish a disproportionate number of preprints from some countries, a dynamic that almost always benefits manuscripts from the US.

DOI : https://doi.org/10.7554/eLife.58496

The relationship between bioRxiv preprints, citations and altmetrics

Authors : Nicholas Fraser, Fakhri Momeni, Philipp Mayr, Isabella Peters

A potential motivation for scientists to deposit their scientific work as preprints is to enhance its citation or social impact. In this study we assessed the citation and altmetric advantage of bioRxiv, a preprint server for the biological sciences.

We retrieved metadata of all bioRxiv preprints deposited between November 2013 and December 2017, and matched them to articles that were subsequently published in peer-reviewed journals.

Citation data from Scopus and altmetric data from Altmetric.com were used to compare citation and online sharing behavior of bioRxiv preprints, their related journal articles, and nondeposited articles published in the same journals. We found that bioRxiv-deposited journal articles had sizably higher citation and altmetric counts compared to nondeposited articles.

Regression analysis reveals that this advantage is not explained by multiple explanatory variables related to the articles’ publication venues and authorship. Further research will be required to establish whether such an effect is causal in nature.

bioRxiv preprints themselves are being directly cited in journal articles, regardless of whether the preprint has subsequently been published in a journal. bioRxiv preprints are also shared widely on Twitter and in blogs, but remain relatively scarce in mainstream media and Wikipedia articles, in comparison to peer-reviewed journal articles.

DOI : https://doi.org/10.1162/qss_a_00043

Open Access and Altmetrics in the pandemic age: Forecast analysis on COVID-19 literature

Authors : Daniel Torres-Salinas, Nicolas Robinson-Garcia, Pedro A. Castillo-Valdivieso

We present an analysis of the uptake of open access (OA) in COVID-19-related literature, as well as the social media attention OA papers gather compared with non-OA papers.

We use a dataset of publications curated by Dimensions and analyze articles and preprints. Our sample includes 11,686 publications of which 67.5% are openly accessible.

OA publications tend to receive the largest share of social media attention, as measured by the Altmetric Attention Score (AAS). 37.6% of OA publications are bronze, meaning that toll-access journals are providing free access to them.

medRxiv accounts for 36.3% of documents in repositories, but papers in bioRxiv exhibit, on average, a higher AAS. We predict the growth of COVID-19 literature over the following 30 days by estimating ARIMA models for the overall publication set, for OA vs. non-OA publications, and by location of the document (repository vs. journal).

We estimate that COVID-19 publications will double in the next 20 days, but non-OA publications will grow at a higher rate than OA publications. We conclude by discussing the implications of these findings for the dissemination and communication of research findings to mitigate the coronavirus outbreak.
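As a rough check on what doubling in 20 days implies, assume constant exponential growth. An ARIMA forecast need not follow that shape. Under that assumption, the implied average daily growth rate g is about 3.5%:

```latex
(1 + g)^{20} = 2
\quad\Longrightarrow\quad
g = 2^{1/20} - 1 = e^{(\ln 2)/20} - 1 \approx e^{0.0347} - 1 \approx 0.035
```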

DOI : https://doi.org/10.1101/2020.04.23.057307