PreprintMatch: A tool for preprint to publication detection shows global inequities in scientific publication

Authors : Peter Eckmann, Anita Bandrowski

Preprints, versions of scientific manuscripts that precede peer review, are growing in popularity. They offer an opportunity to democratize and accelerate research, as they involve neither publication costs nor a lengthy peer review process. Preprints are often later published in peer-reviewed venues, but these publications and the original preprints are frequently not linked in any way.

To this end, we developed a tool, PreprintMatch, to find matches between preprints and their corresponding published papers, if they exist. This tool outperforms existing techniques for matching preprints and papers in both matching performance and speed. PreprintMatch was applied to search for matches between preprints from bioRxiv and medRxiv and papers indexed in PubMed.

The preliminary nature of preprints offers a unique perspective into scientific projects at a relatively early stage, and with better matching between preprint and paper, we explored questions related to research inequity.

We found that preprints from low income countries are published as peer-reviewed papers at a lower rate than those from high income countries (39.6% and 61.1%, respectively), and our data are consistent with previous work that cites a lack of resources, lack of stability, and policy choices to explain this discrepancy.

Preprints from low income countries were also found to be published more quickly (178 vs 203 days) and with less title, abstract, and author similarity to the published version than those from high income countries. Preprints from low income countries also gain more authors between the preprint and the published version than those from high income countries (0.42 vs 0.32 authors, respectively), a practice that is significantly more frequent in China than in similar countries.
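As a rough illustration of the two quantities compared above, the sketch below computes a title similarity score and the number of authors added between a preprint and its published version. It is not PreprintMatch's actual method, which the abstract does not describe: the function names are hypothetical, the similarity measure is Python's standard `difflib.SequenceMatcher`, and authors are compared by exact case-insensitive name, which is a simplifying assumption.

```python
from difflib import SequenceMatcher


def title_similarity(preprint_title, paper_title):
    """Return a similarity ratio in [0, 1] between two titles.

    Assumption: case and surrounding whitespace are ignored.
    """
    a = preprint_title.lower().strip()
    b = paper_title.lower().strip()
    return SequenceMatcher(None, a, b).ratio()


def authors_added(preprint_authors, paper_authors):
    """Count authors on the published paper who were not on the preprint.

    Assumption: names match exactly apart from letter case.
    """
    before = {name.lower() for name in preprint_authors}
    return sum(1 for name in paper_authors if name.lower() not in before)


# Hypothetical example records, not taken from the study.
same = title_similarity("A tool for preprint matching", "a tool for preprint matching ")
changed = title_similarity("A tool for preprint matching", "Global inequities in publishing")
added = authors_added(["A Smith", "B Jones"], ["A Smith", "B Jones", "C Lee"])
```

Averaging `authors_added` over all matched pairs from a group of countries would give a figure comparable in kind to the 0.42 and 0.32 reported above, although real author matching would need to handle initials and name variants.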

Finally, we find that some publishers publish work with authors from lower income countries more frequently than others.

URL : PreprintMatch: A tool for preprint to publication detection shows global inequities in scientific publication

DOI : https://doi.org/10.1371/journal.pone.0281659

Comparison of Clinical Study Results Reported in medRxiv Preprints vs Peer-reviewed Journal Articles

Authors : Guneet Janda, Vishal Khetpal, Xiaoting Shi, Joseph S. Ross, Joshua D. Wallach

Question

What is the concordance among sample size, primary end points, results for primary end points, and interpretations described in preprints of clinical studies posted on medRxiv that are subsequently published in peer-reviewed journals (preprint-journal article pairs)?

Findings

In this cross-sectional study of 547 clinical studies that were initially posted to medRxiv and later published in peer-reviewed journals, 86.4% of preprint-journal article pairs were concordant in terms of sample size, 97.6% in terms of primary end points, 81.1% in terms of results of primary end points, and 96.2% in terms of study interpretations.

Meaning

This study suggests that most clinical studies posted as preprints on medRxiv and subsequently published in peer-reviewed journals had concordant study characteristics, results, and final interpretations.

URL : Comparison of Clinical Study Results Reported in medRxiv Preprints vs Peer-reviewed Journal Articles

Original location : https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2799350

Forecasting the publication and citation outcomes of COVID-19 preprints

Authors : Michael Gordon, Michael Bishop, Yiling Chen, Anna Dreber, Brandon Goldfedder, Felix Holzmeister, Magnus Johannesson, Yang Liu, Louisa Tran, Charles Twardy, Juntao Wang, Thomas Pfeiffer

Many publications on COVID-19 were released on preprint servers such as medRxiv and bioRxiv. It is unknown how reliable these preprints are, and which ones will eventually be published in scientific journals.

In this study, we use crowdsourced human forecasts to predict publication outcomes and future citation counts for a sample of 400 preprints with high Altmetric scores. Most of these preprints were published within 1 year of upload on a preprint server (70%), with a considerable fraction (45%) appearing in a high-impact journal with a journal impact factor of at least 10.

On average, the preprints received 162 citations within the first year. We found that forecasters can predict if preprints will be published after 1 year and if the publishing journal has high impact. Forecasts are also informative with respect to Google Scholar citations within 1 year of upload on a preprint server.

For both types of assessment, we found statistically significant positive correlations between forecasts and observed outcomes. While the forecasts can help to provide a preliminary assessment of preprints at a faster pace than traditional peer-review, it remains to be investigated if such an assessment is suited to identify methodological problems in preprints.

URL : Forecasting the publication and citation outcomes of COVID-19 preprints

DOI : https://doi.org/10.1098/rsos.220440

Publication practices during the COVID-19 pandemic: Expedited publishing or simply an early bird effect?

Authors : Yulia V. Sevryugina, Andrew J. Dicks

This study explores the evolution of publication practices associated with the SARS-CoV-2 research papers, namely, peer-reviewed journal and review articles indexed in PubMed and their associated preprints posted on bioRxiv and medRxiv servers: a total of 4,031 journal article-preprint pairs.

Our assessment of various publication delays during the January 2020 to March 2021 period revealed the early bird effect that lies beyond the involvement of any publisher policy action and is directly linked to the emerging nature of new and ‘hot’ scientific topics.

We found that when the early bird effect and data incompleteness are taken into account, COVID-19 related research papers show only a moderately expedited speed of dissemination as compared with the pre-pandemic era.

Medians for peer-review and production stage delays were 66 and 15 days, respectively, and the entire conversion process from a preprint to its peer-reviewed journal article version took 109.5 days.

The early bird effect produced an ephemeral perception of a global rush in scientific publishing during the early days of the coronavirus pandemic. We emphasize the importance of considering the early bird effect in interpreting publication data collected at the outset of a newly emerging event.

URL : Publication practices during the COVID-19 pandemic: Expedited publishing or simply an early bird effect?

DOI : https://doi.org/10.1002/leap.1483

Examining linguistic shifts between preprints and publications

Authors : David N. Nicholson, Vincent Rubinetti, Dongbo Hu, Marvin Thielk, Lawrence E. Hunter, Casey S. Greene

Preprints allow researchers to make their findings available to the scientific community before they have undergone peer review. Studies on preprints within bioRxiv have been largely focused on article metadata and how often these preprints are downloaded, cited, published, and discussed online.

A missing element that has yet to be examined is the language contained within the bioRxiv preprint repository. We sought to compare linguistic features within bioRxiv preprints to those of published biomedical text as a whole, as this is an excellent opportunity to examine how peer review changes these documents.

The most prevalent features that changed appear to be associated with typesetting and mentions of supporting information sections or additional files. In addition to text comparison, we created document embeddings derived from a preprint-trained word2vec model.

We found that these embeddings are able to parse out different scientific approaches and concepts, link unannotated preprint–peer-reviewed article pairs, and identify journals that publish linguistically similar papers to a given preprint.
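A minimal sketch of how document embeddings can link a preprint to its most similar published article. The abstract does not say how word vectors are combined into a document embedding, so averaging them is an assumption here, and the tiny three-dimensional `WORD_VECTORS` table is a hypothetical stand-in for a trained word2vec model.

```python
import math

# Hypothetical toy vectors standing in for a preprint-trained word2vec model.
WORD_VECTORS = {
    "gene": [1.0, 0.0, 0.0],
    "expression": [0.8, 0.2, 0.0],
    "neural": [0.0, 1.0, 0.0],
    "network": [0.0, 0.9, 0.1],
}


def embed(tokens):
    """Average the vectors of known tokens; return None if none are known."""
    vecs = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(preprint_tokens, candidates):
    """Return the key of the candidate document closest to the preprint.

    `candidates` maps an identifier to a token list. Returns None when the
    preprint or every candidate has no known tokens.
    """
    query = embed(preprint_tokens)
    if query is None:
        return None
    scored = []
    for key, tokens in candidates.items():
        vec = embed(tokens)
        if vec is not None:
            scored.append((cosine(query, vec), key))
    return max(scored)[1] if scored else None


match = best_match(
    ["gene", "expression"],
    {"article_a": ["gene"], "article_b": ["neural", "network"]},
)
```

The same nearest-neighbour lookup, applied to per-journal average embeddings instead of individual articles, is one plausible way to suggest journals that publish linguistically similar papers.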

We also used these embeddings to examine factors associated with the time elapsed between the posting of a first preprint and the appearance of a peer-reviewed publication. We found that preprints with more versions posted and more textual changes took longer to publish.

Lastly, we constructed a web application (https://greenelab.github.io/preprint-similarity-search/) that allows users to identify which journals and articles are most linguistically similar to a bioRxiv or medRxiv preprint, as well as observe where the preprint would be positioned within a published article landscape.

URL : Examining linguistic shifts between preprints and publications

DOI : https://doi.org/10.1371/journal.pbio.3001470

Reproducibility of COVID-19 pre-prints

Authors : Annie Collins, Rohan Alexander

To examine the reproducibility of COVID-19 research, we create a dataset of pre-prints posted to arXiv, bioRxiv, medRxiv, and SocArXiv between 28 January 2020 and 30 June 2021 that are related to COVID-19.

We extract the text from these pre-prints and parse them looking for keyword markers signalling the availability of the data and code underpinning the pre-print. For the pre-prints that are in our sample, we are unable to find markers of either open data or open code for 75 per cent of those on arXiv, 67 per cent of those on bioRxiv, 79 per cent of those on medRxiv, and 85 per cent of those on SocArXiv.
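The keyword-marker approach described above can be sketched as a simple case-insensitive substring search. The marker lists below are hypothetical examples chosen for illustration; the abstract does not list the keywords the authors actually used, and a real pipeline would first need to extract text from each pre-print's PDF.

```python
# Hypothetical marker phrases; not the authors' actual keyword lists.
DATA_MARKERS = [
    "data availability",
    "data are available",
    "open data",
    "zenodo",
    "dataverse",
    "osf.io",
]
CODE_MARKERS = [
    "code availability",
    "code is available",
    "source code",
    "github.com",
    "gitlab.com",
]


def find_markers(text, markers):
    """Return the marker phrases that occur in `text`, ignoring case."""
    lowered = text.lower()
    return [marker for marker in markers if marker in lowered]


def classify_openness(text):
    """Flag whether any open-data or open-code marker appears in `text`."""
    return {
        "open_data": bool(find_markers(text, DATA_MARKERS)),
        "open_code": bool(find_markers(text, CODE_MARKERS)),
    }


sample = "Code is available at https://github.com/example/repo. We thank our colleagues."
result = classify_openness(sample)
```

A pre-print for which both flags are false would count toward the "no markers found" percentages reported above. Substring matching of this kind can miss unusual phrasings and can flag passing mentions, so the flags indicate markers of openness, not verified availability.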

We conclude that there may be value in having authors categorize the degree of openness of their pre-print as part of the pre-print submissions process, and more broadly, there is a need to better integrate open science training into a wide range of fields.

URL : https://arxiv.org/abs/2107.10724

The evolving role of preprints in the dissemination of COVID-19 research and their impact on the science communication landscape

Authors : Nicholas Fraser, Liam Brierley, Gautam Dey, Jessica K. Polka, Máté Pálfy, Federico Nanni, Jonathon Alexis Coates

The world continues to face a life-threatening viral pandemic. The virus underlying the Coronavirus Disease 2019 (COVID-19), Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), has caused over 98 million confirmed cases and 2.2 million deaths since January 2020.

Although the most recent respiratory viral pandemic swept the globe only a decade ago, the way science operates and responds to current events has experienced a cultural shift in the interim.

The scientific community has responded rapidly to the COVID-19 pandemic, releasing over 125,000 COVID-19–related scientific articles within 10 months of the first confirmed case, of which more than 30,000 were hosted by preprint servers.

We focused our analysis on bioRxiv and medRxiv, 2 growing preprint servers for biomedical research, investigating the attributes of COVID-19 preprints, their access and usage rates, as well as characteristics of their propagation on online platforms.

Our data provide evidence for increased scientific and public engagement with preprints related to COVID-19 (COVID-19 preprints are accessed more, cited more, and shared more on various online platforms than non-COVID-19 preprints), as well as changes in the use of preprints by journalists and policymakers.

We also find evidence for changes in preprinting and publishing behaviour: COVID-19 preprints are shorter and reviewed faster.

Our results highlight the unprecedented role of preprints and preprint servers in the dissemination of COVID-19 science and the impact of the pandemic on the scientific communication landscape.

URL : The evolving role of preprints in the dissemination of COVID-19 research and their impact on the science communication landscape

DOI : https://doi.org/10.1371/journal.pbio.3000959