Linguistic Analysis of the bioRxiv Preprint Landscape

Authors : David N. Nicholson, Vincent Rubinetti, Dongbo Hu, Marvin Thielk, Lawrence E. Hunter, Casey S. Greene

Preprints allow researchers to make their findings available to the scientific community before they have undergone peer review. Studies of bioRxiv preprints have largely focused on article metadata and on how often these preprints are downloaded, cited, published, and discussed online.

One element that has yet to be examined is the language contained within the bioRxiv preprint repository. We compared and contrasted linguistic features of bioRxiv preprints with published biomedical text as a whole, as this offers an excellent opportunity to examine how peer review changes these documents.

The most prevalent features that changed appear to be associated with typesetting and mentions of supplementary sections or additional files. In addition to text comparison, we created document embeddings derived from a preprint-trained word2vec model.

We found that these embeddings can distinguish different scientific approaches and concepts, link unannotated preprint and peer-reviewed article pairs, and identify journals that publish papers linguistically similar to a given preprint.
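
A rough sketch of this embedding approach, assuming a simple average of word vectors rather than the authors' exact pipeline: train word2vec (here via gensim, with toy sentences and illustrative parameters), represent each document as the mean of its word vectors, and link a preprint to its published version by cosine similarity.

```python
# Minimal sketch: document embeddings as averaged word2vec vectors,
# with cosine similarity to link a preprint to its published version.
# Corpus and parameters are illustrative, not the authors' pipeline.
import numpy as np
from gensim.models import Word2Vec

corpus = [
    ["preprint", "servers", "accelerate", "scientific", "communication"],
    ["peer", "review", "alters", "typesetting", "and", "supplementary", "sections"],
]
model = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1)

def doc_embedding(tokens, model):
    """Average the word vectors of all in-vocabulary tokens."""
    vectors = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vectors, axis=0) if vectors else np.zeros(model.vector_size)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

preprint_vec = doc_embedding(corpus[0], model)
published_vec = doc_embedding(corpus[1], model)
print(f"similarity: {cosine(preprint_vec, published_vec):.3f}")
```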

We also used these embeddings to examine factors associated with the time elapsed between the posting of a first preprint and the appearance of a peer-reviewed publication. We found that preprints with more versions posted and more textual changes took longer to publish.
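
A minimal sketch of that kind of association check, using fabricated numbers rather than the study's corpus: regress publication delay on the number of posted versions.

```python
# Toy association between preprint version count and publication delay.
# The data below are invented for illustration only.
import numpy as np

versions = np.array([1, 1, 2, 2, 3, 4, 5])                 # versions posted
delay_days = np.array([90, 120, 150, 170, 220, 260, 310])  # days to publication

slope, intercept = np.polyfit(versions, delay_days, 1)     # simple linear fit
print(f"each additional version ~ {slope:.0f} extra days (toy data)")
```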

Lastly, we constructed a web application (https://greenelab.github.io/preprint-similarity-search/) that allows users to identify the journals and articles that are most linguistically similar to a bioRxiv or medRxiv preprint, as well as observe where the preprint would be positioned within a published-article landscape.
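
In the same spirit as that application, though as an assumed design rather than its documented implementation, each journal can be represented by the centroid of its articles' embeddings and ranked by cosine similarity to a new preprint.

```python
# Hypothetical nearest-centroid journal ranking. Journal names and
# embeddings are placeholders, not data from the actual service.
import numpy as np

rng = np.random.default_rng(0)
journal_centroids = {                 # journal -> mean article embedding
    "Journal A": rng.normal(size=100),
    "Journal B": rng.normal(size=100),
}
preprint_vec = rng.normal(size=100)

def rank_journals(vec, centroids):
    scores = {
        name: float(np.dot(vec, c) / (np.linalg.norm(vec) * np.linalg.norm(c)))
        for name, c in centroids.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

for name, score in rank_journals(preprint_vec, journal_centroids):
    print(f"{name}: {score:.3f}")
```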

DOI : https://doi.org/10.1101/2021.03.04.433874

Publication practices during the COVID-19 pandemic: Biomedical preprints and peer-reviewed literature

Authors : Yulia V. Sevryugina, Andrew J. Dicks

The coronavirus pandemic introduced many changes to our society and deeply affected established publication practices in the biomedical sciences. In this article, we present a comprehensive study of the changes in the scholarly publication landscape for biomedical sciences during the COVID-19 pandemic, with special emphasis on preprints posted on the bioRxiv and medRxiv servers.

We observe the emergence of a new category of preprint authors working in the fields of immunology, microbiology, infectious diseases, and epidemiology, who made extensive use of preprint platforms during the pandemic to share their immediate findings. The majority of these findings were works in progress, unsuited to prompt acceptance by refereed journals.

The COVID-19 preprints that became peer-reviewed journal articles were often submitted to journals concurrently with posting on a preprint server, and the entire publication cycle, from preprint to online journal article, took 63 days on average. This included an expedited peer-review process of 43 days and a journal production stage of 15 days; however, publication delays varied widely between journals. Only one third of the COVID-19 preprints posted during the first nine months of the pandemic appeared as peer-reviewed journal articles.
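
These timing figures reduce to date arithmetic. A back-of-the-envelope sketch with invented dates follows; note that the 63-day average is measured from preprint posting, which can precede journal submission, so the review and production components need not sum exactly to the total.

```python
# Decomposing the publication cycle into review and production stages.
# All dates are invented for illustration.
from datetime import date

posted = date(2020, 3, 1)       # preprint posted
accepted = date(2020, 4, 13)    # journal acceptance after expedited review
published = date(2020, 4, 28)   # online journal article

review_days = (accepted - posted).days         # 43 days
production_days = (published - accepted).days  # 15 days
total_days = (published - posted).days         # 58 days from posting
print(review_days, production_days, total_days)
```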

These journal articles display high Altmetric Attention Scores, further emphasizing the significance of COVID-19 research during 2020. This article will be relevant to editors, publishers, open science enthusiasts, and anyone interested in the changes that the 2020 crisis brought to publication practices and to the culture of preprints in the life sciences.

DOI : https://doi.org/10.1101/2021.01.21.427563

Being a researcher, becoming an expert? The moral economy of the relationship to expertise in a toxicology laboratory

Author : David Demortain

The relationship between the research system and public policy has become institutionalized in recent years through an interweaving of expert committees, working groups, and scientific councils, often overseen by government agencies, which enable the systematic mobilization of researchers for health security.

The expertise system cannot, however, gather all of the scientific knowledge produced by researchers, if only because part of the profession considers that expert work is not part of its job.

This article seeks to understand how, and to what extent, researchers become experts, drawing on an analysis of the activities of researchers in a toxicology laboratory and of their varied motives for, and modes of, engagement in expert work.

It identifies three distinct moral economies of the relationship to expertise, showing that engagement in expert work is linked to different ways of defining and valuing toxicological research.

DOI : https://doi.org/10.4000/rac.19302

Preprints in motion: tracking changes between posting and journal publication

Authors : Jessica K Polka, Gautam Dey, Máté Pálfy, Federico Nanni, Liam Brierley, Nicholas Fraser, Jonathon Alexis Coates

Amidst the COVID-19 pandemic, preprints in the biomedical sciences are being posted and accessed at unprecedented rates, drawing widespread attention from the general public, press and policymakers for the first time.

This phenomenon has sharpened longstanding questions about the reliability of information shared prior to journal peer review. Does the information shared in preprints typically withstand the scrutiny of peer review, or are conclusions likely to change in the version of record?

We assessed preprints that had been posted and subsequently published in a journal between 1st January and 30th April 2020, representing the initial phase of the pandemic response. We utilised a combination of automatic and manual annotations to quantify how an article changed between the preprinted and published version.

We found that the total number of figure panels and tables changed little between preprint and published articles. Moreover, the conclusions of 6% of non-COVID-19-related and 15% of COVID-19-related abstracts underwent a discrete change by the time of publication, but the majority of these changes did not reverse the main message of the paper.
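
As one example of an automatic annotation for this kind of comparison (an assumption for illustration, not the authors' documented tooling), a token-level similarity ratio between the two abstract versions can flag how much the text changed.

```python
# Token-level similarity between preprint and published abstracts:
# 1.0 means identical token sequences, lower values mean more change.
import difflib

preprint_abstract = "The drug reduces viral load in cell culture."
published_abstract = "The drug reduces viral load in cell culture and in mice."

ratio = difflib.SequenceMatcher(
    None, preprint_abstract.split(), published_abstract.split()
).ratio()
print(f"token overlap: {ratio:.2f}")
```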

DOI : https://doi.org/10.1101/2021.02.20.432090

Publishing at any cost: a cross-sectional study of the amount that medical researchers spend on open access publishing each year

Authors : Mallory K. Ellingson, Xiaoting Shi, Joshua J. Skydel, Kate Nyhan, Richard Lehman, Joseph S. Ross, Joshua D. Wallach

Objective

To estimate the financial costs paid by individual medical researchers to meet the article processing charges (APCs) levied by open access journals in 2019.

Design

Cross-sectional analysis.

Data sources

Scopus was used to generate two random samples of researchers, the first with a senior author article indexed in the ‘Medicine’ subject area (general researchers) and the second with an article published in the ten highest-impact factor general clinical medicine journals (high-impact researchers) in 2019.

For each researcher, Scopus was used to identify all first and senior author original research or review articles published in 2019. Data were obtained from Scopus, institutional profiles, Journal Citation Reports, publisher databases, the Directory of Open Access Journals, and individual journal websites.

Main outcome measures

Median APCs paid by general and high-impact researchers for all first and senior author research and review articles published in 2019.

Results

There were 241 general and 246 high-impact researchers identified as eligible for our study. In 2019, the general and high-impact researchers published a total of 914 (median 2, IQR 1–5) and 1471 (4, 2–8) first or senior author research or review articles, respectively. 42% (384/914) of the articles from the general researchers and 29% (428/1471) of the articles from the high-impact medical researchers were published in fully open access journals.

The median total APCs paid by general researchers in 2019 was US$191 (IQR US$0–US$2500) and the median total paid by high-impact researchers was US$2900 (IQR US$0–US$5465); the maximum paid by a single researcher in total APCs was US$30,115 and US$34,676, respectively.
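
These headline figures are descriptive statistics over per-researcher totals; a sketch with invented amounts shows the computation.

```python
# Median and interquartile range of per-researcher total APCs.
# Amounts are invented, not the study data.
import numpy as np

total_apcs = np.array([0, 0, 191, 1500, 2500, 3000, 5465])  # US$ per researcher
median = np.median(total_apcs)
q1, q3 = np.percentile(total_apcs, [25, 75])
print(f"median US${median:.0f} (IQR US${q1:.0f}-US${q3:.0f})")
```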

Conclusions

Medical researchers in 2019 were found to have paid between US$0 and US$34,676 in total APCs. As journals with APCs become more common, it is important to continue to evaluate the potential cost to researchers, especially for individuals who may not have the funding or institutional resources to cover these costs.

DOI : http://dx.doi.org/10.1136/bmjopen-2020-047107

An overview of biomedical platforms for managing research data

Authors : Vivek Navale, Denis von Kaeppler, Matthew McAuliffe

Biomedical platforms provide the hardware and software to securely ingest, process, validate, curate, store, and share data. Many large-scale biomedical platforms use secure cloud computing technology for analyzing, integrating, and storing phenotypic, clinical, and genomic data. Several web-based platforms are available for researchers to access services and tools for biomedical research.

The use of bio-containers can facilitate the integration of bioinformatics software with various data analysis pipelines. Adoption of Common Data Models, Common Data Elements, and Ontologies can increase the likelihood of data reuse. Managing biomedical Big Data will require the development of strategies that can efficiently leverage public cloud computing resources.

The use of standards developed by the research community for data collection can foster the development of machine learning methods for data processing and analysis. Increasingly, platforms will need to support the integration of data from research across multiple disease areas.

DOI : https://doi.org/10.1007/s42488-020-00040-0

Evaluation of Data Sharing After Implementation of the International Committee of Medical Journal Editors Data Sharing Statement Requirement

Authors : Valentin Danchev, Yan Min, John Borghi, Mike Baiocchi, John P. A. Ioannidis

Importance

The benefits of responsible sharing of individual-participant data (IPD) from clinical studies are well recognized, but stakeholders often disagree on how to align those benefits with privacy risks, costs, and incentives for clinical trialists and sponsors.

The International Committee of Medical Journal Editors (ICMJE) required a data sharing statement (DSS) from submissions reporting clinical trials, effective July 1, 2018. The required DSSs provide a window into current data sharing rates, practices, and norms among trialists and sponsors.

Objective

To evaluate the implementation of the ICMJE DSS requirement in 3 leading medical journals: JAMA, Lancet, and New England Journal of Medicine (NEJM).

Design, Setting, and Participants

This is a cross-sectional study of clinical trial reports published as articles in JAMA, Lancet, and NEJM between July 1, 2018, and April 4, 2020. Articles not eligible for DSS, including observational studies and letters or correspondence, were excluded.

A MEDLINE/PubMed search identified 487 eligible clinical trials in JAMA (112 trials), Lancet (147 trials), and NEJM (228 trials). Two reviewers evaluated each of the 487 articles independently.

Exposure

Publication of clinical trial reports in an ICMJE medical journal requiring a DSS.

Main Outcomes and Measures

The primary outcomes of the study were declared data availability and actual data availability in repositories. Other captured outcomes were data type, access, and conditions and reasons for data availability or unavailability. Associations with funding sources were examined.

Results

A total of 334 of 487 articles (68.6%; 95% CI, 64%-73%) declared data sharing, with nonindustry NIH-funded trials exhibiting the highest rates of declared data sharing (89%; 95% CI, 80%-98%) and industry-funded trials the lowest (61%; 95% CI, 54%-68%).

However, only 2 IPD sets (0.6%; 95% CI, 0.0%-1.5%) were actually deidentified and publicly available as of April 10, 2020. The remaining data were supposedly accessible via request to the authors (143 of 334 articles [42.8%]), via a repository (89 of 334 articles [26.6%]), or via the company (78 of 334 articles [23.4%]).

Among the 89 articles declaring that IPD would be stored in repositories, only 17 (19.1%) had actually deposited data, mostly because of embargoes and pending regulatory approval. An embargo was set in 47.3% of data-sharing articles (158 of 334), and in half of these the period exceeded 1 year or was unspecified.
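
The confidence intervals quoted in these results are standard binomial intervals for a proportion; a normal-approximation (Wald) interval, sketched below, reproduces the headline 68.6% (95% CI, 64%-73%) for 334 of 487 articles, though the authors may have used a different method.

```python
# Wald 95% confidence interval for a proportion.
import math

def wald_ci(successes, n, z=1.96):
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p, p - half, p + half

p, lo, hi = wald_ci(334, 487)
print(f"{p:.1%} (95% CI, {lo:.1%}-{hi:.1%})")  # 68.6% (64.5%-72.7%)
```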

Conclusions and Relevance

Most trials published in JAMA, Lancet, and NEJM after the implementation of the ICMJE policy declared their intent to make clinical data available. However, a wide gap between declared and actual data sharing exists.

To improve transparency and data reuse, journals should promote the use of unique pointers to data set location and standardized choices for embargo periods and access requirements.

DOI : https://doi.org/10.1001/jamanetworkopen.2020.33972