PubTator 3.0: an AI-powered literature resource for unlocking biomedical knowledge

Authors : Chih-Hsuan Wei, Alexis Allot, Po-Ting Lai, Robert Leaman, Shubo Tian, Ling Luo, Qiao Jin, Zhizheng Wang, Qingyu Chen, Zhiyong Lu

PubTator 3.0 (https://www.ncbi.nlm.nih.gov/research/pubtator3/) is a biomedical literature resource using state-of-the-art AI techniques to offer semantic and relation searches for key concepts like proteins, genetic variants, diseases and chemicals. It currently provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and 6 million full-text articles from the PMC open access subset, updated weekly.

PubTator 3.0’s online interface and API utilize these precomputed entity relations and synonyms to provide advanced search capabilities and enable large-scale analyses, streamlining many complex information needs. We showcase the retrieval quality of PubTator 3.0 using a series of entity pair queries, demonstrating that PubTator 3.0 retrieves a greater number of articles than either PubMed or Google Scholar, with higher precision in the top 20 results.
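
The "precision in the top 20 results" comparison above can be read as precision at k with k = 20. The following is a minimal sketch of that metric, assuming each retrieved article carries a binary relevance judgment. The function name `precision_at_k` and the sample judgments are hypothetical and are not taken from the paper, whose exact evaluation protocol is not described here.

```python
from typing import Sequence


def precision_at_k(relevance: Sequence[bool], k: int = 20) -> float:
    """Fraction of the top-k results judged relevant.

    `relevance` is ordered by rank (best first). If fewer than k results
    were returned, the missing positions count as non-relevant. That is
    one common convention and an assumption here, not the paper's protocol.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = relevance[:k]
    return sum(1 for r in top_k if r) / k


# Hypothetical judgments for one entity pair query (True = relevant).
judgments = [True] * 15 + [False] * 5 + [True] * 10
print(precision_at_k(judgments, k=20))
```

Only the first 20 judgments affect the score, so relevant articles ranked lower do not raise it.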

We further show that integrating ChatGPT (GPT-4) with PubTator APIs dramatically improves the factuality and verifiability of its responses. In summary, PubTator 3.0 offers a comprehensive set of features and tools that allow researchers to navigate the ever-expanding wealth of biomedical literature, expediting research and unlocking valuable insights for scientific discovery.

DOI : https://doi.org/10.1093/nar/gkae235

PreprintMatch: A tool for preprint to publication detection shows global inequities in scientific publication

Authors : Peter Eckmann, Anita Bandrowski

Preprints, versions of scientific manuscripts that precede peer review, are growing in popularity. They offer an opportunity to democratize and accelerate research, as they involve neither publication costs nor a lengthy peer review process. Preprints are often later published in peer-reviewed venues, but these publications and the original preprints are frequently not linked in any way.

To this end, we developed a tool, PreprintMatch, to find matches between preprints and their corresponding published papers, if they exist. This tool outperforms existing techniques for matching preprints and papers in both matching performance and speed. PreprintMatch was applied to search for matches between preprints (from bioRxiv and medRxiv) and PubMed.
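
To give a rough sense of what preprint-to-paper matching involves, here is a minimal sketch based on title similarity alone. It is not PreprintMatch's actual algorithm, which the abstract does not describe. The function names, the 0.8 threshold, and the use of Python's `difflib` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so trivial edits do not matter."""
    return " ".join(title.lower().split())


def best_match(preprint_title: str, paper_titles: List[str],
               threshold: float = 0.8) -> Optional[str]:
    """Return the most similar paper title, or None if below threshold."""
    target = normalize(preprint_title)
    best_title, best_score = None, 0.0
    for title in paper_titles:
        # ratio() is a character-level similarity score in [0, 1].
        score = SequenceMatcher(None, target, normalize(title)).ratio()
        if score > best_score:
            best_title, best_score = title, score
    return best_title if best_score >= threshold else None


# Hypothetical titles, for illustration only.
candidates = ["a tool for matching preprints", "Unrelated study of soil"]
print(best_match("A Tool for Matching  Preprints", candidates))
```

A realistic matcher would likely also compare abstracts and author lists, since the abstract reports that these can differ between preprint and published version.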

The preliminary nature of preprints offers a unique perspective into scientific projects at a relatively early stage, and with better matching between preprint and paper, we explored questions related to research inequity.

We found that preprints from low income countries are published as peer-reviewed papers at a lower rate than those from high income countries (39.6% and 61.1%, respectively), and our data is consistent with previous work that cites a lack of resources, lack of stability, and policy choices to explain this discrepancy.

Preprints from low income countries were also found to be published more quickly (178 vs 203 days) and with less title, abstract, and author similarity to the published version than those from high income countries. Low income countries add more authors from the preprint to the published version than high income countries (0.42 authors vs 0.32, respectively), a practice that is significantly more frequent in China compared to similar countries.

Finally, we find that some publishers publish work with authors from lower income countries more frequently than others.

DOI : https://doi.org/10.1371/journal.pone.0281659

Publication practices during the COVID-19 pandemic: Expedited publishing or simply an early bird effect?

Authors : Yulia V. Sevryugina, Andrew J. Dicks

This study explores the evolution of publication practices associated with SARS-CoV-2 research papers, namely, peer-reviewed journal and review articles indexed in PubMed and their associated preprints posted on the bioRxiv and medRxiv servers: a total of 4,031 journal article-preprint pairs.

Our assessment of various publication delays during the January 2020 to March 2021 period revealed the early bird effect that lies beyond the involvement of any publisher policy action and is directly linked to the emerging nature of new and ‘hot’ scientific topics.

We found that when the early bird effect and data incompleteness are taken into account, COVID-19 related research papers show only a moderately expedited speed of dissemination as compared with the pre-pandemic era.

Medians for peer-review and production stage delays were 66 and 15 days, respectively, and the entire conversion process from a preprint to its peer-reviewed journal article version took 109.5 days.

The early bird effect produced an ephemeral perception of a global rush in scientific publishing during the early days of the coronavirus pandemic. We emphasize the importance of considering the early bird effect in interpreting publication data collected at the outset of a newly emerging event.

DOI : https://doi.org/10.1002/leap.1483

Publishing of COVID-19 preprints in peer-reviewed journals, preprinting trends, public discussion and quality issues

Authors : Ivan Kodvanj, Jan Homolak, Vladimir Trkulja

COVID-19-related (vs. non-related) articles appear to be more expeditiously processed and published in peer-reviewed journals.

We aimed to evaluate: (i) whether COVID-19-related preprints were favored for publication, (ii) preprinting trends and public discussion of the preprints, and (iii) the relationship between the publication topic (COVID-19-related or not) and quality issues.

Manuscripts deposited at bioRxiv and medRxiv between January 1 and September 27, 2020 were assessed for the probability of publishing in peer-reviewed journals, and those published were evaluated for submission-to-acceptance time. The extent of public discussion was assessed based on Altmetric and Disqus data.

The Retraction Watch Database and PubMed were used to explore the retraction of COVID-19 and non-COVID-19 articles and preprints. With adjustment for the preprinting server and number of deposited versions, COVID-19-related preprints were more likely to be published within 120 days of the deposition of the first version (OR = 1.96, 95% CI: 1.80–2.14) as well as over the entire observed period (OR = 1.39, 95% CI: 1.31–1.48). Submission-to-acceptance time was 35.85 days shorter (95% CI: 32.25–39.45) for COVID-19 articles.
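
For readers less familiar with odds ratios, the sketch below shows how an unadjusted odds ratio and a Wald 95% confidence interval can be computed from a 2x2 table. It is only an illustration: the figures reported above are adjusted for server and number of versions, which a simple 2x2 calculation cannot reproduce. The function name `odds_ratio_ci` and the cell counts are hypothetical.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed with outcome,   b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome.
    z = 1.96 corresponds to an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper


# Hypothetical counts: published vs. unpublished preprints in two groups.
or_value, ci_low, ci_high = odds_ratio_ci(20, 10, 10, 20)
print(f"OR = {or_value:.2f}, 95% CI: {ci_low:.2f} to {ci_high:.2f}")
```

An interval that excludes 1, as both reported intervals do, indicates that the difference in odds is unlikely to be due to chance alone under the model's assumptions.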

Public discussion of preprints was modest and COVID-19 articles were overrepresented in the pool of retracted articles in 2020. Current data suggest a preference for publication of COVID-19-related preprints over the observed period.

DOI : https://doi.org/10.1007/s11192-021-04249-7

Accuracy of PubMed-based author lists of publications and use of author identifiers to address author name ambiguity: a cross-sectional study

Authors : Paul Sebo, Sylvain de Lucia, Nathalie Vernaz

Objective

To assess the accuracy of PubMed-based author lists of publications and use of author identifiers to address author name ambiguity.

Methods

In this Swiss study conducted in 2019, 300 hospital-based senior physicians were asked to generate a list of their publications in PubMed and complete a questionnaire (type of query used, number of errors in their list of publications, knowledge and use of ORCID and ResearcherID).

Results

156 physicians (52%) agreed to participate, 145 of whom had published at least one article (mean number of publications: 60 (SD 73)). Only 17% used the advanced search option. On average, there were 5 articles in the lists that were not co-authored by participants (advanced search: 1.0 (SD 2.6) vs. 5.9 (SD 13.9), p-value 0.02) and 3 articles co-authored by participants that did not appear in the lists (advanced search: 1.5 (SD 2.0) vs. 3.6 (SD 8.4), p-value 0.05). Although 82% were aware of ORCID, only 16% added all their articles (39% and 6% respectively for ResearcherID).

Conclusions

When used by senior physicians, the advanced search in PubMed is accurate for retrieving authors’ publications. Author identifiers are only used by a minority of physicians and are therefore not recommended in this context, as they would lead to inaccurate results.

DOI : https://doi.org/10.1007/s11192-020-03845-3