Phase 1 of the NIH Preprint Pilot: Testing the viability of making preprints discoverable in PubMed Central and PubMed

Authors : Kathryn Funk, Teresa Zayas-Cabán, Jeffrey Beck

Introduction

The National Library of Medicine (NLM) launched a pilot in June 2020 to 1) explore the feasibility and utility of adding preprints to PubMed Central (PMC) and making them discoverable in PubMed and 2) support accelerated discoverability of NIH-supported research without compromising user trust in NLM’s widely used literature services.

Methods

The first phase of the Pilot focused on archiving preprints reporting NIH-supported SARS-CoV-2 virus and COVID-19 research. To launch Phase 1, NLM identified eligible preprint servers and developed processes for identifying NIH-supported preprints within scope in these servers.

Processes were also developed to ingest and convert preprints in PMC and to send corresponding records to PubMed. User interfaces were modified to display preprint records. NLM collected data on the preprints ingested and on the discovery of preprint records in PMC and PubMed, and engaged users through focus groups and a survey to obtain direct feedback on the Pilot and on perceptions of preprints.

Results

Between June 2020 and June 2022, NLM added more than 3,300 preprint records to PMC and PubMed, which were viewed 4 million times and 3 million times, respectively. Nearly a quarter of preprints in the Pilot were not associated with a peer-reviewed published journal article. User feedback revealed that the inclusion of preprints did not have a notable impact on trust in PMC or PubMed.

Discussion

NIH-supported preprints can be identified and added to PMC and PubMed without disrupting existing operations processes. Additionally, inclusion of preprints in PMC and PubMed accelerates discovery of NIH research without reducing trust in NLM literature services.

Phase 1 of the Pilot provided a useful testbed for studying NIH investigator preprint posting practices, as well as knowledge gaps among user groups, during the COVID-19 public health emergency, an unusual time with heightened interest in immediate access to research results.

Open Access of COVID-19-related publications in the first quarter of 2020: a preliminary study based in PubMed

Authors : Olatz Arrizabalaga, David Otaegui, Itziar Vergara, Julio Arrizabalaga, Eva Méndez

Background

The COVID-19 outbreak has made funders, researchers and publishers agree to have research publications, as well as other research outputs, such as data, become openly available.

In the extraordinary research context of the SARS CoV-2 pandemic, publishers are announcing, at the instigation of funders and researchers, that their coronavirus-related articles will be made immediately accessible in appropriate open repositories such as PubMed Central.

Methods

This work uses Unpaywall, OpenRefine and PubMed to analyse the level of openness of articles about COVID-19 published during the first quarter of 2020. It also analyses Open Access (OA) articles published about previous coronaviruses (SARS CoV-1 and MERS CoV) as a means of comparison.
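The kind of tally this method implies can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record layout (an `oa_status` string and an optional `license` string per article) is an assumption loosely modelled on Unpaywall-style output, and the sample records and DOIs are invented.

```python
from collections import Counter

# Hypothetical records; the field names are assumptions, the DOIs are invented.
SAMPLE = [
    {"doi": "10.0000/example.1", "oa_status": "bronze", "license": None},
    {"doi": "10.0000/example.2", "oa_status": "gold", "license": "cc-by"},
    {"doi": "10.0000/example.3", "oa_status": "closed", "license": None},
    {"doi": "10.0000/example.4", "oa_status": "green", "license": None},
]


def openness_summary(records):
    """Return the OA share, per-status shares, and the unlicensed share of OA papers."""
    total = len(records)
    if total == 0:
        raise ValueError("no records to summarise")
    status_counts = Counter(r["oa_status"] for r in records)
    # Anything that is not "closed" is counted as freely available.
    open_records = [r for r in records if r["oa_status"] != "closed"]
    unlicensed = sum(1 for r in open_records if not r.get("license"))
    return {
        "oa_share": len(open_records) / total,
        "status_share": {s: n / total for s, n in status_counts.items()},
        "unlicensed_share_of_oa": (
            unlicensed / len(open_records) if open_records else 0.0
        ),
    }


summary = openness_summary(SAMPLE)
print(summary)
```

Separating the status tally from the license tally mirrors the two questions the study asks: how much is open, and under what terms.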

Results

A total of 5,611 COVID-19-related articles were analysed from PubMed. This is a much higher number for a period of 4 months than the numbers found for SARS CoV-1 and MERS during the first year of their first outbreaks (335 and 116 articles, respectively).

Regarding the levels of openness, 88.8% of the SARS CoV-2 papers are freely available; similar rates were found for the other coronaviruses. Deeper analysis showed that (i) 67.4% of articles belong to an undefined Bronze category; (ii) 76.4% of all OA papers don’t carry any license, followed by 10.4% which display restricted licensing. These patterns were found to be repeated in the three most frequent publishers: Elsevier, Springer and Wiley.

Conclusions

Our results suggest that, although scientific production is much higher than during previous epidemics and is open, there is a caveat to this opening, characterized by the absence of fundamental elements and values on which Open Science is based, such as licensing.

URL : Open Access of COVID-19-related publications in the first quarter of 2020: a preliminary study based in PubMed

DOI : https://doi.org/10.12688/f1000research.24136.1

Worldwide inequality in access to full text scientific articles: the example of ophthalmology

Authors : Christophe Boudry, Patricio Alvarez-Muñoz, Ricardo Arencibia-Jorge, Didier Ayena, Niels J. Brouwer, Zia Chaudhuri, Brenda Chawner, Emilienne Epee, Khalil Erraïs, Akbar Fotouhi, Almutez M. Gharaibeh, Dina H. Hassanein, Martina C. Herwig-Carl, Katherine Howard, Dieudonne Kaimbo Wa Kaimbo, Patricia-Ann Laughrea, Fernando A. Lopez, Juan D. Machin-Mastromatteo, Fernando K. Malerbi, Papa Amadou Ndiaye, Nina A. Noor, Josmel Pacheco-Mendoza, Vasilios P. Papastefanou, Mufarriq Shah, Carol L. Shields, Ya Xing Wang, Vasily Yartsev, Frederic Mouriaux

Background

The problem of access to medical information, particularly in low-income countries, has been under discussion for many years. Although a number of developments have occurred in the last decade (e.g., the open access (OA) movement and the website Sci-Hub), it is widely agreed that these difficulties persist, mainly because paywalls still limit access to approximately 75% of scholarly documents.

In this study, we compare the accessibility of recent full text articles in the field of ophthalmology in 27 established institutions located worldwide.

Methods

A total of 200 references from articles were retrieved using the PubMed database. Each article was individually checked for OA. Full texts of non-OA articles (i.e., “paywalled articles”) were examined to determine whether they were available through institutional and Hinari access in each institution studied, through “alternative ways” (i.e., PubMed Central, ResearchGate, Google Scholar, and Online Reprint Request), and through the website Sci-Hub.

Results

The share of full texts of “paywalled articles” available through institutional and Hinari access varied widely across institutions, ranging from 0% to 94.8% (mean = 46.8%; SD = 31.5; median = 51.3%).
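Summary statistics of this kind can be computed as in the sketch below. The per-institution percentages are invented for illustration and do not reproduce the study's data. The abstract does not say whether the reported SD is a sample or a population standard deviation, so the sample version used here is an assumption.

```python
import statistics

# Hypothetical percentages of paywalled full texts accessible at five institutions.
availability = [0.0, 12.5, 51.3, 70.0, 94.8]


def describe(values):
    """Return the mean, sample standard deviation, and median of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 denominator), an assumption
        "median": statistics.median(values),
    }


result = describe(availability)
print(result)
```

Reporting the median alongside the mean is useful here because a spread this wide can pull the two apart.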

We found that combining “alternative ways” with Sci-Hub gives access to 95.5% of full text “paywalled articles” and reduces by a factor of 14 the average extra cost of obtaining all full texts from publishers’ websites through pay-per-view.

Conclusions

The scant number of available full text “paywalled articles” in most institutions studied encourages researchers in the field of ophthalmology to use Sci-Hub to search for scientific information.

The scientific community and decision-makers must unite and strengthen their efforts to find solutions to improve access to scientific literature worldwide and avoid an implosion of the scientific publishing model.

This study is not an endorsement for using Sci-Hub. The authors, their institutions, and publishers accept no responsibility on behalf of readers.

URL : Worldwide inequality in access to full text scientific articles: the example of ophthalmology

DOI : https://doi.org/10.7717/peerj.7850

Comparing quality of reporting between preprints and peer-reviewed articles in the biomedical literature

Authors : Clarissa F. D. Carneiro, Victor G. S. Queiroz, Thiago C. Moulin, Carlos A. M. Carvalho, Clarissa B. Haas, Danielle Rayêe, David E. Henshall, Evandro A. De-Souza, Felippe Espinelli, Flávia Z. Boos, Gerson D. Guercio, Igor R. Costa, Karina L. Hajdu, Martin Modrák, Pedro B. Tan, Steven J. Burgess, Sylvia F. S. Guerra, Vanessa T. Bortoluzzi, Olavo B. Amaral

Preprint usage is growing rapidly in the life sciences; however, questions remain on the relative quality of preprints when compared to published articles. An objective dimension of quality that is readily measurable is completeness of reporting, as transparency can improve the reader’s ability to independently interpret data and reproduce findings.

In this observational study, we compared random samples of articles published in bioRxiv and in PubMed-indexed journals in 2016 using a quality of reporting questionnaire. We found that peer-reviewed articles had, on average, higher quality of reporting than preprints, although this difference was small.

We found larger differences favoring PubMed in subjective ratings of how clearly titles and abstracts presented the main findings and how easy it was to locate relevant reporting information.

Interestingly, an exploratory analysis showed that preprints with figures and legends embedded within text had reporting scores similar to PubMed articles.

These differences cannot be directly attributed to peer review or editorial processes, as manuscripts might already differ before submission due to greater uptake of preprints by particular research communities.

Nevertheless, our results show that quality of reporting in preprints in the life sciences is within a similar range as that of peer-reviewed articles, albeit slightly lower on average, supporting the idea that preprints should be considered valid scientific contributions.

An ongoing second phase of the project is comparing preprints to their own published versions in order to more directly assess the effects of peer review.

URL : Comparing quality of reporting between preprints and peer-reviewed articles in the biomedical literature

DOI : https://doi.org/10.1101/581892

Exploring PubMed as a reliable resource for scholarly communications services

Authors : Peace Ossom Williamson, Christian I. J. Minter

Objective

PubMed’s provision of MEDLINE and other National Library of Medicine (NLM) resources has made it one of the most widely accessible biomedical resources globally. The growth of PubMed Central (PMC) and public access mandates have affected PubMed’s composition.

The authors tested recent claims that content in PMC is of low quality and affects PubMed’s reliability, while exploring PubMed’s role in the current scholarly communications landscape.

Methods

The percentage of MEDLINE-indexed records was assessed in PubMed and various subsets of records from PMC. Data were retrieved via the National Center for Biotechnology Information (NCBI) interface, and follow-up interviews with a PMC external reviewer and staff at NLM were conducted.
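A percentage of this kind could be derived from two record counts, as in the sketch below. The abstract does not say which queries were run, so the route shown (the NCBI E-utilities `esearch` endpoint with the `medline[sb]` subset filter) is an assumption, and the counts passed to `percent_indexed` are placeholders rather than real data. The code only builds the request URL and does not call the network.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def count_url(term):
    """Build an esearch URL whose response carries the match count but no record IDs."""
    params = {"db": "pubmed", "term": term, "retmax": 0, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


def percent_indexed(medline_count, total_count):
    """Return the percentage of records that are MEDLINE-indexed."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return 100.0 * medline_count / total_count


# Assumed query: every PubMed record in the MEDLINE subset.
print(count_url("medline[sb]"))
# Placeholder counts, for illustration only.
print(percent_indexed(91, 100))
```

The same two-count comparison can be repeated for any subset of PMC records by narrowing the search term.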

Results

Almost all PubMed content (91%) is indexed in MEDLINE; however, since the launch of PMC, the percentage of PubMed records indexed in MEDLINE has slowly decreased.

This trend is the result of an increase in PMC content from journals that are not indexed in MEDLINE and not a result of author manuscripts submitted to PMC in compliance with public access policies. Author manuscripts in PMC continue to be published in MEDLINE-indexed journals at a high rate (85%).

The interviewees clarified the difference between the sources, with MEDLINE serving as a highly selective index of journals in biomedical literature and PMC serving as an open archive of quality biomedical and life sciences literature and a repository of funded research.

Conclusion

The differing scopes of PMC and MEDLINE will likely continue to affect their overlap; however, quality control exists in the maintenance and facilitation of both resources, and funding from major grantors is a major component of quality assurance in PMC.

URL : Exploring PubMed as a reliable resource for scholarly communications services

DOI : dx.doi.org/10.5195/jmla.2019.433

Biotea: semantics for Pubmed Central

Authors : Alexander Garcia, Federico Lopez, Leyla Garcia, Olga Giraldo, Victor Bucheli, Michel Dumontier

A significant portion of biomedical literature is represented in a manner that makes it difficult for consumers to find or aggregate content through a computational query. One approach to facilitate reuse of the scientific literature is to structure this information as linked data using standardized web technologies.

In this paper we present the second version of Biotea, a semantic, linked data version of the open-access subset of PubMed Central that has been enhanced with specialized annotation pipelines that use existing infrastructure from the National Center for Biomedical Ontology.

We expose our models, services, software and datasets. Our infrastructure enables manual and semi-automatic annotation; the resulting data are represented as RDF-based linked data and can be readily queried using the SPARQL query language.
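To give a sense of what querying linked data involves, the sketch below matches triple patterns against a tiny in-memory set of triples. Joining patterns like this is the operation at the core of a SPARQL basic graph pattern. It is a toy illustration, not Biotea's data model or endpoint: the `ex:` identifiers, the `hasTopic` predicate and the sample titles are all hypothetical, and a real deployment would send a SPARQL query to a triple store instead.

```python
# Dublin Core "title" is a real predicate; everything else here is hypothetical.
TITLE = "http://purl.org/dc/terms/title"
TOPIC = "http://example.org/hasTopic"  # invented predicate for illustration

TRIPLES = [
    ("ex:article1", TITLE, "Preprints in the life sciences"),
    ("ex:article1", TOPIC, "ex:Preprint"),
    ("ex:article2", TITLE, "Licensing of open access papers"),
    ("ex:article2", TOPIC, "ex:License"),
]


def match(triples, s=None, p=None, o=None):
    """Yield triples that agree with every non-None position (None is a wildcard)."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


def titles_about(triples, topic):
    """Join two patterns: ?a TOPIC topic . ?a TITLE ?title"""
    results = []
    for article, _, _ in match(triples, p=TOPIC, o=topic):
        results.extend(title for _, _, title in match(triples, s=article, p=TITLE))
    return results


print(titles_about(TRIPLES, "ex:Preprint"))
```

Because annotations and bibliographic metadata share subject identifiers, a single join is enough to go from a concept to the articles that mention it.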

We illustrate the utility of our system with several use cases. Our datasets, methods and techniques are available at http://biotea.github.io.

URL : Biotea: semantics for Pubmed Central

DOI : https://doi.org/10.7717/peerj.4201

Open access publications in biology and medicine: history and state of play in 2016 (Publications en libre accès en biologie–médecine : historique et état des lieux en 2016)

Authors : Christophe Boudry, Manuel Durand-Barthez

The emergence of the open access (OA) movement and of open repositories has disrupted (and continues to disrupt) the economics of, and access to, scientific publications. The aim of this article is to update and extend the results of earlier studies that attempted to quantify the extent of OA in biology and medicine, focusing on the PubMed bibliographic database.

An analysis of OA publications in PubMed by the geographical origin of their authors (countries and continents) was also carried out, and a number of OA-related parameters (growth in the number of OA journals, number of mandates and open repositories by country and continent) were examined and put into perspective. The results show that the percentage of articles whose full text is available in OA keeps rising and, in 2015, reached 39.1% of the articles available in PubMed.

The geographical analysis of the 25 most productive countries and of the continents shows wide variability in the percentage of OA articles (from 21.9% for Italy to 42.08% for the United States, and from 22.80% for Oceania to 40.84% for North America).

In addition, our data show that the number of mandates and open repositories is not significantly correlated with the percentage of OA articles at the national or continental level, confirming that successive public policies and OA mandates have had an influence that is, if not secondary, at least below expectations.

More coercive mandates may achieve more significant effects in the more or less long term. The steady increase in the number of OA journals, together with the demonstrated increase in citations of OA articles, will certainly further strengthen the appeal of OA for authors.

DOI : https://doi.org/10.1016/j.jemep.2017.02.021