Phase 1 of the NIH Preprint Pilot: Testing the viability of making preprints discoverable in PubMed Central and PubMed

Authors : Kathryn Funk, Teresa Zayas-Cabán, Jeffrey Beck


The National Library of Medicine (NLM) launched a pilot in June 2020 to 1) explore the feasibility and utility of adding preprints to PubMed Central (PMC) and making them discoverable in PubMed and 2) support accelerated discoverability of NIH-supported research without compromising user trust in NLM’s widely used literature services.


The first phase of the Pilot focused on archiving preprints reporting NIH-supported SARS-CoV-2 virus and COVID-19 research. To launch Phase 1, NLM identified eligible preprint servers and developed processes for identifying NIH-supported preprints within scope in these servers.

Processes were also developed for the ingest and conversion of preprints in PMC and to send corresponding records to PubMed. User interfaces were modified for display of preprint records. NLM collected data on the preprints ingested and discovery of preprint records in PMC and PubMed and engaged users through focus groups and a survey to obtain direct feedback on the Pilot and perceptions of preprints.


Between June 2020 and June 2022, NLM added more than 3,300 preprint records to PMC and PubMed, which were viewed 4 million times and 3 million times, respectively. Nearly a quarter of preprints in the Pilot were not associated with a peer-reviewed published journal article. User feedback revealed that the inclusion of preprints did not have a notable impact on trust in PMC or PubMed.


NIH-supported preprints can be identified and added to PMC and PubMed without disrupting existing operations processes. Additionally, inclusion of preprints in PMC and PubMed accelerates discovery of NIH research without reducing trust in NLM literature services.

Phase 1 of the Pilot provided a useful testbed for studying NIH investigator preprint posting practices, as well as knowledge gaps among user groups, during the COVID-19 public health emergency, an unusual time with heightened interest in immediate access to research results.

Examining the Impact of the National Institutes of Health Public Access Policy on the Citation Rates of Journal Articles


To examine whether National Institutes of Health (NIH) funded articles that were archived in PubMed Central (PMC) after the release of the 2008 NIH Public Access Policy show greater scholarly impact than comparable articles not archived in PMC.


A list of journals across several subject areas was developed from which to collect article citation data. Citation information and cited reference counts of the articles published in 2006 and 2009 from 122 journals were obtained from the Scopus database. The articles were separated into categories of NIH funded, non-NIH funded and whether they were deposited in PubMed Central. An analysis of citation data across a five-year timespan was performed on this set of articles.


A total of 45,716 articles were examined, including 7,960 with NIH funding. An analysis of the number of times these articles were cited found that NIH-funded 2006 articles in PMC were not cited significantly more than NIH-funded non-PMC articles. However, 2009 NIH-funded articles in PMC were cited 26% more than 2009 NIH-funded articles not in PMC, 5 years after publication. This result is highly significant even after controlling for journal (as a proxy of article quality and topic).


Our analysis suggests that factors occurring between 2006 and 2009 produced a subsequent boost in scholarly impact of PubMed Central. The 2008 Public Access Policy is likely to be one such factor, but others may have contributed as well (e.g., growing size and visibility of PMC, increasing availability of full-text linkouts from PubMed, and indexing of PMC articles by Google Scholar).


DOI : 10.1371/journal.pone.0139951

Sizing the Problem of Improving Discovery and Access to NIH-Funded Data: A Preliminary Study



This study informs efforts to improve the discoverability of and access to biomedical datasets by providing a preliminary estimate of the number and type of datasets generated annually by research funded by the U.S. National Institutes of Health (NIH). It focuses on those datasets that are “invisible” or not deposited in a known repository.


We analyzed NIH-funded journal articles that were published in 2011, cited in PubMed and deposited in PubMed Central (PMC) to identify those that indicate data were submitted to a known repository. After excluding those articles, we analyzed a random sample of the remaining articles to estimate how many and what types of invisible datasets were used in each article.


About 12% of the articles explicitly mention deposition of datasets in recognized repositories, leaving 88% that are invisible datasets. Among articles with invisible datasets, we found an average of 2.9 to 3.4 datasets, suggesting there were approximately 200,000 to 235,000 invisible datasets generated from NIH-funded research published in 2011. Approximately 87% of the invisible datasets consist of data newly collected for the research reported; 13% reflect reuse of existing data. More than 50% of the datasets were derived from live human or non-human animal subjects.


In addition to providing a rough estimate of the total number of datasets produced per year by NIH-funded researchers, this study identifies additional issues that must be addressed to improve the discoverability of and access to biomedical research data: the definition of a “dataset,” determination of which (if any) data are valuable for archiving and preservation, and better methods for estimating the number of datasets of interest. Lack of consensus amongst annotators about the number of datasets in a given article reinforces the need for a principled way of thinking about how to identify and characterize biomedical datasets.


DOI : 10.1371/journal.pone.0132735
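
As a rough check on the dataset estimate in the abstract above, the per-article averages and the totals can be back-solved to an implied number of articles with invisible datasets. This is only a sketch from the figures quoted in the abstract; the function name is illustrative, and the resulting article count is an inference, not a number reported in the abstract.

```python
def implied_article_count(total_datasets: float, datasets_per_article: float) -> float:
    """Back-solve an article count from a dataset total and a per-article average."""
    return total_datasets / datasets_per_article


# Figures quoted in the abstract: 2.9 to 3.4 datasets per article,
# and roughly 200,000 to 235,000 invisible datasets in total.
low = implied_article_count(200_000, 2.9)
high = implied_article_count(235_000, 3.4)

print(f"Implied articles (from low bound):  {low:,.0f}")
print(f"Implied articles (from high bound): {high:,.0f}")
```

Both bounds work out to roughly 69,000 articles, which is consistent with the two ends of the dataset range having been scaled from a single article count.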

Publication of NIH funded trials registered in ClinicalTrials…

Publication of NIH funded trials registered in ClinicalTrials.gov: cross sectional analysis :

Objective To review patterns of publication of clinical trials funded by US National Institutes of Health (NIH) in peer reviewed biomedical journals indexed by Medline.

Design Cross sectional analysis.

Setting Clinical trials funded by NIH and registered within ClinicalTrials.gov, a trial registry and results database maintained by the US National Library of Medicine, after 30 September 2005 and updated as having been completed by 31 December 2008, allowing at least 30 months for publication after completion of the trial.

Main outcome measures Publication and time to publication in the biomedical literature, as determined through Medline searches, the last of which was performed in June 2011.

Results Among 635 clinical trials completed by 31 December 2008, 294 (46%) were published in a peer reviewed biomedical journal, indexed by Medline, within 30 months of trial completion. The median period of follow-up after trial completion was 51 months (25th-75th centiles 40-68 months), and 432 (68%) were published overall. Among published trials, the median time to publication was 23 months (14-36 months). Trials completed in either 2007 or 2008 were more likely to be published within 30 months of study completion compared with trials completed before 2007 (54% (196/366) v 36% (98/269); P<0.001).

Conclusions Despite recent improvement in timely publication, fewer than half of trials funded by NIH are published in a peer reviewed biomedical journal indexed by Medline within 30 months of trial completion. Moreover, after a median of 51 months after trial completion, a third of trials remained unpublished.

doi: 10.1136/bmj.d7292
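
The percentages in the Results above can be reproduced from the quoted counts, and the comparison between completion periods can be checked with a simple test. The sketch below uses a pooled two-proportion z-test; that choice is an assumption made for illustration, since the abstract does not say which test produced P<0.001, and the function names are hypothetical.

```python
import math


def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage."""
    return 100.0 * numerator / denominator


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts quoted in the abstract.
print(f"Published within 30 months: {pct(294, 635):.0f}%")
print(f"Published overall:          {pct(432, 635):.0f}%")
print(f"Completed 2007-2008:        {pct(196, 366):.0f}%")
print(f"Completed before 2007:      {pct(98, 269):.0f}%")

z, p_value = two_proportion_z_test(196, 366, 98, 269)
print(f"z = {z:.2f}, two-sided p = {p_value:.1e}")
```

The two subgroup counts also sum to the overall figures (196 + 98 = 294 published within 30 months; 366 + 269 = 635 trials), so the reported numbers are internally consistent.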

The Future of Taxpayer-Funded Research: Who Will Control Access to the Results?

This report examines the costs and benefits of increased public access, and proposals to either extend or overturn the NIH policy. It looks at increased public access to research results through the lens of “openness,” with a particular interest in how greater openness affects the progress of science, the productivity of the research enterprise, the process of innovation, the commercialization of research, and economic growth.


The Influence of the National Institutes of Health…

The Influence of the National Institutes of Health Public-Access Policy on the Publishing Habits of Principal Investigators :

“The mandatory NIH public-access policy, which became effective on April 7, 2008, requires the NIH-funded principal investigators (PIs) to self-archive to the National Library of Medicine subject repository PubMed Central a manuscript’s electronic version immediately upon publication, which will then be available to the public free of cost at the latest after a twelve-month embargo period. The Public Library of Science (PLoS), a non-profit open-access publisher in health sciences, publishes seven journals in the health sciences field (PLoS ONE, PLoS Biology, PLoS Medicine, PLoS Computational Biology, PLoS Genetics, PLoS Pathogens and PLoS Neglected Tropical Diseases) and submits to PubMed Central all the published articles, irrespective of the funder of the research results. The PIs who had published in one of the PLoS journals were chosen based on the journals’ established high impact factor immediately after their creation. The PIs’ motivation to publish in one of the seven PLoS journals was unknown. Whether the NIH public-access policy has affected the PIs’ publishing decisions was also unknown.

A random sample of NIH-funded PIs, who had published in one of the PLoS journals between the years 2005-2009, was selected from the RePORTER database. During the period March-May 2011, forty-two PIs were interviewed using Skype™ software, and a semi-structured open-ended interview protocol was followed. The participants were divided into two groups: the pre-mandate PIs, who had published in one of the seven PLoS journals during the period 2005-2007, and the post-mandate PIs, who had published in the PLoS journals during the period 2008-2009. The publishing habits of these two groups were compared, in order to reach an understanding about their publishing decisions.

Based on the findings, the NIH-funded PIs choose the PLoS journals due to their high impact factor, fast publication speed, fair peer-review system and the articles’ open-access availability. Although the PIs agree with the premise that publicly funded research must be distributed for free to everyone who has funded it, the steps required to comply with the policy were perceived to be time consuming. Since conformity with the policy is essential, the participants’ goal is to ensure that the manuscripts will appear in PubMed Central, where they can be self-archived by the PIs, by an administrative assistant or by the journal.

The NIH public-access policy did not cause either an increase in the PIs’ open-access awareness or a change in their publishing habits. The open-access advocates were supporters of immediate free access to scientific information before the policy and provided their manuscripts free of cost before the policy’s mandate. The non-open-access advocates choose their publications based on quality criteria such as the journal’s prestige, impact factor, speed of publication and the attracted audience, while the article’s open-access availability is considered to be a plus. Furthermore, since a large number of journals comply with the NIH policy, the participants did not have to change their publishing habits.”


Publishing Practices of NIH-Funded Facul…

Publishing Practices of NIH-Funded Faculty at MIT :
“Faculty and researchers who receive substantial funding from NIH were interviewed about their publication practices. Qualitative data was collected from interviews of eleven faculty members and one researcher representing six academic departments who received NIH funding. Interview responses were analyzed to identify a representative publication workflow and common themes related to the publication process. The goals of this study were to inform librarians about faculty publication practices; to learn how faculty are affected by and responding to NIH publication policy changes; and to inform planning and discussion about new services to support NIH compliance in addition to general faculty publishing.
Major themes from the interviews included consistency in publishing workflows, but variety in authorship patterns and in data management practices. Significant points of pain for authors included difficulty finding quality reviewers, frustrating submission processes, and discomfort about the implications of publication agreements. Some authors found the NIH submission requirement to be burdensome, but most assumed their publishers were taking care of this process for them. Implications for library services are considered.”