Authors: Marc Bertin, Iana Atanassova
The role of preprints in scientific production and their share of citations have been growing over the past 10 years. In this paper we study several aspects of preprint citations: their progression over time, their relative frequencies with respect to the IMRaD structure of articles, and their distributions per preprint database and per PLOS journal.
We have processed the PLOS corpus, which covers 7 journals and a total of about 240,000 articles up to January 2021, and produced a dataset of 8,460 preprint citation contexts that cite 12 different preprint databases.
Our results show that preprint citations occur most frequently in the Methods section of articles, though small variations exist across journals. PLOS Computational Biology stands out, as it contains more than three times as many preprint citations as any other PLOS journal.
We also examine the relative shares of the different preprint databases. While arXiv and bioRxiv are the most frequently cited sources, bioRxiv’s disciplinary nature is apparent: it is the source of more than 70% of preprint citations in PLOS Biology, PLOS Genetics and PLOS Pathogens.
We have also compared the lexical content of preprint citation contexts with that of citation contexts of peer-reviewed publications. Finally, a lexicometric analysis shows that preprint citation contexts differ significantly from citation contexts of peer-reviewed publications.
This confirms that authors use different lexical content when citing preprints than in their other citations.