Citations are not opinions: a corpus linguistics approach to understanding how citations are made

Author : Domenic Rosati

Citation content analysis seeks to understand citations based on the language used during the making of a citation. A key issue in citation content analysis is looking for linguistic structures that characterize distinct classes of citations for the purposes of understanding the intent and function of a citation.

Previous works have focused on modeling linguistic features first and drawn conclusions on the language structures unique to each class of citation function based on the performance of a classification task or inter-annotator agreement.

In this study, we start with a large sample of a pre-classified citation corpus, 2 million citations from each class of the scite Smart Citation dataset (supporting, disputing, and mentioning citations), and analyze its corpus linguistics in order to reveal the unique and statistically significant language structures belonging to each type of citation.

By generating comparison tables for each citation type we present a number of interesting linguistic features that uniquely characterize citation type. What we find is that within citation collocates, there is very low correlation between citation type and sentiment.

Additionally, we find that the subjectivity of citation collocates across classes is very low. These findings suggest that the sentiment of collocates is not a predictor of citation function and that due to their low subjectivity, an opinion-expressing mode of understanding citations, implicit in previous citation sentiment analysis literature, is inappropriate.

Instead, we suggest that citations can be better understood as claims-making devices where the citation type can be explained by understanding how two claims are being compared. By presenting this approach, we hope to inspire similar corpus linguistic studies on citations that derive a more robust theory of citation from an empirical basis using citation corpora.

URL : https://arxiv.org/abs/2104.08087

A metaresearch study revealed susceptibility of Covid-19 treatment research to white hat bias: first, do no harm

Author : Ioannis Bellos

Objective

To investigate the presence of white hat bias in Covid-19 treatment research by evaluating the effects of citation and reporting bias.

Study design and setting

Citation bias was investigated by assessing the degree of agreement between evidence provided by a remdesivir randomized controlled trial and its citing articles. The dissimilarity of outcomes derived from nonrandomized and randomized studies was tested by a meta-analysis of hydroxychloroquine effects on mortality. The differential influence of studies with beneficial over those with neutral results was evaluated by a bibliometric analysis.

Results

The articles citing the ACTT-1 remdesivir trial preferentially presented its positive outcomes in 55.83% and its negative outcomes in 6.43% of cases. The hydroxychloroquine indicated no significant effect by randomized studies, but a significant survival benefit by nonrandomized ones.

Citation mapping revealed that the study reporting survival benefit from the hydroxychloroquine-azithromycin combination was the most influential, despite subsequent studies reporting potential harmful effects.

Conclusion

The present study raises concerns about citation bias and a predilection of reporting beneficial over harmful effects in the Covid-19 treatment research, potentially in the context of white hat bias. Preregistration, data sharing and avoidance of selective reporting are crucial to ensure the credibility of future research.

DOI : https://doi.org/10.1016/j.jclinepi.2021.03.020

The aging effect in evolving scientific citation networks

Authors : Feng Hu, Lin Ma, Xiu-Xiu Zhan, Yinzuo Zhou, Chuang Liu, Haixing Zhao, Zi-Ke Zhang

The study of citation networks is of interest to the scientific community. However, the underlying mechanism driving individual citation behavior remains imperfectly understood, despite the recent proliferation of quantitative research methods.

Traditional network models normally use graph theory to consider articles as nodes and citations as pairwise relationships between them. In this paper, we propose an alternative evolutionary model based on hypergraph theory in which one hyperedge can have an arbitrary number of nodes, combined with an aging effect to reflect the temporal dynamics of scientific citation behavior.

Both theoretical approximate solution and simulation analysis of the model are developed and validated using two benchmark datasets from different disciplines, i.e. publications of the American Physical Society (APS) and the Digital Bibliography & Library Project (DBLP).

Further analysis indicates that the attraction of early publications will decay exponentially. Moreover, the experimental results show that the aging effect indeed has a significant influence on the description of collective citation patterns.

Shedding light on the complex dynamics driving these mechanisms facilitates the understanding of the laws governing scientific evolution and the quantitative evaluation of scientific outputs.

URL : The aging effect in evolving scientific citation networks

DOI : https://doi.org/10.1007/s11192-021-03929-8

What is the benefit from publishing a working paper in a journal in terms of citations? Evidence from economics

Authors : Klaus Wohlraben, Constantin Bürgi

Many papers in economics that are published in peer reviewed journals are initially released in widely circulated working paper series. This raises the question about the benefit of publishing in a peer-reviewed journal in terms of citations.

Specifically, we address the question: to what extent does the stamp of approval obtained by publishing in a peer-reviewed journal lead to more subsequent citations for papers that are already available in working paper series? Our data set comprises about 28,000 working papers from four major working paper series in economics.

Using panel data methods, we show that the publication in a peer reviewed journal results in around twice the number of yearly citations relative to working papers that never get published in a journal. Our results hold in several robustness checks.

URL : What is the benefit from publishing a working paper in a journal in terms of citations? Evidence from economics

DOI : https://doi.org/10.1007/s11192-021-03942-x

The Most Widely Disseminated COVID-19-Related Scientific Publications in Online Media: A Bibliometric Analysis of the Top 100 Articles with the Highest Altmetric Attention Scores

Authors : Ji Yoon Moon, Dae Young Yoon, Ji Hyun Hong, Kyoung Ja Lim, Sora Baek, Young Lan Seo, Eun Joo Yun

The novel coronavirus disease 2019 (COVID-19) is a global pandemic. This study’s aim was to identify and characterize the top 100 COVID-19-related scientific publications, which had received the highest Altmetric Attention Scores (AASs).

Hence, we searched Altmetric Explorer using search terms such as “COVID” or “COVID-19” or “Coronavirus” or “SARS-CoV-2” or “nCoV” and then selected the top 100 articles with the highest AASs. For each article identified, we extracted the following information: the overall AAS, publishing journal, journal impact factor (IF), date of publication, language, country of origin, document type, main topic, and accessibility.

The top 100 articles most frequently were published in journals with high (>10.0) IF (n = 67), were published between March and July 2020 (n = 67), were written in English (n = 100), originated in the United States (n = 45), were original articles (n = 59), dealt with treatment and clinical manifestations (n = 33), and had open access (n = 98).

Our study provides important information pertaining to the dissemination of scientific knowledge about COVID-19 in online media.

URL : The Most Widely Disseminated COVID-19-Related Scientific Publications in Online Media: A Bibliometric Analysis of the Top 100 Articles with the Highest Altmetric Attention Scores

DOI : https://doi.org/10.3390/healthcare9020239

Publication rate and citation counts for preprints released during the COVID-19 pandemic: the good, the bad and the ugly

Authors : Diego Añazco, Bryan Nicolalde, Isabel Espinosa, Jose Camacho , Mariam Mushtaq, Jimena Gimenez, Enrique Teran

Background

Preprints are preliminary reports that have not been peer-reviewed. In December 2019, a novel coronavirus appeared in China, and since then, scientific production, including preprints, has drastically increased. In this study, we intend to evaluate how often preprints about COVID-19 were published in scholarly journals and cited.

Methods

We searched the iSearch COVID-19 portfolio to identify all preprints related to COVID-19 posted on bioRxiv, medRxiv, and Research Square from January 1, 2020, to May 31, 2020. We used a custom-designed program to obtain metadata using the Crossref public API.

After that, we determined the publication rate and made comparisons based on citation counts using non-parametric methods. Also, we compared the publication rate, citation counts, and time interval from posting on a preprint server to publication in a scholarly journal among the three different preprint servers.

Results

Our sample included 5,061 preprints, out of which 288 were published in scholarly journals and 4,773 remained unpublished (publication rate of 5.7%). We found that articles published in scholarly journals had a significantly higher total citation count than unpublished preprints within our sample (p < 0.001), and that preprints that were eventually published had a higher citation count as preprints when compared to unpublished preprints (p < 0.001).

As well, we found that published preprints had a significantly higher citation count after publication in a scholarly journal compared to as a preprint (p < 0.001). Our results also show that medRxiv had the highest publication rate, while bioRxiv had the highest citation count and shortest time interval from posting on a preprint server to publication in a scholarly journal.

Conclusions

We found a remarkably low publication rate for preprints within our sample, despite accelerated time to publication by multiple scholarly journals. These findings could be partially attributed to the unprecedented surge in scientific production observed during the COVID-19 pandemic, which might saturate reviewing and editing processes in scholarly journals.

However, our findings show that preprints had a significantly lower scientific impact, which might suggest that some preprints have lower quality and will not be able to endure peer-reviewing processes to be published in a peer-reviewed journal.

URL : Publication rate and citation counts for preprints released during the COVID-19 pandemic: the good, the bad and the ugly

DOI : https://doi.org/10.7717/peerj.10927

Cite Unseen: Theory and Evidence on the Effect of Open Access on Cites to Academic Articles Across the Quality Spectrum

Authors : Mark J. McCabe, Christopher Snyder

Our previous paper (McCabe and Snyder 2014) contained the provocative result that, despite a positive average effect, open access reduces cites to some articles, in particular those published in lower-tier journals.

We propose a model in which open access leads more readers to acquire the full text, yielding more cites from some, but fewer cites from those who would have cited the article based on superficial knowledge but who refrain once they learn that the article is a bad match.

We test the theory with data for over 200,000 science articles binned by cites received during a pre-study period. Consistent with the theory, the marginal effect of open access is negative for the least-cited articles, positive for the most cited, and generally monotonic for quality levels in between.

Also consistent with the theory is a magnification of these effects for articles placed on PubMed Central, one of the broadest open-access platforms, and the differential pattern of results for cites from insiders versus outsiders to the article’s field.

URL : https://www.nber.org/papers/w28128