An analysis of the effects of sharing research data, code, and preprints on citations

Authors : Giovanni Colavizza, Lauren Cadwallader, Marcel LaFlamme, Grégory Dozot, Stéphane Lecorney, Daniel Rappo, Iain Hrynaszkiewicz

Calls to make scientific research more open have gained traction with a range of societal stakeholders. Open Science practices include but are not limited to the early sharing of results via preprints and openly sharing outputs such as data and code to make research more reproducible and extensible. Existing evidence shows that adopting Open Science practices has effects in several domains.

In this study, we investigate whether adopting one or more Open Science practices leads to significantly higher citations for an associated publication, which is one form of academic impact. We use a novel dataset known as Open Science Indicators, produced by PLOS and DataSeer, which includes all PLOS publications from 2018 to 2023 as well as a comparison group sampled from the PMC Open Access Subset. In total, we analyze circa 122’000 publications. We calculate publication and author-level citation indicators and use a broad set of control variables to isolate the effect of Open Science Indicators on received citations.

We show that Open Science practices are adopted to different degrees across scientific disciplines. We find that the early release of a publication as a preprint correlates with a significant positive citation advantage of about 20.2% on average. We also find that sharing data in an online repository correlates with a smaller yet still positive citation advantage of 4.3% on average.

However, we do not find a significant citation advantage for sharing code. Further research is needed on additional or alternative measures of impact beyond citations. Our results are likely to be of interest to researchers, as well as publishers, research funders, and policymakers.

Arxiv :

Sentiment Analysis of Citations in Scientific Articles Using ChatGPT: Identifying Potential Biases and Conflicts of Interest

Author : Walid Hariri

Scientific articles play a crucial role in advancing knowledge and informing research directions. One key aspect of evaluating scientific articles is the analysis of citations, which provides insights into the impact and reception of the cited works. This article introduces the innovative use of large language models, particularly ChatGPT, for comprehensive sentiment analysis of citations within scientific articles.

By leveraging advanced natural language processing (NLP) techniques, ChatGPT can discern the nuanced positivity or negativity of citations, offering insights into the reception and impact of cited works. Furthermore, ChatGPT’s capabilities extend to detecting potential biases and conflicts of interest in citations, enhancing the objectivity and reliability of scientific literature evaluation.

This study showcases the transformative potential of artificial intelligence (AI)-powered tools in enhancing citation analysis and promoting integrity in scholarly research.

Arxiv :

Citation differences across research funding and access modalities

Authors : Pablo Dorta-González, María Isabel Dorta-González

This research provides insight into the complex relationship between open access, funding, and citation advantage. It presents an analysis of research articles and their citations in the Scopus database across 40 subject categories.

The sample includes 12 categories from Health Sciences, 7 from Life Sciences, 10 from Physical Sciences & Engineering, and 11 from Social Sciences & Humanities. Specifically, the analysis focuses on articles published in 2016 and the citations they received from 2016 to 2020.

Our findings show that open access articles published in hybrid journals receive considerably more citations than those published in gold open access journals. Articles under the hybrid gold modality are cited on average twice as much as those in the gold modality, regardless of funding.

Furthermore, we found that funded articles generally obtain 50 % more citations than unfunded ones within the same publication modality. Open access repositories significantly increase citations, particularly for articles without funding. Thus, articles in open access repositories receive 50 % more citations than paywalled ones.

URL : Citation differences across research funding and access modalities


Egocentric cocitation networks and scientific papers destinies

Authors : Béatrice Milard, Yoann Pitarch

To what extent is the destiny of a scientific paper shaped by the cocitation network in which it is involved? What are the social contexts that can explain these structuring? Using bibliometric data, interviews with researchers, and social network analysis, this article proposes a typology based on egocentric cocitation networks that displays a quadruple structuring (before and after publication): polarization, clusterization, atomization, and attrition.

It shows that the academic capital of the authors and the intellectual resources of their research are key factors of these destinies, as are the social relations between the authors concerned.

The circumstances of the publishing are also correlated with the structuring of the egocentric cocitation networks, showing how socially embedded they are. Finally, the article discusses the contribution of these original networks to the analyze of scientific production and its dynamics.

URL : Egocentric cocitation networks and scientific papers destinies


A quantitative and qualitative open citation analysis of retracted articles in the humanities

Authors : Ivan Heibi, Silvio Peroni

In this article, we show and discuss the results of a quantitative and qualitative analysis of open citations to retracted publications in the humanities domain. Our study was conducted by selecting retracted papers in the humanities domain and marking their main characteristics (e.g., retraction reason).

Then, we gathered the citing entities and annotated their basic metadata (e.g., title, venue, etc.) and the characteristics of their in-text citations (e.g., intent, sentiment, etc.). Using these data, we performed a quantitative and qualitative study of retractions in the humanities, presenting descriptive statistics and a topic modeling analysis of the citing entities’ abstracts and the in-text citation contexts.

As part of our main findings, we noticed that there was no drop in the overall number of citations after the year of retraction, with few entities which have either mentioned the retraction or expressed a negative sentiment toward the cited publication.

In addition, on several occasions, we noticed a higher concern/awareness when it was about citing a retracted publication, by the citing entities belonging to the health sciences domain, if compared to the humanities and the social science domains. Philosophy, arts, and history are the humanities areas that showed the higher concerns toward the retraction.

URL : A quantitative and qualitative open citation analysis of retracted articles in the humanities


Gender and country biases in Wikipedia citations to scholarly publications

Authors : Xiang Zheng, Jiajing Chen, Erjia Yan, Chaoqun Ni

Ensuring Wikipedia cites scholarly publications based on quality and relevancy without biases is critical to credible and fair knowledge dissemination. We investigate gender- and country-based biases in Wikipedia citation practices using linked data from the Web of Science and a Wikipedia citation dataset.

Using coarsened exact matching, we show that publications by women are cited less by Wikipedia than expected, and publications by women are less likely to be cited than those by men. Scholarly publications by authors affiliated with non-Anglosphere countries are also disadvantaged in getting cited by Wikipedia, compared with those by authors affiliated with Anglosphere countries.

The level of gender- or country-based inequalities varies by research field, and the gender-country intersectional bias is prominent in math-intensive STEM fields. To ensure the credibility and equality of knowledge presentation, Wikipedia should consider strategies and guidelines to cite scholarly publications independent of the gender and country of authors.

URL : Gender and country biases in Wikipedia citations to scholarly publications


Forecasting the publication and citation outcomes of COVID-19 preprints

Authors : Michael Gordon, Michael Bishop, Yiling Chen, Anna Dreber, Brandon Goldfedder, Felix Holzmeister, Magnus Johannesson, Yang Liu, Louisa Tran, Charles Twardy, Juntao Wang, Thomas Pfeiffer

Many publications on COVID-19 were released on preprint servers such as medRxiv and bioRxiv. It is unknown how reliable these preprints are, and which ones will eventually be published in scientific journals.

In this study, we use crowdsourced human forecasts to predict publication outcomes and future citation counts for a sample of 400 preprints with high Altmetric score. Most of these preprints were published within 1 year of upload on a preprint server (70%), with a considerable fraction (45%) appearing in a high-impact journal with a journal impact factor of at least 10.

On average, the preprints received 162 citations within the first year. We found that forecasters can predict if preprints will be published after 1 year and if the publishing journal has high impact. Forecasts are also informative with respect to Google Scholar citations within 1 year of upload on a preprint server.

For both types of assessment, we found statistically significant positive correlations between forecasts and observed outcomes. While the forecasts can help to provide a preliminary assessment of preprints at a faster pace than traditional peer-review, it remains to be investigated if such an assessment is suited to identify methodological problems in preprints.

URL : Forecasting the publication and citation outcomes of COVID-19 preprints