Analysis of scientific paper retractions due to data problems: Revealing challenges and countermeasures in data management

Authors : Wanfei Hu, Guiliang Yan, Jingyu Zhang, Zhenli Chen, Qing Qian, Sizhu Wu

Background

Scientific data, the cornerstone of scientific endeavors, face management challenges amid technological advances. Although retractions have been analyzed, a rigorous focus on the data problems that lead to them is missing.

Methods

This study collected 49,979 retraction records dated up to 17 December 2023. After screening, 16,842 records were related to data problems and 19,656 were due to other reasons. The analysis applied descriptive statistics, hypothesis testing, and BERTopic (Bidirectional Encoder Representations from Transformers Topic Modelling), the last for a topic analysis of article titles.
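
The abstract does not say which hypothesis tests were used. As a minimal sketch only, assuming a Pearson chi-square test of independence on a 2x2 table, the comparison of data-related and other retractions on one characteristic could look like the following. The function name chi_square_2x2 and the counts are hypothetical and are not taken from the study.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns the test statistic and its p-value (1 degree of freedom).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts for illustration, NOT data from the study.
# Rows: data-related retractions, other retractions.
# Columns: Q1 journal, non-Q1 journal.
illustrative_counts = [[30, 70], [10, 90]]

chi2, p_value = chi_square_2x2(illustrative_counts)
print(f"chi2 = {chi2:.2f}, p = {p_value:.5f}")
```

With these invented counts the statistic is 12.5 and the p-value falls below 0.001. A table with more than two categories, such as all four journal quartiles, would need more degrees of freedom and a general chi-square distribution function rather than the erfc shortcut used here.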

Results

The results show that since 2000, retractions due to data problems have increased significantly (p < 0.001), with the percentage in 2023 exceeding 75%. Among 16,842 data-related retractions, 59.0% were in Basic Life Sciences and 40.2% in Health Sciences. Data problems involve accuracy, reliability, validity, and integrity. There are significant differences (p < 0.001) in subjects, journal quartiles, retraction intervals, and other characteristics between data-related and other retractions. Data-related retractions are more concentrated in high-impact journals (Q1 37.6% and Q2 43.0%).

Conclusions

Institutions, publishers, and journals should adopt image-screening tools, enforce data deposition, standardize retraction notices, provide ethics training, and strengthen peer review to address these data problems, guiding better data management and healthier scientific development.

DOI : https://doi.org/10.1080/08989621.2025.2531987