Does ChatGPT Ignore Article Retractions and Other Reliability Concerns?

Authors : Mike Thelwall, Marianna Lehtisaari, Irini Katsirea, Kim Holmberg, Er-Te Zheng

Large language models (LLMs) like ChatGPT seem to be increasingly used for information seeking and analysis, including to support academic literature reviews. To test whether the results might sometimes include retracted research, we identified 217 retracted or otherwise concerning academic studies with high altmetric scores and asked ChatGPT 4o-mini to evaluate their quality 30 times each.
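
A minimal Python sketch of this repeated-scoring design is below, assuming the OpenAI chat completions API with the gpt-4o-mini model; the prompt wording and score-scale framing are illustrative assumptions, not the study's exact instrument.

```python
# Hedged sketch of the repeated-query design: ask gpt-4o-mini to rate an
# article 30 times. The prompt text is an assumption, not the study's own.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def score_article(title: str, abstract: str, n_runs: int = 30) -> list[str]:
    """Return n_runs free-text quality reports for one article."""
    prompt = (
        "Evaluate the research quality of the following academic article, "
        "giving a score from 1* to 4* (4* = world leading, "
        "3* = internationally excellent) with a brief justification.\n\n"
        f"Title: {title}\n\nAbstract: {abstract}"
    )
    reports = []
    for _ in range(n_runs):
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        reports.append(response.choices[0].message.content)
    return reports
```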

Surprisingly, none of its 6510 reports mentioned that the articles had been retracted or contained relevant errors, and it gave relatively high scores (world leading, internationally excellent, or close) to 190 of the 217 articles. The 27 lowest-scoring articles were mostly criticised as weak, although in five cases the topic (but not the article itself) was described as controversial (e.g., hydroxychloroquine for COVID-19).

In a follow-up investigation, 61 claims were extracted from the retracted articles in the set, and ChatGPT 4o-mini was asked 10 times whether each was true. It gave a definitive yes or an otherwise positive response two-thirds of the time, including for at least one statement that had been shown to be false over a decade earlier.
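
The follow-up claim-checking step can be sketched in the same way; again, the prompt wording is an assumption rather than the study's exact phrasing.

```python
# Hedged sketch of the claim-verification step: ask gpt-4o-mini 10 times
# whether an extracted claim is true. The prompt is illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def check_claim(claim: str, n_runs: int = 10) -> list[str]:
    """Return n_runs answers on whether a claim is true."""
    prompt = (
        "Is the following claim true? Answer yes or no, then explain "
        f"briefly.\n\n{claim}"
    )
    answers = []
    for _ in range(n_runs):
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        answers.append(response.choices[0].message.content)
    return answers
```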

The results therefore emphasise, from an academic knowledge perspective, the importance of verifying information from LLMs when using them for information seeking or analysis.

DOI : https://doi.org/10.1002/leap.2018