Can ChatGPT write better scientific titles? A comparative evaluation of human-written and AI-generated titles

Authors : Paul Sebo, Bing Nie, Ting Wang

Background

Large language models (LLMs) such as GPT-4 are increasingly used in scientific writing, yet little is known about how AI-generated scientific titles are perceived by researchers in terms of quality.

Objective

To compare the perceived alignment with the abstract content (as a surrogate for perceived accuracy), appeal, and overall preference for AI-generated versus human-written scientific titles.

Methods

We conducted a blinded comparative study with 21 researchers from diverse academic backgrounds. A random sample of 50 original titles was selected from 10 high-impact general internal medicine journals. For each title, an alternative version was generated using GPT-4.0. Each rater evaluated 50 pairs of titles, each pair consisting of one original and one AI-generated version, without knowing the source of the titles or the purpose of the study.

For each pair, raters independently assessed both titles on perceived alignment with the abstract content and appeal, and indicated their overall preference. We analyzed alignment and appeal using Wilcoxon signed-rank tests and mixed-effects ordinal logistic regressions, preferences using McNemar’s test and mixed-effects logistic regression, and inter-rater agreement with Gwet’s AC.

Results

AI-generated titles received significantly higher ratings for both perceived alignment with the abstract content (mean 7.9 vs. 6.7, p-value < 0.001) and appeal (mean 7.1 vs. 6.7, p-value < 0.001) than human-written titles. The odds of preferring an AI-generated title were 1.7 times higher (p-value = 0.001), with 61.8% of 1,049 paired judgments favoring the AI version. Inter-rater agreement was moderate to substantial (Gwet’s AC: 0.54–0.70).
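
As a rough sanity check on the preference figures above, the short sketch below recomputes the counts implied by the reported percentage. It uses only numbers stated in this abstract; the variable names are illustrative, and the rounded counts are an approximation, not data taken from the paper. The crude odds it prints ignore the fact that each rater contributed many judgments. The abstract does not say how the 1.7 figure was obtained, but the Methods mention a mixed-effects logistic regression, which would account for that clustering, so the two values need not match.

```python
# Figures reported in the abstract
raters = 21
pairs_per_rater = 50
reported_judgments = 1049
share_ai_preferred = 0.618

# Maximum number of paired judgments if every rater judged every pair
max_judgments = raters * pairs_per_rater
missing_judgments = max_judgments - reported_judgments

# Approximate counts implied by the reported percentage (rounded, not source data)
ai_preferred = round(share_ai_preferred * reported_judgments)
human_preferred = reported_judgments - ai_preferred

# Crude (unadjusted) odds of preferring the AI title, treating all
# judgments as independent; this is an assumption made for illustration
crude_odds = ai_preferred / human_preferred

print(f"possible judgments: {max_judgments}, missing: {missing_judgments}")
print(f"AI preferred: ~{ai_preferred}, human preferred: ~{human_preferred}")
print(f"crude odds: {crude_odds:.2f}")
```

The check suggests that one of the 1,050 possible judgments is absent from the analysis and that roughly 648 judgments favored the AI-generated title.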

Conclusions

Within the context of this study, AI-generated titles were rated more favorably than human-written titles in terms of perceived alignment with the abstract content, appeal, and preference, suggesting that LLMs may enhance the effectiveness of scientific communication. These findings support the responsible integration of AI tools in research.

DOI : https://doi.org/10.12688/f1000research.173647.2