Can ChatGPT write better scientific titles? A comparative evaluation of human-written and AI-generated titles

Authors : Paul Sebo, Bing Nie, Ting Wang

Background

Large language models (LLMs) such as GPT-4 are increasingly used in scientific writing, yet little is known about how AI-generated scientific titles are perceived by researchers in terms of quality.

Objective

To compare AI-generated and human-written scientific titles on perceived alignment with the abstract content (a surrogate for perceived accuracy), on appeal, and on overall preference.

Methods

We conducted a blinded comparative study with 21 researchers from diverse academic backgrounds. A random sample of 50 original titles was selected from 10 high-impact general internal medicine journals. For each title, an alternative version was generated using GPT-4.0. Each rater evaluated 50 pairs of titles, each pair consisting of one original and one AI-generated version, without knowing the source of the titles or the purpose of the study.

For each pair, raters independently assessed both titles on perceived alignment with the abstract content and appeal, and indicated their overall preference. We analyzed alignment and appeal using Wilcoxon signed-rank tests and mixed-effects ordinal logistic regressions, preferences using McNemar’s test and mixed-effects logistic regression, and inter-rater agreement with Gwet’s AC.
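To make the paired design concrete, here is a minimal Python sketch of the two simplest tests named above, run on invented data; the scores, counts, and encoding are illustrative assumptions, not the study's actual pipeline, and the mixed-effects models and Gwet's AC would typically be fitted with dedicated packages (e.g., R's ordinal, lme4, and irrCAC).

```python
# Illustrative sketch only: invented ratings, not the study's data.
import numpy as np
from scipy.stats import wilcoxon
from statsmodels.stats.contingency_tables import mcnemar

rng = np.random.default_rng(42)
n_pairs = 1049  # number of paired judgments reported in the results

# Hypothetical 1-10 ratings, one pair of scores per judgment.
ai_scores = rng.integers(6, 11, size=n_pairs)
human_scores = rng.integers(4, 10, size=n_pairs)

# Wilcoxon signed-rank test on the paired ratings.
w_stat, w_p = wilcoxon(ai_scores, human_scores)
print(f"Wilcoxon: W = {w_stat:.0f}, p = {w_p:.3g}")

# McNemar's test uses the discordant cells of a 2x2 table of paired binary
# outcomes; with one forced choice per pair, the off-diagonal cells are
# simply the counts preferring each version.
prefer_ai = 648            # ~61.8% of 1,049 judgments, per the abstract
prefer_human = n_pairs - prefer_ai
table = [[0, prefer_ai], [prefer_human, 0]]
result = mcnemar(table, exact=False, correction=True)
print(f"McNemar: chi2 = {result.statistic:.1f}, p = {result.pvalue:.3g}")
```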

Results

AI-generated titles received significantly higher ratings than human-written titles for both perceived alignment with the abstract content (mean 7.9 vs. 6.7, p < 0.001) and appeal (mean 7.1 vs. 6.7, p < 0.001). The odds of preferring an AI-generated title were 1.7 times higher (p = 0.001), with 61.8% of 1,049 paired judgments favoring the AI version. Inter-rater agreement was moderate to substantial (Gwet’s AC: 0.54–0.70).

Conclusions

AI-generated titles were rated more favorably than human-written titles within the context of this study in terms of perceived alignment with the abstract content, appeal, and preference, suggesting that LLMs may enhance the effectiveness of scientific communication. These findings support the responsible integration of AI tools in research.

DOI : https://doi.org/10.12688/f1000research.173647.2

A framework for assessing the trustworthiness of scientific research findings

Authors : Brian A. Nosek, David B. Allison, Kathleen Hall Jamieson, Marcia McNutt, A. Beau Nielsen, Susan M. Wolf

Vigorous debate has erupted over the trustworthiness of scientific research findings in a number of domains. The question “what makes research findings trustworthy?” elicits different answers depending on whether the emphasis is on research integrity and ethics, research methods, transparency, inclusion, assessment and peer review, or scholarly communication. Each provides partial insight.

We offer a systems approach that focuses on whether the research is accountable, evaluable, and well-formulated; whether it has been evaluated; whether it controls for bias and reduces error; and whether its claims are warranted by the evidence. We tie each of these components to measurable indicators of trustworthiness for evaluating the research itself, the researchers conducting it, and the organizations supporting it.

Our goals are to offer a framework that can be applied across methods, approaches, and disciplines and to foster innovation in development of trustworthiness indicators. Developing valid indicators will improve the conduct and assessment of research and, ultimately, public understanding and trust.

DOI : https://doi.org/10.1073/pnas.2536736123

Artificial intelligence in academic practices and policy discourses across ‘Big 5’ publishers

Authors : Gergely Ferenc Lendvai, Petra Aczél

The present study investigates how the five largest academic publishers (Elsevier, Springer, Wiley, Taylor & Francis, and SAGE) are responding to the epistemic and procedural challenges posed by generative AI through formal policy frameworks.

Situated within ongoing debates about the boundaries of authorship and the governance of AI-generated content, our research aims to critically assess the discursive and regulatory contours of publishers’ authorship guidelines (PGs).

We employed a multi-method design that combines qualitative coding, semantic network analysis, and comparative matrix visualization to examine the official policy texts collected from each publisher’s website. Findings reveal a foundational consensus across all five publishers in prohibiting AI systems from being credited as authors and in mandating disclosure of AI usage.

However, beyond this shared baseline, marked divergences emerge in the scope, specificity, and normative framing of AI policies. Co-occurrence and semantic analyses underline the centrality of ‘authorship’, ‘ethics’, and ‘accountability’ in AI discourse. Structural similarity measures further reveal alignment among Wiley, Elsevier, and Taylor & Francis, with Springer as a clear outlier.
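As a rough illustration of the co-occurrence side of such an analysis, the sketch below builds a tiny term co-occurrence network with networkx; the policy snippets, term list, and substring matching are toy assumptions, not the authors' corpus or coding scheme.

```python
# Toy term co-occurrence network; the snippets and term list are invented.
from itertools import combinations
import networkx as nx

policy_texts = {
    "PublisherA": "AI tools cannot hold authorship; authors retain "
                  "accountability and must provide disclosure of AI use.",
    "PublisherB": "Authorship implies accountability; our ethics rules "
                  "require disclosure whenever AI tools contribute.",
}
terms = ["authorship", "ethics", "accountability", "disclosure", "ai"]

G = nx.Graph()
for publisher, text in policy_texts.items():
    # Naive substring matching; a real analysis would tokenize and lemmatize.
    present = [t for t in terms if t in text.lower()]
    for a, b in combinations(present, 2):
        weight = G.get_edge_data(a, b, default={"weight": 0})["weight"]
        G.add_edge(a, b, weight=weight + 1)

# Weighted degree is a crude stand-in for the centrality measures that put
# 'authorship', 'ethics', and 'accountability' at the core of the discourse.
print(sorted(G.degree(weight="weight"), key=lambda kv: -kv[1]))
```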

Our results point to an unsettled regulatory landscape where policies serve not only as instruments of governance but also as performative assertions of institutional identity and legitimacy.

Consequently, the fragmented field of PGs highlights the need for harmonized, inclusive, and enforceable frameworks that recognize both the potential and the risks of AI in scholarly communication.

DOI : https://doi.org/10.1093/reseval/rvag004

On the potential value conflict between scientific knowledge production and fair recognition of authorship

Authors : Gert Helgesson, William Bülow

The value of scientific knowledge and fairness in the distribution of academic credit are core values in research publication. However, the possibility that these values may come into conflict, particularly in interdisciplinary research, is little discussed in the literature. The point of this paper is to acknowledge and describe this conflict and to discuss potential solutions.

We use collaborations between pre-clinical (laboratory) researchers and clinicians at hospitals as an exemplifying case. We conclude that, without changing the preconditions for the value conflict, there is no general solution involving systematically prioritizing one value over the other.

However, a potential way out of the conflict would be a general shift from authorship to contributorship in the evaluation of contributions, although the required routines are not presently in place at most journals.

DOI : https://doi.org/10.1080/08989621.2026.2623480

The ‘Big Three’ of Scientific Information: A comparative bibliometric review of Web of Science, Scopus, and OpenAlex

Authors : Daniel Torres-Salinas, Wenceslao Arroyo-Machado

The present comparative study examines the three main multidisciplinary bibliographic databases, Web of Science Core Collection, Scopus, and OpenAlex, with the aim of providing up-to-date evidence on coverage, metadata quality, and functional features to help inform strategic decisions in research assessment.

The report is structured into two complementary methodological sections. First, it presents a systematic review of recent scholarly literature that investigates record volume, open-access coverage, linguistic diversity, reference coverage, and metadata quality; this is followed by an original bibliometric analysis of the 2015-2024 period that explores longitudinal distribution, document types, thematic profiles, linguistic differences, and overlap between databases. The text concludes with a ten-point executive summary and five recommendations.
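The overlap component of such an analysis reduces, at its core, to set operations on a shared record identifier; below is a minimal sketch with invented DOI sets (the report's actual matching procedure is certainly more involved).

```python
# Toy DOI-based overlap between databases; the DOI sets are placeholders.
wos = {"10.1/a", "10.1/b", "10.1/c"}
scopus = {"10.1/b", "10.1/c", "10.1/d"}
openalex = {"10.1/a", "10.1/b", "10.1/c", "10.1/d", "10.1/e"}

def jaccard(x: set, y: set) -> float:
    """Records indexed by both databases as a share of records in either."""
    return len(x & y) / len(x | y)

pairs = {"WoS-Scopus": (wos, scopus),
         "WoS-OpenAlex": (wos, openalex),
         "Scopus-OpenAlex": (scopus, openalex)}
for name, (a, b) in pairs.items():
    print(f"{name}: shared = {len(a & b)}, Jaccard = {jaccard(a, b):.2f}")
```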

DOI : https://doi.org/10.48550/arXiv.2601.21908

How multilingual is scholarly communication? Mapping the global distribution of languages in publications and citations

Authors : Carolina Pradier, Lucía Céspedes, Vincent Larivière

Language is a major source of systemic inequities in science, particularly among scholars whose first language is not English. Studies have examined scientists’ linguistic practices in specific contexts; few, however, have provided a global analysis of multilingualism in science.

Using two major bibliometric databases (OpenAlex and Dimensions), we provide a large-scale analysis of linguistic diversity in science, considering both the language of publications (N = 87,577,942) and of cited references (N = 1,480,570,087).
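Counts like these can be reproduced in outline from OpenAlex's public API, whose documented group_by parameter aggregates works by attributes such as language; here is a minimal sketch (field names follow the current API documentation and may change).

```python
# Count works by publication language via the OpenAlex public API.
import requests

resp = requests.get(
    "https://api.openalex.org/works",
    # 'mailto' opts into OpenAlex's polite pool; use your own address.
    params={"group_by": "language", "mailto": "you@example.org"},
    timeout=30,
)
resp.raise_for_status()
for group in resp.json()["group_by"][:10]:
    # Each group carries an ISO language code, a display name, and a count.
    print(group["key"], group["key_display_name"], group["count"])
```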

For the 1990–2023 period, we find that only Indonesian, Portuguese, and Spanish have expanded at a faster pace than English. Country-level analyses show that this trend is due to the growing strength of the Latin American and Indonesian academic circuits. Our results also confirm the same-language preference phenomenon (particularly for languages other than English), the strong connection between multilingualism and bibliodiversity, and that social sciences and humanities are the least English-dominated fields.

Our findings suggest that policies recognizing the value of both national-language and English-language publications have had a concrete impact on the distribution of languages in the global field of scholarly communication.

DOI : https://doi.org/10.1002/asi.70055

The scholarly communication attitudes and behaviours of Gen-Z researchers: a pathfinding study

Authors : David Nicholas, David Clark, Abdullah Abrizah, Jorge Revez, Blanca Rodríguez-Bravo, Marzena Swigon, John Akeroyd

In preparation for a major study of Generation-Z early career researchers' (ECRs) scholarly communication attitudes and practices, we report on how different the Gen-Z researchers included in our earlier studies of ECRs were from their older colleagues.

This qualitative pilot study covered a convenience sample of around 30 Gen-Z ECRs from 8 countries and all subject areas, compared with 120 of their older colleagues. Conversational, in-depth interviews lasting an hour or more were the main form of data collection.

Claude AI was used both to provide an initial analysis of the data and to assess the published literature on the topic. The findings suggest that there are enough differences between Gen-Z researchers and their Millennial colleagues, even though all are ECRs, to merit further research.

Younger researchers in particular appear to be strategically adopting AI for efficiency and career advancement, while older researchers show heightened awareness of, and caution regarding, the philosophical and ethical consequences of technological transformation in scholarly communication.

DOI : https://doi.org/10.33774/coe-2026-s8b36