Generative AI can and should accelerate research evaluation reform to better recognize ‘distinctly human contributions’

Authors :  Mohammad Hosseini, Brian D Earp, Sebastian Porsdam Mann, Kristi Holmes

As generative artificial intelligence (GenAI) revolutionizes how research is conducted, it also challenges traditional methods of scholarly evaluation. Productivity metrics such as publication and citation counts are widely understood to be poor proxies for gauging meaningful impact. These metrics are becoming even less reliable as GenAI accelerates text-based and computational work while leaving other forms of research labor (e.g. community engagement, in-person mentorship and team development) largely unaffected. This uneven effect risks exacerbating existing evaluative biases.

We argue that evaluation reforms should be organized around two categories of ‘distinctly human contributions’ that are indispensable to research, but which are inadequately captured by metrics: (1) the epistemic-ethical category, encompassing situated judgment under accountability (e.g. deciding what to trust, justifying that decision, and standing behind it); and (2) the socio-relational category, encompassing sustained forms of valuable human engagement (e.g. mentoring, teaching, community partnership and trust-building).

We suggest practical mechanisms for supporting evaluation reform including modified CRediT (Contributor Role Taxonomy) statements, recognition of a broader array of outputs, and strengthened narrative CVs and third-person testimonies.

However, we acknowledge that these suggestions, particularly those relying on narrative self-presentation, are themselves vulnerable to GenAI manipulation and are insufficient on their own. If distinctly human contributions to research require judgment and relationships that resist automation, then evaluation cannot be reduced to instruments designed to minimize human evaluative effort.

GenAI, therefore, does not require entirely new systems of evaluation. Rather, it increases the cost of avoiding what good and ethically sound performance evaluation has always required.

URL : Generative AI can and should accelerate research evaluation reform to better recognize ‘distinctly human contributions’

DOI : https://doi.org/10.1093/reseval/rvag020

Diverse roles of Twitter in research evaluation: original tweets and retweets capture different types of engagements with scholarly articles

Authors :  Ashraf Maleki, Kim Holmberg

Altmetrics need to be more critically assessed in terms of the extent to which they reflect the impact and quality of research, as opposed to popularity or mere attention. Twitter (now rebranded as X) is a popular platform for, among other things, discussing and sharing scientific articles.

Earlier altmetric studies have often investigated whether the number of tweets mentioning scientific articles could be used as an indicator of scientific impact or attention, with results showing weak to moderate correlations with citation counts. However, not all tweets may be equal: original tweets and retweets may reflect different levels of engagement and impact. Using a dataset of over 330,000 PLOS publications, this study explores whether these two forms of Twitter activity correlate differently with traditional citation metrics and how these relationships vary across disciplines.

The findings showed that the correlation between citations and original tweets was consistently higher than that between citations and retweets; correlations were significant but weak to moderate, and were higher in the Social Sciences and Humanities than in the Natural Sciences, Engineering, and Medicine. Also, including zero citation counts improved the correlation coefficients for original tweets but reduced those for retweets.

This indicates that original tweets may be more aligned with citation counts as an indicator of scholarly impact, whereas retweets might reflect broader dissemination and popularity. In conclusion, original tweets and retweets are distinct altmetric indicators and should be analysed separately.
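To make the comparison concrete, here is a minimal sketch (not the authors' code) of how such correlations can be computed with and without zero-citation papers; the column names are assumed for illustration only:

```python
# Minimal sketch: Spearman correlations between citations and original tweets vs.
# retweets, computed on all papers and on papers with at least one citation.
# Column names ("citations", "original_tweets", "retweets") are assumed, not the study's.
import pandas as pd
from scipy.stats import spearmanr

def tweet_citation_correlations(df: pd.DataFrame) -> dict:
    """Return Spearman rho for original tweets and retweets vs. citations."""
    results = {}
    subsets = {"with_zeros": df, "cited_only": df[df["citations"] > 0]}
    for label, subset in subsets.items():
        rho_orig, _ = spearmanr(subset["original_tweets"], subset["citations"])
        rho_rt, _ = spearmanr(subset["retweets"], subset["citations"])
        results[label] = {"original_tweets": rho_orig, "retweets": rho_rt}
    return results
```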

URL : Diverse roles of twitter in research evaluation: original tweets and retweets capture different types of engagements with scholarly articles

DOI : https://doi.org/10.1093/reseval/rvag014

Evaluating Open Access Advantages for Citations and Altmetrics (2011-21): A Dynamic and Evolving Relationship

Author : Mike Taylor

Differences between the impacts of Open Access (OA) and non-OA research have been observed across a range of citation and altmetric indicators, with studies usually finding an Open Access Advantage (OAA). However, science-wide analyses covering multiple years, indicators and disciplines are lacking. Using citations and six altmetrics for 33.3M articles published 2011-21, we compare OA and non-OA papers.

The results show that there is no universal OAA across all disciplines or impact indicators: the OAA for citations tends to be lower for recent papers, whereas the OAAs for news, blogs and Twitter are consistent across years and unrelated to the volume of OA publications. Wikipedia OAAs are consistently pronounced for all subjects except the Humanities (HU) and Social Sciences. Patent OAAs are strongest for Medical & Health Sciences (MHS) and Life Sciences (LS).

Uniquely, the OAA for policy citations is stronger for recently published research. These results support different hypotheses for different subjects and indicators. The evidence is consistent with OA accelerating research impact in MHS, LS and HU; with increased visibility and discoverability promoting socio-economic impact; and with OA being a factor in growing online engagement with research. OAAs are therefore complex, dynamic and multi-factorial, and require considerable analysis to understand.
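As a rough illustration of what such a comparison involves, the sketch below (hypothetical column names, not the study's pipeline) expresses the OAA for a given indicator as the ratio of mean values for OA versus non-OA papers, by publication year:

```python
# Minimal sketch: one common way to express an Open Access Advantage (OAA) is the
# ratio of the mean indicator value for OA papers to that of non-OA papers per year.
# Column names ("year", "is_oa") and this formulation are assumptions for illustration.
import pandas as pd

def oa_advantage(df: pd.DataFrame, indicator: str) -> pd.Series:
    """Mean indicator value for OA papers divided by that of non-OA papers, by year."""
    means = df.groupby(["year", "is_oa"])[indicator].mean().unstack("is_oa")
    return means[True] / means[False]  # values > 1 indicate an OA advantage that year
```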

URL : Evaluating Open Access Advantages for Citations and Altmetrics (2011-21): A Dynamic and Evolving Relationship


The independence paradox in scientific careers

Authors : Yanmeng Xing, Ye Sun, Tongxin Pan, Giacomo Livan, Yifang Ma

Establishing an independent academic identity is a central yet insufficiently understood challenge for early-career researchers, whose early efforts toward autonomy are often constrained by limited resources and mentor-driven research agendas.

To provide large-scale quantitative evidence on how junior researchers develop independence, we introduce a framework that traces how mentees diverge from their mentors in both research topics and collaboration networks, and how these divergences relate to long-term scientific impact.

Analyzing over 500,000 mentee-mentor pairs in Chemistry, Neuroscience, and Physics across six decades, we find that high-impact scientists often initiate work in secondary areas of their mentors’ expertise while adaptively establishing distinct research trajectories. This pattern is most pronounced among mentees who eventually surpass their mentors’ impact.

We identify an inverted U-shaped relationship between topic divergence and mentees’ enduring impact, with moderate divergence yielding the highest scientific impact, revealing an independence paradox in scientific careers.

This pattern holds whether topic divergence is measured by citation network or semantic thematic distance. We further reveal that excessive direct mentor-mentee collaborations correlate with lower mentee impact, whereas expanding professional networks to include mentors’ collaborators is beneficial.
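For illustration only, a common way to test for such an inverted U-shape is to fit a quadratic of impact on divergence and check that the squared term is negative with a peak inside the observed range; the sketch below assumes simple numeric arrays rather than the authors' actual measures:

```python
# Minimal sketch: detect an inverted U-shape by fitting impact ≈ a*x^2 + b*x + c.
# A negative quadratic coefficient with a vertex inside the observed divergence range
# is consistent with the reported pattern. Inputs are illustrative numeric arrays.
import numpy as np

def inverted_u_check(divergence: np.ndarray, impact: np.ndarray) -> dict:
    a, b, c = np.polyfit(divergence, impact, deg=2)   # highest-degree coefficient first
    peak = -b / (2 * a) if a != 0 else np.nan         # divergence level of maximum impact
    return {
        "quadratic_coef": a,
        "peak_divergence": peak,
        "inverted_u": a < 0 and divergence.min() < peak < divergence.max(),
    }
```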

These findings not only offer actionable guidance for early-career researchers navigating independence but also inform institutional policies that promote mentorship structures supporting intellectual innovation and recognizing original contributions in promotion evaluations.

DOI : https://doi.org/10.48550/arXiv.2408.16992

Determining quality dimensions for peer review reports using a Delphi approach

Authors : Amanda Sizo, Adriano Lino, Álvaro Rocha, Luis Paulo Reis

The quality of peer review reports is essential to the integrity and effectiveness of scholarly communication. Yet review reports are often criticized for being vague, biased, or unconstructive, which limits their usefulness for both authors and editors. Existing frameworks for assessing review quality remain fragmented and are rarely validated through expert consensus.

This study aims to define and validate a comprehensive set of quality dimensions for peer review reports, encompassing comments addressed to both authors and editors. We employed a two-phase design combining a thematic analysis of the literature with a Delphi study involving 43 scientific editors, primarily from journals in Computer Science and Engineering.

Consensus was reached after two Delphi rounds, resulting in 62 validated statements organized into eight quality dimensions: Helpfulness, Specificity, Fairness, Thoroughness, Courteousness, Readability, Consistency, and Relevance. These findings provide an empirically grounded framework to inform the development of clearer standards for peer review practice.
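As a purely illustrative sketch of how Delphi consensus is commonly operationalised (the 70% threshold and the 4-5 agreement band below are assumed values, not taken from this study):

```python
# Minimal sketch: flag consensus when the share of panellists rating a statement 4 or 5
# on a 5-point scale exceeds a threshold. The threshold and agreement band are assumptions.
import pandas as pd

def delphi_consensus(ratings: pd.DataFrame, threshold: float = 0.70) -> pd.DataFrame:
    """ratings: rows = panellists, columns = statements, values = 1-5 Likert scores."""
    agreement = (ratings >= 4).mean(axis=0)           # share of panellists agreeing
    return pd.DataFrame({"agreement": agreement, "consensus": agreement >= threshold})
```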

URL : Determining quality dimensions for peer review reports using a Delphi approach

DOI : https://doi.org/10.1007/s11192-026-05603-3

AI And the Editors’ Ghost: Who Is the Writer Now?

Authors : David Clark, David Nicholas, Abdullah Abrizah, John Akeroyd, Jorge Revez, Blanca Rodríguez-Bravo, Marzena Swigon, Tatyana Polezhaeva, Anne Gere, Eti Herman

This is an exploration of the use of AI in research and writing. It builds upon the ‘Harbingers’ project, an international and longitudinal study of early career researchers (ECRs) and scholarly communication.

In the fourth phase of the project, we returned to the theme of AI, in particular AI as ‘ghostwriter’. Our sources are transcripts of conversational, open-form interviews with over 60 ECRs from Britain, Malaysia, Poland, Portugal, Spain, Russia, and other countries.

For an initial analysis of the transcripts, we used Google NotebookLM. An overarching, thematic summary of the data was produced in minutes, a task that would otherwise have occupied our research team for weeks. The unprompted text, immediately plausible and coherent, was regarded by all national interviewers as impressive.

Here, using a relatively small convenience sample, we compare the AI-generated summaries against both our original data and those first impressions. We reflect upon our own experience of using AI and that of our interviewees.

This paper is about how we used AI as an experiment, our reaction to it, and how that chimes with, resonates with, and echoes the experiences of the ECRs. It is a calibration for our future data analysis.

URL : Learned Publishing – 2026 – Clark – AI And the Editors Ghost Who Is the Writer Now

DOI : https://doi.org/10.1002/leap.2051