Authors: Mohammad Hosseini, Brian D. Earp, Sebastian Porsdam Mann, Kristi Holmes
As generative artificial intelligence (GenAI) revolutionizes how research is conducted, it also challenges traditional methods of scholarly evaluation. Productivity metrics such as publication and citation counts are widely understood to be poor proxies for meaningful impact. These metrics are becoming even less reliable as GenAI accelerates text-based and computational work while leaving other forms of research labor (e.g. community engagement, in-person mentorship and team development) largely unaffected. This uneven effect risks exacerbating existing evaluative biases.
We argue that evaluation reforms should be organized around two categories of ‘distinctly human contributions’ that are indispensable to research but inadequately captured by such metrics: (1) the epistemic-ethical category, encompassing situated judgment under accountability (e.g. deciding what to trust, justifying that decision and standing behind it); and (2) the socio-relational category, encompassing sustained forms of valuable human engagement (e.g. mentoring, teaching, community partnership and trust-building).
We suggest practical mechanisms to support evaluation reform, including modified CRediT (Contributor Roles Taxonomy) statements, recognition of a broader array of outputs, and strengthened narrative CVs and third-person testimonies.
However, we acknowledge that these suggestions, particularly those relying on narrative self-presentation, are themselves vulnerable to GenAI manipulation and are insufficient on their own. If distinctly human contributions to research require judgment and relationships that resist automation, then evaluation cannot be reduced to instruments designed to minimize human evaluative effort.
GenAI, therefore, does not require entirely new systems of evaluation. Rather, it increases the cost of avoiding what good and ethically sound performance evaluation has always required.