Generative AI and the future of scientometrics: current topics and future questions

Authors : Benedetto Lepori, Jens Peter Andersen, Karsten Donnay

The aim of this paper is to review the use of GenAI in scientometrics, and to begin a debate on the broader implications for the field. First, we provide an introduction to GenAI’s generative and probabilistic nature as rooted in distributional linguistics.

We also relate this to the debate on the extent to which GenAI might be able to mimic human ‘reasoning’. Second, we leverage this distinction for a critical engagement with recent experiments using GenAI in scientometrics, including topic labelling, the analysis of citation contexts, predictive applications, scholars’ profiling, and research assessment.

GenAI shows promise in tasks where language generation dominates, such as labelling, but faces limitations in tasks that require stable semantics, pragmatic reasoning, or structured domain knowledge. However, these results might quickly become outdated. Our recommendation is, therefore, to systematically compare the performance of different GenAI models for specific tasks.

Third, we inquire whether, by generating large amounts of scientific language, GenAI might have a fundamental impact on our field by affecting textual characteristics used to measure science, such as authors, words, and references. We argue that careful empirical work and theoretical reflection will be essential to remain capable of interpreting the evolving patterns of knowledge production.

DOI : https://doi.org/10.48550/arXiv.2507.00783

Scholarly publishing’s hidden diversity: How exclusive databases sustain the oligopoly of academic publishers

Authors : Simon van Bellen, Juan Pablo Alperin, Vincent Larivière

Global scholarly publishing has been dominated by a small number of publishers for several decades. This paper revisits the data on corporate control of scholarly publishing by analyzing the relative shares of scholarly journals and articles published by the major publishers and the “long tail” of smaller, independent publishers, using Dimensions and Web of Science (WoS).

The reduction of expenses for printing and distribution and the availability of open-source journal management tools may have contributed to the emergence of small publishers, while recently developed inclusive databases may allow for the study of these. Dimensions’ inclusive indexing revealed the number of scholarly journals and articles published by smaller publishers has been growing rapidly, especially since the onset of large-scale online publishing around 2000, resulting in a higher share of articles from smaller publishers.

In parallel, WoS shows increasing concentration within a few corporate publishers. For the 1980–2021 period, we retrieved 32% more articles from Dimensions compared to the more selective WoS.

Dimensions’ data showed the expansion of small publishers was most pronounced in the Social Sciences and the Arts and Humanities, but a similar trend is observed in the Natural Sciences and Engineering, and the Health Sciences. A major geographical divergence is also revealed, with English-speaking countries and/or those located in northwestern Europe relying heavily on major publishers for the dissemination of their research, while the rest of the world is relatively independent of the oligopoly.

Finally, independent journals publish more often in open access in general, and in Diamond open access in particular. We conclude that enhanced indexing and visibility of recently created, independent journals may favour their growth and stimulate global scholarly bibliodiversity.

DOI : https://doi.org/10.1371/journal.pone.0327015

Peer Review as Structured Commentary: Immutable Identity, Public Dialogue, and Reproducible Scholarship

Author : Craig Steven Wright

This paper reconceptualises peer review as structured public commentary. Traditional academic validation is hindered by anonymity, latency, and gatekeeping. We propose a transparent, identity-linked, and reproducible system of scholarly evaluation anchored in open commentary.

Leveraging blockchain for immutable audit trails and AI for iterative synthesis, we design a framework that incentivises intellectual contribution, captures epistemic evolution, and enables traceable reputational dynamics.

This model empowers fields from computational science to the humanities, reframing academic knowledge as a living process rather than a static credential.

DOI : https://doi.org/10.48550/arXiv.2506.22497

Who funds what: An assessment of research funding networks in data papers

Authors : Yurdagül Ünal, Müge Akbulut

This study examines the role of funding collaborations in shaping the production and dissemination of scientific information through data papers, a rapidly growing academic publication format.

To the best of our knowledge, there are no studies investigating and evaluating the relationship between data papers and funders. The goal of this study was, therefore, to evaluate in detail data papers and the funder information extracted from them, in order to reveal the collaborative characteristics of funders and to provide guidance to researchers and funding agencies.

Data papers published between 2006 and 2017 were downloaded from the Web of Science database. The same papers were retrieved from Dimensions, which offered more detailed category classifications. These classifications were then used for further category-based analysis. The names of funders were standardized by matching them against the Crossref funder registry and associated funding metadata.

Statistical and social network analyses were performed. The top funding country was the USA; the top funding institution was the U.S. Department of Health and Human Services, National Institutes of Health. The collaboration network among funders exhibited relatively low density.

A collaboration network of 1197 links between 69 countries was created. The USA had connections with 62 countries. Our study is important because it standardizes the funding data for data papers by associating them with Crossref funding metadata.

The widespread increase of data papers and their relatively dispersed funding among a variety of funders point to the need for research evaluating collaborations between funders, which is important both for the funded researchers and for understanding and addressing the shortcomings of current funding management.

DOI : https://doi.org/10.1177/02666669251352185
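
The country-level figures reported in the abstract above (1197 links between 69 countries) can be put in perspective with a standard graph density calculation. The sketch below is not the authors' method: it assumes the country network is simple and undirected, with each link a distinct pair of countries, which the abstract does not state. The function name is illustrative. Note that the "relatively low density" mentioned in the abstract refers to the network among funders, not to this country-level network, so the two figures are not directly comparable.

```python
def undirected_density(n_nodes: int, n_edges: int) -> float:
    """Density of a simple undirected graph: observed edges / possible edges."""
    if n_nodes < 2:
        raise ValueError("density needs at least two nodes")
    # A simple undirected graph on n nodes has at most n * (n - 1) / 2 edges.
    possible_edges = n_nodes * (n_nodes - 1) // 2
    if not 0 <= n_edges <= possible_edges:
        raise ValueError("edge count is outside the range of a simple graph")
    return n_edges / possible_edges


# Figures taken from the abstract; the interpretation of "links" as
# distinct country pairs is an assumption.
N_COUNTRIES = 69
N_LINKS = 1197

density = undirected_density(N_COUNTRIES, N_LINKS)
print(f"Possible country pairs: {N_COUNTRIES * (N_COUNTRIES - 1) // 2}")
print(f"Density under the stated assumptions: {density:.3f}")
```

If the 1197 links instead count repeated or weighted co-funding relations, the number of distinct country pairs would be lower and so would the density.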

The role of preprints in open science: Accelerating knowledge transfer from science to technology

Authors : Zhiqi Wang, Yue Chen, Chun Yang

Preprints have become increasingly essential in the landscape of open science, facilitating not only the exchange of knowledge within the scientific community but also bridging the gap between science and technology.

However, the impact of preprints on technological innovation, given their unreviewed nature, remains unclear. This study fills this gap by conducting a comprehensive scientometric analysis of patent citations to bioRxiv preprints submitted between 2013 and 2021, measuring and assessing the contribution of preprints in accelerating knowledge transfer from science to technology.

Our findings reveal a growing trend of patent citations to bioRxiv preprints, with a notable surge in 2020, primarily driven by the COVID-19 pandemic. Preprints play a critical role in accelerating innovation: they not only expedite the dissemination of scientific knowledge into technological innovation but also enhance the visibility of early research results in the patenting process, while journals remain essential for academic rigor and reliability.

The substantial number of post-online-publication patent citations highlights the critical role of the open science model, particularly the “open access” effect of preprints, in amplifying the impact of science on technological innovation.

This study provides empirical evidence that open science policies encouraging the early sharing of research outputs, such as preprints, contribute to more efficient linkage between science and technology, suggesting an acceleration in the pace of innovation, higher innovation quality, and economic benefits.

DOI : https://doi.org/10.48550/arXiv.2506.20225

Open Licensing Models in the Cultural Heritage Sector

Authors : Bartolomeo Meletti, Kristofer Erickson, Aline Iramina, Victoria Stobo

This document reports on a study of open licensing practices among cultural heritage institutions (CHIs) carried out by researchers in the CREATe Centre at the University of Glasgow and the Centre for Archive Studies at the University of Liverpool.

The purpose of this study – funded by Creative Commons – is to advance understanding of how open licensing is being used in CHIs in practice and to enable information sharing about potential strategies. The authors do not endorse any singular approach – the findings reflect responses by a wide range of institutions in their own local contexts.

DOI : https://doi.org/10.5281/zenodo.15691432