Evaluating the linguistic coverage of OpenAlex: An assessment of metadata accuracy and completeness

Authors : Lucía Céspedes, Diego Kozlowski, Carolina Pradier, Maxime Holmberg Sainte-Marie, Natsumi Solange Shokida, Pierre Benz, Constance Poitras, Anton Boudreau Ninkov, Saeideh Ebrahimy, Philips Ayeni, Sarra Filali, Bing Li, Vincent Larivière

For decades, Clarivate’s Web of Science (WoS) and Elsevier’s Scopus have been the main sources of bibliometric information. Although highly curated, these closed, proprietary databases are largely biased toward English-language publications and therefore underestimate the use of other languages in research dissemination.

Launched in 2022, OpenAlex promised comprehensive, inclusive, and open-source research information. Although OpenAlex is already in use by scholars and research institutions, the quality of its metadata is still being assessed. This paper contributes to this literature by assessing the completeness and accuracy of OpenAlex’s language metadata through a comparison with WoS and an in-depth manual validation of a sample of 6,836 articles.

Results show that OpenAlex exhibits a far more balanced linguistic coverage than WoS. However, its language metadata are not always accurate, which leads OpenAlex to overestimate the place of English while underestimating that of other languages. If used critically, OpenAlex can provide comprehensive and representative analyses of the languages used for scholarly publishing, but more work is needed at the infrastructural level to ensure the quality of language metadata.

DOI : https://doi.org/10.1002/asi.24979

Open Science at the University of Toronto. Exploration of Researcher, Staff and Librarian Perspectives

Authors : Madelin Burt-D’Agnillo, Mindy Thuna

Objective: The impetus for this project is to begin to understand open science practices and obstacles at the University of Toronto. This project uses open-ended questions to understand the ways in which university-affiliated individuals learn about, think about, and interact with open science. The goal of this study is to showcase the complexity and diversity of activity and challenges in this domain to help determine how best to move open science forward.

Methods: From March to October 2022, 45 semi-structured interviews were conducted with faculty, graduate students, librarians and administrative staff. Interviews were conducted and recorded using Zoom, and the audio was transcribed using Otter.ai. As part of a commitment to open science practices, a data management plan was created and, with participant consent, 26 transcripts were uploaded to Dataverse. Data analysis used structured coding and thematic development to investigate responses.

Results: The core finding of this study is that there is no singular status of open science at the University of Toronto. The qualitative findings reflect a diversity of opinions, practices and relationships to open science.

Conclusion: For open science practices and scholarship to have longevity, systemic changes are needed to support the adoption of more open activities. The University of Toronto is well positioned to guide this transition and harness open principles to move into the future.

DOI : https://doi.org/10.21083/partnership.v19i2.7847

Sustaining the “frozen footprints” of scholarly communication through open citations

Author : Zehra Taşkın

This review examines the role of open citations in fostering transparency, reproducibility, and accessibility in scholarly communication. Through a critical synthesis of diverse sources—articles, proceedings, presentations, datasets, and blog posts—it explores the motivations behind citing, the evolving meanings of citations, and key milestones in the open citation movement. Particular attention is given to initiatives like OpenCitations and the Initiative for Open Citations (I4OC), highlighting their contributions to advancing open scholarship.

Key findings indicate that open citations democratize research by providing free access to citation data, improving discoverability, and facilitating the creation of public citation graphs. Technological advancements, such as advanced data models and reference mining tools, have significantly contributed to the management and utilization of citation data. Despite these benefits, several challenges persist: ensuring data quality and standardization, addressing structural inequalities in citation networks, and achieving universal publisher adoption.

The study concludes with recommendations for future efforts, emphasizing policy advocacy, technological innovation, global collaboration, and educational initiatives to promote the widespread adoption and effective use of open citations. These strategies aim to make the “frozen footprints” of scholarly communication accessible to all, fostering a more equitable and transparent scientific landscape.

DOI : https://doi.org/10.1002/asi.24982

‘As of my last knowledge update’: How is content generated by ChatGPT infiltrating scientific papers published in premier journals?

Author : Artur Strzelecki

The aim of this paper is to highlight the situation whereby content generated by the large language model ChatGPT is appearing in peer-reviewed papers in journals by recognized publishers. The paper demonstrates how to identify sections indicating that a text fragment was generated, that is, entirely created, by ChatGPT. The SPAR4SLR method, which is mainly applied in systematic literature reviews, was used to prepare an illustrative compilation of papers published in journals that are indexed in the Web of Science and Scopus databases and that have Impact Factor and CiteScore indicators.

Three main findings are presented: (1) articles bearing the hallmarks of content generated by AI large language models, whose use was not declared by the authors, appear in highly regarded premier journals; (2) many of these papers are already receiving citations from other scientific works, which are themselves published in journals indexed in scientific databases; and (3) most of the identified papers belong to medicine and computer science, but some also belong to disciplines such as environmental science, engineering, sociology, education, economics and management.

This paper aims to continue and add to the recently initiated discussion on the use of large language models like ChatGPT in the creation of scholarly works.

DOI : https://doi.org/10.1002/leap.1650

Patent research in academic literature. Landscape and trends with a focus on patent analytics

Authors : Cristian Mejia, Yuya Kajikawa

Patent analytics is crucial for understanding innovation dynamics and technological trends. However, a comprehensive overview of this rapidly evolving field is lacking. This study presents a data-driven analysis of patent research, employing citation network analysis to categorize and examine research clusters. Here, we show that patent research is characterized by interconnected themes spanning fundamental patent systems, indicator development, methodological advancements, intellectual property management practices, and diverse applications.

We reveal central research areas in patent strategies, technological impact, and patent citation research while identifying emerging focuses on environmental sustainability and corporate innovation. The integration of advanced analytical techniques, including AI and machine learning, is observed across various domains. This study provides insights for researchers and practitioners, highlighting opportunities for cross-disciplinary collaboration and future research directions.

DOI : https://doi.org/10.3389/frma.2024.1484685

A role for qualitative methods in researching Twitter data on a popular science article’s communication

Authors : Travis Noakes, Corrie Susanna Uys, Patricia Ann Harpur, Izak van Zyl

Big Data communication researchers have highlighted the need for qualitative analysis of online science conversations to better understand their meaning. However, a scholarly gap exists in exploring how qualitative methods can be applied to small data on microbloggers’ communications about science articles. While social media attention assists with article dissemination, qualitative research into the associated microblogging practices remains limited. To address these gaps, this study explores how qualitative analysis can enhance science communication studies of the microblogging of articles.

Calls for such qualitative approaches are supported by a practical example: an interdisciplinary team applied mixed methods to better understand the promotion of an unorthodox but popular science article on Twitter over a 2-year period. While Big Data studies typically identify patterns in microbloggers’ activities from large data sets, this study demonstrates the value of integrating qualitative analysis to deepen understanding of these interactions. In this study, a small data set was analyzed by a pragmatist using NVivo™ and by a statistician using MAXQDA™.

The pragmatist’s multimodal content analysis found that health professionals shared links to the article, with its popularity tied to its role as a communication event within a longstanding debate in the health sciences. Dissident professionals used this article to support an emergent paradigm. The analysis also uncovered practices, such as language localization, where a title was translated from English to Spanish to reach broader audiences.

A semantic network analysis confirmed that terms used by the article’s tweeters strongly aligned with its content, and the discussion was notably pro-social. Meta-inferences were then drawn by integrating the findings from the two methods. These flagged the significance of contextualizing the sharing of a health science article in relation to tweeters’ professional identities and their stances on health-related issues. In addition, meta-critiques highlighted challenges in preparing accurate tweet data and analyzing them using qualitative data analysis software. These findings highlight the valuable contributions that qualitative research can make to research involving microblogging data in science communication. Future research could critique this approach or further explore the microblogging of key articles within important scientific debates.

DOI : https://doi.org/10.3389/frma.2024.1431298