Evaluating Research Quality with Large Language Models: An Analysis of ChatGPT’s Effectiveness with Different Settings and Inputs

Author : Mike Thelwall

Evaluating the quality of academic journal articles is a time-consuming but critical task for national research evaluation exercises, appointments, and promotions. It is therefore important to investigate whether Large Language Models (LLMs) can play a role in this process.

This article assesses which ChatGPT inputs (full text without tables, figures and references; title and abstract; title only) produce better quality score estimates, and the extent to which scores are affected by ChatGPT models and system prompts.

The results show that the optimal input is the article title and abstract: average ChatGPT scores based on these (30 iterations on a dataset of 51 papers) correlate at 0.67 with human scores, the highest correlation yet reported. ChatGPT 4o is slightly better than 3.5-turbo (0.66) and 4o-mini (0.66).
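The evaluation procedure described above (averaging repeated ChatGPT scores per paper, then correlating the averages with human scores) can be sketched with the Python standard library; the data and helper names below are illustrative, not from the paper:

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx ** 0.5 * vy ** 0.5)

def average_scores(runs):
    """Average repeated LLM scores per paper: runs is a list of
    per-iteration score lists (e.g. 30 iterations x 51 papers)."""
    return [mean(col) for col in zip(*runs)]

# Illustrative toy data: 3 iterations over 4 papers, plus human scores.
runs = [[2, 3, 1, 4], [3, 3, 2, 4], [2, 4, 1, 3]]
human = [2, 3, 1, 4]
avg = average_scores(runs)
r = pearson_r(avg, human)
```

Averaging before correlating is what stabilises the estimate: single-iteration scores are noisy, and the mean over many iterations filters that noise out.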

The results suggest that article full texts might confuse LLM research quality evaluations, even though complex system instructions for the task are more effective than simple ones.

Thus, whilst abstracts contain insufficient information for a thorough assessment of rigour, they may contain strong pointers about originality and significance. Finally, linear regression can be used to convert the model scores to the human scale, which is 31% more accurate than guessing.
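The final calibration step, fitting a linear regression to map model scores onto the human scale, reduces to one-variable least squares; a minimal sketch with made-up numbers (the actual fit used the 51-paper dataset):

```python
from statistics import mean

def fit_linear(model_scores, human_scores):
    """Ordinary least squares for y = a * x + b with one predictor."""
    mx, my = mean(model_scores), mean(human_scores)
    num = sum((x - mx) * (y - my) for x, y in zip(model_scores, human_scores))
    den = sum((x - mx) ** 2 for x in model_scores)
    a = num / den
    b = my - a * mx
    return a, b

def calibrate(score, a, b):
    """Convert a raw averaged model score to the human quality scale."""
    return a * score + b

# Illustrative toy data: averaged LLM scores vs. human panel scores.
llm = [2.1, 2.8, 3.4, 3.9]
human = [1.0, 2.0, 3.0, 4.0]
a, b = fit_linear(llm, human)
```

The fitted line always passes through the means of both score sets, so systematic offset and scale differences between the LLM's scale and the human one are corrected in a single step.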

Arxiv : https://arxiv.org/abs/2408.06752

Global insights: ChatGPT’s influence on academic and research writing, creativity, and plagiarism policies

Authors : Muhammad Abid Malik, Amjad Islam Amjad, Sarfraz Aslam, Abdulnaser Fakhrou

Introduction: The current study explored the influence of Chat Generative Pre-Trained Transformer (ChatGPT) on the concepts, parameters, policies, and practices of creativity and plagiarism in academic and research writing.

Methods: Data were collected from 10 researchers from 10 different countries (Australia, China, the UK, Brazil, Pakistan, Bangladesh, Iran, Nigeria, Trinidad and Tobago, and Türkiye) using semi-structured interviews. NVivo was employed for data analysis.

Results: Based on the responses, five themes about the influence of ChatGPT on academic and research writing were generated, i.e., opportunity, human assistance, thought-provoking, time-saving, and negative attitude. Although the researchers were mostly positive about it, some feared it would degrade their writing skills and lead to plagiarism. Many of them believed that ChatGPT would redefine the concepts, parameters, and practices of creativity and plagiarism.

Discussion: Creativity may no longer be restricted to the ability to write, but also to use ChatGPT or other large language models (LLMs) to write creatively. Some suggested that machine-generated text might be accepted as the new norm; however, using it without proper acknowledgment would be considered plagiarism. The researchers recommended allowing ChatGPT for academic and research writing; however, they strongly advised it to be regulated with limited use and proper acknowledgment.

DOI : https://doi.org/10.3389/frma.2024.1486832

The use of ChatGPT for identifying disruptive papers in science: a first exploration

Authors : Lutz Bornmann, Lingfei Wu, Christoph Ettl

ChatGPT has arrived in quantitative research evaluation. With the exploration in this Letter to the Editor, we would like to widen the spectrum of the possible use of ChatGPT in bibliometrics by applying it to identify disruptive papers.

The identification of disruptive papers using publication and citation counts has become a popular topic in scientometrics. The disadvantage of the quantitative approach is its computational complexity. ChatGPT might be an easy-to-use alternative.
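The "computational complexity" here concerns disruption measures such as the CD index, which contrasts papers citing only the focal paper with papers that also cite its references; a simplified set-based sketch (ignoring the time windows and weighting variants used in practice):

```python
def disruption_index(citers_of_focal, citers_of_refs):
    """CD-style disruption index: papers citing only the focal paper (n_i)
    minus papers citing both it and its references (n_j), over all papers
    citing the focal paper or its references (n_i + n_j + n_k)."""
    n_i = len(citers_of_focal - citers_of_refs)   # cite focal only
    n_j = len(citers_of_focal & citers_of_refs)   # cite focal and refs
    n_k = len(citers_of_refs - citers_of_focal)   # cite refs only
    total = n_i + n_j + n_k
    return (n_i - n_j) / total if total else 0.0
```

A score near +1 means later work cites the paper without its intellectual ancestors (disruptive); a score near -1 means later work cites both (consolidating). The cost in real bibliometric data comes from assembling the two citer sets across millions of records, which is what makes a ChatGPT shortcut attractive.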

DOI : https://doi.org/10.1007/s11192-024-05176-z

Academic writing in the age of AI: Comparing the reliability of ChatGPT and Bard with Scopus and Web of Science

Authors : Swati Garg, Asad Ahmad, Dag Øivind Madsen

ChatGPT and Bard (now known as Gemini) are becoming indispensable resources for researchers, academicians and diverse stakeholders within the academic landscape. At the same time, traditional digital tools such as scholarly databases continue to be widely used. Web of Science and Scopus are the most extensive academic databases and are generally regarded as consistently reliable scholarly research resources. With the increasing acceptance of artificial intelligence (AI) in academic writing, this study focuses on understanding the reliability of the new AI models compared to Scopus and Web of Science.

The study includes a bibliometric analysis of green, sustainable and ecological buying behaviour, covering the period from 1 January 2011 to 21 May 2023. The outputs of the AI models are then compared with those of the traditional scholarly databases on several parameters. Overall, the findings suggest that AI models like ChatGPT and Bard are not yet reliable for academic writing tasks; it appears to be too early to depend on AI for them.

DOI : https://doi.org/10.1016/j.jik.2024.100563

FAIR GPT: A virtual consultant for research data management in ChatGPT

Authors : Renat Shigapov, Irene Schumm

FAIR GPT is the first virtual consultant in ChatGPT designed to help researchers and organizations make their data and metadata compliant with the FAIR (Findable, Accessible, Interoperable, Reusable) principles. It provides guidance on metadata improvement, dataset organization, and repository selection.

To ensure accuracy, FAIR GPT uses external APIs to assess dataset FAIRness, retrieve controlled vocabularies, and recommend repositories, minimizing hallucination and improving precision. It also assists in creating documentation (data and software management plans, README files, and codebooks), and selecting proper licenses. This paper describes its features, applications, and limitations.

Arxiv : https://arxiv.org/abs/2410.07108

Where there’s a will there’s a way: ChatGPT is used more for science in countries where it is prohibited

Authors : Honglin Bao, Mengyi Sun, Misha Teplitskiy

Regulating AI has emerged as a key societal challenge, but which methods of regulation are effective is unclear. Here, we measure the effectiveness of restricting AI services geographically using the case of ChatGPT and science. OpenAI prohibits access to ChatGPT from several countries including China and Russia.

If the restrictions are effective, there should be minimal use of ChatGPT in prohibited countries. We measured use by developing a classifier based on prior work showing that early versions of ChatGPT overrepresented distinctive words like “delve.”

We trained the classifier on abstracts before and after ChatGPT “polishing” and validated it on held-out abstracts and on those whose authors self-declared AI use, where it substantially outperformed the off-the-shelf LLM detectors GPTZero and ZeroGPT. Applying the classifier to preprints from arXiv, bioRxiv, and medRxiv reveals that ChatGPT was used in approximately 12.6% of preprints by August 2023, and that use was 7.7% higher in countries without legal access.

Crucially, these patterns appeared before the first major legal LLM became widely available in China, the largest restricted-country preprint producer. ChatGPT use was associated with higher views and downloads, but not citations or journal placement.

Overall, restricting ChatGPT geographically has proven ineffective in science and possibly other domains, likely due to widespread workarounds.
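The marker-word idea behind the classifier (early ChatGPT overrepresenting words like “delve”) can be illustrated with a crude frequency heuristic; the word list and threshold below are purely illustrative, and the paper's actual classifier was trained on before/after “polishing” pairs rather than a fixed list:

```python
import re

# Illustrative marker words only; the study derived its features from
# abstracts before and after ChatGPT "polishing", not a hand-picked list.
MARKERS = {"delve", "delves", "intricate", "showcasing", "underscore"}

def marker_rate(text):
    """Fraction of tokens that are suspected LLM-overrepresented words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in MARKERS)
    return hits / len(tokens)

def looks_polished(text, threshold=0.01):
    """Flag an abstract whose marker-word rate exceeds the threshold."""
    return marker_rate(text) > threshold

plain = "We measure accuracy on a held-out test set."
polished = "We delve into the intricate results, showcasing key trends."
```

A trained classifier generalises far better than a fixed word list, which is fragile once authors learn to avoid the telltale vocabulary; the sketch only conveys the underlying signal.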

URL : https://arxiv.org/abs/2406.11583

Sentiment Analysis of Citations in Scientific Articles Using ChatGPT: Identifying Potential Biases and Conflicts of Interest

Author : Walid Hariri

Scientific articles play a crucial role in advancing knowledge and informing research directions. One key aspect of evaluating scientific articles is the analysis of citations, which provides insights into the impact and reception of the cited works. This article introduces the innovative use of large language models, particularly ChatGPT, for comprehensive sentiment analysis of citations within scientific articles.

By leveraging advanced natural language processing (NLP) techniques, ChatGPT can discern the nuanced positivity or negativity of citations, offering insights into the reception and impact of cited works. Furthermore, ChatGPT’s capabilities extend to detecting potential biases and conflicts of interest in citations, enhancing the objectivity and reliability of scientific literature evaluation.

This study showcases the transformative potential of artificial intelligence (AI)-powered tools in enhancing citation analysis and promoting integrity in scholarly research.
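In practice, citation sentiment analysis with an LLM comes down to prompting the model once per citation context and parsing its reply into a label; a minimal scaffold (the prompt wording and label set are illustrative, and the model call itself is omitted):

```python
SENTIMENTS = ("positive", "neutral", "negative")

def build_citation_prompt(citing_sentence, cited_work):
    """Prompt asking an LLM to label the sentiment of one citation context."""
    return (
        "Classify the sentiment of the following citation context toward "
        f"the cited work '{cited_work}'. Answer with exactly one word: "
        "positive, neutral, or negative.\n\n"
        f"Context: {citing_sentence}"
    )

def parse_label(reply):
    """Map a free-form model reply onto one of the three labels."""
    reply = reply.strip().lower()
    for label in SENTIMENTS:
        if label in reply:
            return label
    return "neutral"  # conservative fallback for unparseable replies
```

Constraining the model to a closed label set and parsing defensively matters here: free-form replies vary, and downstream bias or conflict-of-interest analyses need clean categorical data.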

Arxiv : https://arxiv.org/abs/2404.01800