Scholar Metrics Scraper (SMS): automated retrieval of citation and author data

Authors : Yutong Cao, Nicole A. Cheung, Dean Giustini, Jeffrey LeDue, Timothy H. Murphy

Academic departments, research clusters and evaluators analyze author and citation data to measure research impact and to support strategic planning. We created Scholar Metrics Scraper (SMS) to automate the retrieval of bibliometric data for a group of researchers.

The project contains Jupyter notebooks that take a list of researchers as input and export a CSV file of citation metrics from Google Scholar (GS) to visualize the group’s impact and collaboration. A series of graph outputs is also available. SMS is an open solution for automating the retrieval and visualization of citation data.
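
As a rough illustration of the list-in, CSV-out workflow described above, the sketch below turns per-paper citation counts into one CSV row per researcher. It is not the SMS code: the function names, the column names, the choice of the h-index as a metric, and the sample counts are all assumptions made for illustration, and no data is retrieved from Google Scholar.

```python
import csv
import io
from typing import Dict, List


def h_index(citations: List[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def metrics_to_csv(per_paper_citations: Dict[str, List[int]]) -> str:
    """Build CSV text with one row of summary metrics per researcher.

    `per_paper_citations` maps a researcher name to that researcher's
    per-paper citation counts (hypothetical input, not scraped data).
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["researcher", "papers", "total_citations", "h_index"])
    for name, cites in sorted(per_paper_citations.items()):
        writer.writerow([name, len(cites), sum(cites), h_index(cites)])
    return buffer.getvalue()


# Made-up citation counts for two placeholder researchers.
sample = {
    "Researcher A": [10, 8, 5, 4, 3],
    "Researcher B": [1, 0],
}
print(metrics_to_csv(sample))
```

Writing to an in-memory buffer keeps the sketch free of file-system side effects; a real notebook would more likely write the same rows to a file on disk.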

Bibliometric analysis of Sci-Hub downloads by Egyptian researchers

Authors : Ismail Ragab Osman, Hendy Abdullah Hendy Ahmed

In this study we present an in-depth bibliometric analysis of Sci-Hub downloads by Egyptian researchers based on the 2017 download log file. The study reveals that Egyptian researchers heavily rely on Sci-Hub, generating a substantial 1,357,526 download requests in 2017, with 65% of these occurring outside regular working hours. Cairo emerges as a central hub for this activity, contributing 81.58% of total downloads.

Journal articles constitute the majority of downloads at 82.36%, followed by conference papers (12.89%). A discernible trend shows a preference for recent papers published between 2012 and 2017, highlighting the demand for up-to-date research. The analysis also highlights prominent publishers, including IEEE, Elsevier, Wiley, and Springer, as preferred sources for Egyptian researchers. “Journal of the American Chemical Society” and “Journal of Applied Physics” stand out among accessed journals, while IEEE-associated conferences, notably “IEEE Power and Energy Society General Meeting,” dominate conference paper downloads. Examining journal accessibility via the Egyptian Knowledge Bank (EKB) reveals that 62.84% of journals are accessible, with ScienceDirect as the leading provider (28.37%).

However, a significant gap emerges as 87.39% of downloaded conference papers remain inaccessible through EKB. Furthermore, a semantic analysis highlights recurring themes such as “systems,” “review,” “analysis,” “treatment,” “power,” and “energy,” reflecting the key research areas of Egyptian researchers. Overall, this study offers valuable insights into Sci-Hub’s role in supplementing Egyptian researchers’ resource access and underscores the need for comprehensive resource coverage and accessibility enhancements.

The Nexus of Open Science and Innovation: Insights from Patent Citations

Author : Abdelghani Maddi

This paper aims to analyze the extent to which inventive activity relies on open science. In other words, it investigates whether inventors utilize Open Access (OA) publications more than subscription-based ones, especially given that some inventors may lack institutional access.

To achieve this, we used the Marx (2023) database, which contains citations from patents to scientific publications (Non-Patent References, NPRs). We focused on publications closely related to invention, specifically those cited solely by inventors within the body of patent texts. Our dataset was supplemented by OpenAlex data.

The final sample comprised 961,104 publications cited in patents, of which 861,720 had a DOI. Results indicate that across all disciplines, OA publications are 38% more prevalent in patent citations (NPRs) than in the overall OpenAlex database.
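
One plausible reading of "38% more prevalent" is a ratio of shares: the OA share among patent-cited publications divided by the OA share in the baseline database, minus one. The sketch below shows only that arithmetic. The function name and both input shares are hypothetical, and the paper may compute its figure differently.

```python
def relative_excess(share_in_sample: float, share_in_baseline: float) -> float:
    """Relative over-representation of a category in a sample versus a baseline.

    A return value of 0.38 would mean "38% more prevalent than in the baseline".
    """
    if share_in_baseline <= 0:
        raise ValueError("baseline share must be positive")
    return share_in_sample / share_in_baseline - 1.0


# Hypothetical shares chosen only to illustrate the calculation;
# they are not taken from the paper or from OpenAlex.
excess = relative_excess(share_in_sample=0.55, share_in_baseline=0.40)
print(f"OA over-representation: {excess:.1%}")
```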

In biology and medicine, inventors use 73% and 27% more OA publications, respectively, compared to closed-access ones. Chemistry and computer science are also disciplines where OA publications are more frequently utilized in patent contexts than subscription-based ones.


Beyond journals and peer review: towards a more flexible ecosystem for scholarly communication

Author : Michael Wood

This article challenges the assumption that journals and peer review are essential for developing, evaluating and disseminating scientific and other academic knowledge. It suggests a more flexible ecosystem, and examines some of the possibilities this might facilitate. The market for academic outputs should be opened up by encouraging the separation of the dissemination service from the evaluation service.

Publishing research in subject-specific journals encourages compartmentalising research into rigid categories. The dissemination of knowledge would be better served by an open access, web-based repository system encompassing all disciplines. There would then be a role for organisations to assess the items in this repository to help users find relevant, high-quality work.

There could be a variety of such organisations which could enable reviews from peers to be supplemented with evaluation by non-peers from a variety of different perspectives: user reviews, statistical reviews, reviews from the perspective of different disciplines, and so on. This should reduce the inevitably conservative influence of relying on two or three peers, and make the evaluation system more critical, multi-dimensional and responsive to the requirements of different audience groups, changing circumstances, and new ideas.

Non-peer review might make it easier to challenge dominant paradigms, and expanding the potential audience beyond a narrow group of peers might encourage the criterion of simplicity to be taken more seriously – which is essential if human knowledge is to continue to progress.

Science’s greatest discoverers: a shift towards greater interdisciplinarity, top universities and older age

Author : Alexander Krauss

What are the unique features and characteristics of the scientists who have made the greatest discoveries in science? To address this question, we assess all major scientific discoverers, defined as all Nobel-prize and major non-Nobel-prize discoverers, and their demographic, institutional and economic traits.

What emerges is a general profile of the scientists who have driven over 750 of science’s greatest advances. We find that interdisciplinary scientists who completed two or more degrees in different academic fields by the time of discovery made about half (54%) of all Nobel-prize discoveries and 42% of major non-Nobel-prize discoveries over the same period; this enables greater interdisciplinary methodological training for making new scientific achievements.

Science is also becoming increasingly elitist, with scientists at the top 25 ranked universities accounting for 30% of Nobel-prize and of non-Nobel-prize discoveries alike. Scientists over the age of 50 made only 7% of all Nobel-prize discoveries and 15% of non-Nobel-prize discoveries, and those over the age of 60 made only 1% and 3%, respectively. The gap in years between making Nobel-prize discoveries and receiving the award is also increasing over time across scientific fields, illustrating that it is taking longer to recognise and select major breakthroughs.

Overall, we find that those who make major discoveries are increasingly interdisciplinary, older and at top universities. We also assess here the role and distribution of factors like geographic location, gender, religious affiliation and country conditions of these leading scientists, and how these factors vary across time and scientific fields.

The findings suggest that more discoveries could be made if science agencies and research institutions provide greater incentives for researchers to work against the common trend of narrow specialisation and instead foster interdisciplinary research that combines novel methods across fields.

Dataset Artefacts are the Hidden Drivers of the Declining Disruptiveness in Science

Authors : Vincent Holst, Andres Algaba, Floriano Tori, Sylvia Wenmackers, Vincent Ginis

Park et al. [1] reported a decline in the disruptiveness of scientific and technological knowledge over time. Their main finding is based on the computation of CD indices, a measure of disruption in citation networks [2], across almost 45 million papers and 3.9 million patents.

Due to a factual plotting mistake, database entries with zero references were omitted in the CD index distributions, hiding a large number of outliers with a maximum CD index of one, while keeping them in the analysis [1]. Our reanalysis shows that the reported decline in disruptiveness can be attributed to a relative decline of these database entries with zero references. Notably, this was not caught by the robustness checks included in the manuscript.
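
To see why entries with zero references sit at the maximum CD index of one, it helps to write the index out. The sketch below follows the usual definition of the CD index: each later paper that cites the focal paper or any of its references contributes -2·f·b + f, where f = 1 if it cites the focal paper and b = 1 if it cites at least one of the focal paper's references, and the contributions are averaged. This is a minimal illustration, not the code of Park et al. or of the reanalysis. The function name, argument layout and toy citation data are assumptions, and the time window used in practice (for example, five years after publication) is ignored.

```python
from typing import Dict, Set


def cd_index(focal: str, focal_refs: Set[str], later_papers: Dict[str, Set[str]]) -> float:
    """CD index of `focal`, given its references and the reference lists of later papers."""
    scores = []
    for refs in later_papers.values():
        f = 1 if focal in refs else 0        # later paper cites the focal paper
        b = 1 if refs & focal_refs else 0    # later paper cites a reference of the focal paper
        if f or b:                           # papers citing neither are not counted
            scores.append(-2 * f * b + f)
    if not scores:
        raise ValueError("CD index undefined: no later paper cites the focal paper or its references")
    return sum(scores) / len(scores)


# Toy citation data (hypothetical): reference lists of four later papers.
later = {
    "A": {"F"},         # cites only the focal paper        -> +1
    "B": {"F", "R1"},   # cites focal paper and a reference -> -1
    "C": {"R1"},        # cites only a reference            ->  0
    "D": {"X"},         # unrelated, ignored
}

print(cd_index("F", {"R1", "R2"}, later))  # focal paper with two recorded references
print(cd_index("F", set(), later))         # same paper recorded with zero references
```

With an empty reference set, b is always 0, so every later paper that cites the focal paper contributes +1 and the index is exactly 1 regardless of how the paper is actually used. Entries whose references are missing from the database are therefore pushed to the top of the distribution.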

The regression adjustment fails to control for the hidden outliers as they correspond to a discontinuity in the CD index. Proper evaluation of the Monte-Carlo simulations reveals that, because of the preservation of the hidden outliers, even random citation behaviour replicates the observed decline in disruptiveness.

Finally, while these papers and patents with supposedly zero references are the hidden drivers of the reported decline, their source documents predominantly do make references, exposing them as pure dataset artefacts.

From Data Creator to Data Reuser: Distance Matters

Authors : Christine L. Borgman, Paul T. Groth

Sharing research data is complex, labor-intensive, expensive, and requires infrastructure investments by multiple stakeholders. Open science policies focus on data release rather than on data reuse, yet reuse is also difficult, expensive, and may never occur. Investments in data management could be made more wisely by considering who might reuse data, how, why, for what purposes, and when.

Data creators cannot anticipate all possible reuses or reusers; our goal is to identify factors that may aid stakeholders in deciding how to invest in research data, how to identify potential reuses and reusers, and how to improve data exchange processes.

Drawing upon empirical studies of data sharing and reuse, we develop the theoretical construct of distance between data creator and data reuser, identifying six distance dimensions that influence the ability to transfer knowledge effectively: domain, methods, collaboration, curation, purposes, and time and temporality.

These dimensions are primarily social in character, with associated technical aspects that can decrease – or increase – distances between creators and reusers. We identify the order of expected influence on data reuse and ways in which the six dimensions are interdependent.

Our theoretical framing of the distance between data creators and prospective reusers leads to recommendations to four categories of stakeholders on how to make data sharing and reuse more effective: data creators, data reusers, data archivists, and funding agencies.

