Prevalence of nonsensical algorithmically generated papers in the scientific literature

Authors : Guillaume Cabanac, Cyril Labbé

In 2014 leading publishers withdrew more than 120 nonsensical publications automatically generated with the SCIgen program. Casual observations suggested that similar problematic papers are still published and sold, without follow-up retractions.

No systematic screening has been performed and the prevalence of such nonsensical publications in the scientific literature is unknown. Our contribution is twofold.

First, we designed a detector that combs the scientific literature for grammar-based computer-generated papers. Applied to SCIgen, it has an 83.6% precision. Second, we performed a scientometric study of the 243 detected SCIgen-papers from 19 publishers.

We estimate the prevalence of SCIgen-papers to be 75 per million papers in Information and Computing Sciences. Only 19% of the 243 problematic papers were dealt with: formal retraction (12) or silent removal (34).
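The counts quoted above can be cross-checked with a few lines of arithmetic. This is only a sanity check on the reported figures, not part of the authors' method; the variable names are illustrative.

```python
# Figures quoted in the abstract above.
detected = 243          # SCIgen papers found by the detector
retracted = 12          # formally retracted
silently_removed = 34   # removed without any notice

handled = retracted + silently_removed   # papers that were dealt with
share_handled = handled / detected       # fraction of detected papers
remaining = detected - handled           # papers still served by publishers

print(f"handled: {handled} ({share_handled:.0%})")
print(f"still available: {remaining}")
```

The rounded share matches the 19% stated in the abstract, and the remainder matches the 197 papers that publishers still serve.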

Publishers still serve and sometimes sell the remaining 197 papers without any caveat. We found evidence of citation manipulation via edited SCIgen bibliographies. This work reveals metric gaming up to the point of absurdity: fraudsters publish nonsensical algorithmically generated papers featuring genuine references.

It stresses the need to screen papers for nonsense before peer-review and chase citation manipulation in published papers. Overall, this is yet another illustration of the harmful effects of the pressure to publish or perish.

URL : Prevalence of nonsensical algorithmically generated papers in the scientific literature


Digital Object Identifier (DOI) Under the Context of Research Data Librarianship

Author : Jia Liu

A digital object identifier (DOI) is an increasingly prominent persistent identifier in finding and accessing scholarly information. This paper intends to present an overview of global development and approaches in the field of DOI and DOI services with a slight geographical focus on Germany.

First, the initiation and components of the DOI system and the structure of a DOI name are explored. Next, the fundamental and specific characteristics of DOIs are described, and DOIs for three kinds of typical intellectual entities in scholarly communication are discussed. Then, a general DOI service pyramid is sketched, with brief descriptions of the functions of institutions at different levels.

After that, approaches of the research data librarianship community in the field of research data management (RDM), especially DOI services, are elaborated. As examples, the DOI services provided in German research libraries and best practices of DOI services in a German library are introduced. Finally, the current practices and some issues concerning DOIs are summarized. It is foreseeable that DOI, which is crucial to FAIR research data, will gain extensive recognition in the scientific world.

URL : Digital Object Identifier (DOI) Under the Context of Research Data Librarianship


Open access book usage data – how close is COUNTER to the other kind?

Author : Ronald Snijder

In April 2020, the OAPEN Library moved to a new platform, based on DSpace 6. During the same period, IRUS-UK started working on the deployment of Release 5 of the COUNTER Code of Practice (R5). This is, therefore, a good moment to compare two widely used usage metrics – R5 and Google Analytics (GA).

This article discusses the download data of close to 11,000 books and chapters from the OAPEN Library, from the period 15 April 2020 to 31 July 2020. When a book or chapter is downloaded, it is logged by GA and at the same time a signal is sent to IRUS-UK.

This results in two datasets: the monthly downloads measured in GA and the usage reported by R5, also clustered by month. The number of downloads reported by GA is considerably larger than that reported by R5. The total number of downloads in GA for the period is over 3.6 million.

In contrast, the number reported by R5 is 1.5 million, around 400,000 downloads per month. Contrasting R5 and GA data on a country-by-country basis shows significant differences. GA lists more than five times the number of downloads for several countries, although the totals for other countries are about the same.
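To make the gap between the two totals concrete, the sketch below derives the GA-to-R5 ratio and a rough monthly R5 figure from the numbers quoted above. It assumes the period is counted inclusively and uses a 30-day month as a simplification; neither choice comes from the article.

```python
from datetime import date

# Totals quoted in the abstract (GA is stated as "over 3.6 million").
ga_downloads = 3_600_000
r5_downloads = 1_500_000

# Reporting period: 15 April 2020 to 31 July 2020, counted inclusively (assumption).
start, end = date(2020, 4, 15), date(2020, 7, 31)
days = (end - start).days + 1

# How many GA downloads are logged per R5-reported download.
ratio = ga_downloads / r5_downloads

# Rough R5 monthly average, using a 30-day month (assumption).
r5_per_month = r5_downloads / days * 30

print(f"period length: {days} days")
print(f"GA / R5 ratio: {ratio:.1f}")
print(f"R5 downloads per 30-day month: {r5_per_month:,.0f}")
```

Under these assumptions the monthly R5 figure lands slightly above the "around 400,000" quoted in the abstract, which is consistent with a rounded estimate.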

When looking at individual titles, of the 500 highest ranked titles in GA that are also part of the 1,000 highest ranked titles in R5, only 6% of the titles are relatively close together. The choice of metric service has considerable consequences on what is reported.

Thus, drawing conclusions about the results should be done with care. One metric is not better than the other, but we should be open about the choices made. After all, open access book metrics are complicated, and we can only benefit from clarity.

URL : Open access book usage data – how close is COUNTER to the other kind?


Day-to-day discovery of preprint–publication links

Authors : Guillaume Cabanac, Theodora Oikonomidi, Isabelle Boutron

Preprints promote the open and fast communication of non-peer-reviewed work. Once a preprint is published in a peer-reviewed venue, the preprint server updates its web page: a prominent hyperlink leading to the newly published work is added.

Linking preprints to publications is of utmost importance as it provides readers with the latest version of a now certified work. Yet leading preprint servers fail to identify all existing preprint–publication links.

This limitation calls for a more thorough approach to this critical information retrieval task: overlooking published evidence translates into partial and even inaccurate systematic reviews on health-related issues, for instance.

We designed an algorithm leveraging the Crossref public and free source of bibliographic metadata to comb the literature for preprint–publication links. We tested it on a reference preprint set identified and curated for a living systematic review on interventions for preventing and treating COVID-19 performed by an international collaboration: the COVID-NMA initiative.

The reference set comprised 343 preprints, 121 of which appeared as a publication in a peer-reviewed journal. While the preprint servers identified 39.7% of the preprint–publication links, our linker identified 90.9% of the expected links with no clues taken from the preprint servers.

The accuracy of the proposed linker is 91.5% on this reference set, with 90.9% sensitivity and 91.9% specificity. This is a 16.26% increase in accuracy compared to that of preprint servers. We release this software as supplementary material to foster its integration into preprint servers’ workflows and enhance a daily preprint–publication chase that is useful to all readers, including systematic reviewers.
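The reported accuracy can be recovered from the sensitivity, the specificity, and the composition of the reference set. The sketch below is only a consistency check on the published figures, using the standard weighted-average identity for accuracy; the variable names are illustrative and not taken from the authors' software.

```python
# Reference set described in the abstract.
preprints = 343
published = 121                       # preprints with a journal publication
unpublished = preprints - published   # preprints without one

# Linker performance quoted in the abstract.
sensitivity = 0.909   # share of existing links that were found
specificity = 0.919   # share of unpublished preprints correctly left unlinked

# Accuracy is the average of the two rates, weighted by class size.
accuracy = (sensitivity * published + specificity * unpublished) / preprints

print(f"unpublished preprints: {unpublished}")
print(f"accuracy: {accuracy:.1%}")
```

Because the reference set contains more unpublished than published preprints, the accuracy sits closer to the specificity than to the sensitivity.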

This preprint–publication linker currently provides day-to-day updates to the biomedical experts of the COVID-NMA initiative.

URL : Day-to-day discovery of preprint–publication links


Crédibilité du chercheur, relation de confiance et éthique en recherche qualitative : l’implexité à la croisée des chemins

Author : Bakary Doucouré

This article, developed from observations and experience accumulated over some fifteen years across several empirical studies, analyzes the importance of researcher credibility and of the relationship of trust in a qualitative research process.

It stems from a self-reflexive and introspective approach that brings out the place of implexity and the essential role of ethics in the qualitative inquiry process, while providing a better understanding of the link between the two.

The article also belongs, more broadly, to the epistemological, methodological, and ethical reflections on qualitative research, which are concerns at once constant, evolving, and continually renewed. It is structured around two main axes.

On the one hand, it addresses a set of questions bearing on credibility, trust, ethics, and implexity, while indicating the analytical perspectives that emerge from them.

On the other hand, based on the analysis of the data, the article shows the relationship between implexity, credibility, and trust, as well as the dynamic of mutual reinforcement between ethics and implexity.


La recherche transdisciplinaire au sein des institutions d’enseignement supérieur et de recherche

Authors : Julie Hermesse, Audrey Vankeerberghen

This text presents the main questions and reflections that ran through the study day "La recherche transdisciplinaire au sein des universités", held on 14 September 2018 at the Université libre de Bruxelles.

Transdisciplinarity contributes to societal transformations by producing hybrid knowledge that is both scientific and socially relevant. Although this matches the research and service-to-society missions promoted by academic institutions, transdisciplinary research clearly does not settle into them without tension and does not yet enjoy recognized legitimacy.

Starting from this observation, the day was an opportunity to bring into dialogue the various actors concerned with transdisciplinarity, in order to reflect on the issues specific to implementing this mode of research in academic institutions: funding, the legitimacy of the knowledge produced, and the ways in which scientific careers are validated.

These reflections generated debate, pointed to obstacles, and also activated levers to enable the transition needed for transdisciplinarity to be deployed.

URL : La recherche transdisciplinaire au sein des institutions d’enseignement supérieur et de recherche