InfoDoc MicroVeille – Monitoring service dedicated to Library and Information Science // Collecting and Sharing research papers in Library and Information science ISSN 2429-3938

Transparency, provenance and collections as data: the National Library of Scotland’s Data Foundry

Posted on 3 July 2021 by Hans Dillaerts

Author : Sarah Ames

‘Collections as data’ has become a core activity for libraries in recent years: it is important that we make collections available in machine-readable formats to enable and encourage computational research. However, while this is a necessary output, discussions around the processes and workflows required to turn collections into data, and to make collections data available openly, are just as valuable.

With libraries increasingly becoming producers of their own collections – presenting data from digitisation and digital production tools as part of datasets, for example – and making collections available at scale through mass-digitisation programmes, the trustworthiness of our processes comes into question.

In a world of big data, often of unclear origins, how can libraries be transparent about the ways in which collections are turned into data, how do we ensure that biases in our collections are recognised and not amplified, and how do we make these datasets available openly for reuse?

This paper presents a case study of work underway at the National Library of Scotland to present collections as data in an open and transparent way – from establishing a new Digital Scholarship Service, to workflows and online presentation of datasets.

It considers the changes to existing processes needed to produce the Data Foundry, the National Library of Scotland’s open data delivery platform, and explores the practical challenges of presenting collections as data online in an open, transparent and coherent manner.

Original location : https://www.liberquarterly.eu/article/10.18352/lq.10371/

Open peer review modes: from the journal to the platform [Modes d’évaluation ouverte par les pairs : de la revue à la plateforme]

Posted on 22 June 2021 by Hans Dillaerts

Authors : Evelyne Broudoux, Madjid Ihadjadene

This article aims to provide a state of the art of the different forms of peer review of articles or conference papers. From "blind" review to "open" review, many possibilities exist and are being tested.

It is in the sciences that the greatest number of sociotechnical innovations can be found, relying on publication platforms that model original editorial workflows.

Review can be opened up among peers, by making the reviewers' identities and/or reports public, at different stages of the scientific article: preprint, while it is being written, or after publication.

This state of the art is based on a set of publications produced mostly by actors in open peer review, who come mainly from the STM disciplines.

URL : https://revue-cossi.numerev.com/articles/revue-9/2496-modes-d-evaluation-ouverte-par-les-pairs-de-la-revue-a-la-plateforme

Prevalence of nonsensical algorithmically generated papers in the scientific literature

Posted on 20 June 2021 by Hans Dillaerts

Authors : Guillaume Cabanac, Cyril Labbé

In 2014 leading publishers withdrew more than 120 nonsensical publications automatically generated with the SCIgen program. Casual observations suggested that similar problematic papers are still published and sold, without follow-up retractions.

No systematic screening has been performed and the prevalence of such nonsensical publications in the scientific literature is unknown. Our contribution is 2-fold.

First, we designed a detector that combs the scientific literature for grammar-based computer-generated papers. Applied to SCIgen, it has an 83.6% precision. Second, we performed a scientometric study of the 243 detected SCIgen-papers from 19 publishers.

We estimate the prevalence of SCIgen-papers to be 75 per million papers in Information and Computing Sciences. Only 19% of the 243 problematic papers were dealt with: formal retraction (12) or silent removal (34).
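The counts quoted in this abstract can be cross-checked with simple arithmetic. The following is a minimal sketch that only reuses the figures stated in the text (243 detected papers, 12 formal retractions, 34 silent removals); the variable names are illustrative and not taken from the paper.

```python
# Figures quoted in the abstract.
detected = 243          # SCIgen papers found by the detector
retracted = 12          # formally retracted
silently_removed = 34   # removed without a retraction notice

# Papers that publishers dealt with in some way.
dealt_with = retracted + silently_removed
share_dealt_with = dealt_with / detected

# Papers still served by publishers without any caveat.
remaining = detected - dealt_with

print(f"dealt with: {dealt_with} ({share_dealt_with:.1%})")
print(f"remaining: {remaining}")
```

The share comes out just under 19%, and the remainder matches the 197 papers that the abstract says are still being served.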

Publishers still serve and sometimes sell the remaining 197 papers without any caveat. We found evidence of citation manipulation via edited SCIgen bibliographies. This work reveals metric gaming up to the point of absurdity: fraudsters publish nonsensical algorithmically generated papers featuring genuine references.

It stresses the need to screen papers for nonsense before peer-review and chase citation manipulation in published papers. Overall, this is yet another illustration of the harmful effects of the pressure to publish or perish.

DOI : https://doi.org/10.1002/asi.24495

Digital Object Identifier (DOI) Under the Context of Research Data Librarianship

Posted on 19 June 2021 by Hans Dillaerts

Author : Jia Liu

A digital object identifier (DOI) is an increasingly prominent persistent identifier in finding and accessing scholarly information. This paper intends to present an overview of global development and approaches in the field of DOI and DOI services with a slight geographical focus on Germany.

First, the initiation and components of the DOI system and the structure of a DOI name are explored. Next, the fundamental and specific characteristics of DOIs are described and DOIs for three kinds of typical intellectual entities in scholarly communication are dealt with; then, a general DOI service pyramid is sketched with brief descriptions of the functions of institutions at different levels.

After that, approaches of the research data librarianship community in the field of research data management (RDM), especially DOI services, are elaborated. As examples, the DOI services provided in German research libraries as well as best practices of DOI services in a German library are introduced; and finally, the current practices and some issues dealing with DOIs are summarized. It is foreseeable that DOI, which is crucial to FAIR research data, will gain extensive recognition in the scientific world.
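To illustrate the "structure of a DOI name" mentioned in the abstract: a DOI name consists of a prefix (starting with "10." and followed by a registrant code) and a suffix chosen by the registrant, separated by a slash. The sketch below is only an illustration of that structure; the helper `split_doi` is a hypothetical name and is not taken from the paper, and it accepts only a few common resolver forms.

```python
def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI name into (prefix, suffix) at the first slash.

    Hypothetical helper for illustration; strips a leading resolver
    URL or "doi:" label if one is present.
    """
    name = doi.strip()
    for resolver in ("https://doi.org/", "http://doi.org/", "doi:"):
        if name.lower().startswith(resolver):
            name = name[len(resolver):]
            break
    # The prefix ends at the first slash; the suffix may itself contain slashes.
    prefix, sep, suffix = name.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI name: {doi!r}")
    return prefix, suffix


# The DOI of the paper summarised in this post.
prefix, suffix = split_doi("https://doi.org/10.7191/jeslib.2021.1180")
print(prefix)
print(suffix)
```

Splitting at the first slash only is a deliberate choice here, since a suffix is free-form and may contain further slashes.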

DOI : https://doi.org/10.7191/jeslib.2021.1180

Open access book usage data – how close is COUNTER to the other kind?

Posted on 18 June 2021 by Hans Dillaerts

Author : Ronald Snijder

In April 2020, the OAPEN Library moved to a new platform, based on DSpace 6. During the same period, IRUS-UK started working on the deployment of Release 5 of the COUNTER Code of Practice (R5). This is, therefore, a good moment to compare two widely used usage metrics – R5 and Google Analytics (GA).

This article discusses the download data of close to 11,000 books and chapters from the OAPEN Library, from the period 15 April 2020 to 31 July 2020. When a book or chapter is downloaded, it is logged by GA and at the same time a signal is sent to IRUS-UK.

This results in two datasets: the monthly downloads measured in GA and the usage reported by R5, also clustered by month. The number of downloads reported by GA is considerably larger than that reported by R5. The total number of downloads in GA for the period is over 3.6 million.

In contrast, the amount reported by R5 is 1.5 million, around 400,000 downloads per month. Contrasting R5 and GA data on a country-by-country basis shows significant differences. GA lists more than five times the number of downloads for several countries, although the totals for other countries are about the same.
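The gap between the two metrics can be made concrete with a rough calculation. This is a minimal sketch using only the rounded totals and the period quoted in the abstract; because the totals are approximate ("over 3.6 million", "1.5 million"), the results are indicative only, and the 30-day month is an assumption made for this example.

```python
from datetime import date

# Rounded totals quoted in the abstract (approximate values).
ga_downloads = 3_600_000
r5_downloads = 1_500_000

# Measurement period stated in the abstract, counted inclusively.
start, end = date(2020, 4, 15), date(2020, 7, 31)
days = (end - start).days + 1

# How many GA downloads are logged per R5 download.
ratio = ga_downloads / r5_downloads

# R5 downloads per 30-day month (assumed month length).
r5_per_30_days = r5_downloads / days * 30

print(f"period: {days} days")
print(f"GA / R5 ratio: {ratio:.1f}")
print(f"R5 downloads per 30 days: {r5_per_30_days:,.0f}")
```

On these rounded figures GA reports roughly 2.4 times as many downloads as R5, and the R5 monthly rate is consistent with the "around 400,000 downloads per month" stated in the text.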

When looking at individual titles, of the 500 highest ranked titles in GA that are also part of the 1,000 highest ranked titles in R5, only 6% of the titles are relatively close together. The choice of metric service has considerable consequences on what is reported.

Thus, drawing conclusions about the results should be done with care. One metric is not better than the other, but we should be open about the choices made. After all, open access book metrics are complicated, and we can only benefit from clarity.

DOI : http://doi.org/10.1629/uksg.539

Affiliation Information in DataCite Dataset Metadata: a Flemish Case Study

Posted on 17 June 2021 by Hans Dillaerts

Author : Niek Van Wettere

This article aims to evaluate how and to what extent metadata of datasets indexed in DataCite offer clear human- or machine-readable information that enables the research data to be linked to a particular research institution.

Two main pathways are explored. First, researchers can encode their affiliation information at the moment of data submission. This can be done by means of free-text metadata fields or via the inclusion of identifiers such as GRID/ROR and ORCID. Second, affiliation information can be traced indirectly through linking between a dataset and associated publications, given that the metadata of publications is often more explicit about affiliation information than the metadata of datasets.

Both pathways of affiliation information encoding are evaluated on the basis of metadata pertaining to datasets created at the five Flemish universities. It is shown that good practices such as encoding of affiliation information in a dedicated metadata field or inclusion of ORCID in the metadata are on the rise, but could be expanded further.

Finally, the establishment of links between datasets and related publications is often lacking in dataset metadata, although there are important differences between data repositories, as is also demonstrated in a more data-intensive follow-up analysis based on random samples of metadata records.

It is important that data repositories address this issue by providing a metadata field clearly dedicated to associated publications, prominently displayed on the landing page of the dataset.

DOI : http://doi.org/10.5334/dsj-2021-013

Day-to-day discovery of preprint–publication links

Posted on 10 June 2021 by Hans Dillaerts

Authors : Guillaume Cabanac, Theodora Oikonomidi, Isabelle Boutron

Preprints promote the open and fast communication of non-peer-reviewed work. Once a preprint is published in a peer-reviewed venue, the preprint server updates its web page: a prominent hyperlink leading to the newly published work is added.

Linking preprints to publications is of utmost importance as it provides readers with the latest version of a now certified work. Yet leading preprint servers fail to identify all existing preprint–publication links.

This limitation calls for a more thorough approach to this critical information retrieval task: overlooking published evidence translates into partial and even inaccurate systematic reviews on health-related issues, for instance.

We designed an algorithm leveraging the Crossref public and free source of bibliographic metadata to comb the literature for preprint–publication links. We tested it on a reference preprint set identified and curated for a living systematic review on interventions for preventing and treating COVID-19 performed by an international collaboration: the COVID-NMA initiative (covid-nma.com).

The reference set comprised 343 preprints, 121 of which appeared as a publication in a peer-reviewed journal. While the preprint servers identified 39.7% of the preprint–publication links, our linker identified 90.9% of the expected links with no clues taken from the preprint servers.

The accuracy of the proposed linker is 91.5% on this reference set, with 90.9% sensitivity and 91.9% specificity. This is a 16.26% increase in accuracy compared to that of preprint servers. We release this software as supplementary material to foster its integration into preprint servers’ workflows and enhance a daily preprint–publication chase that is useful to all readers, including systematic reviewers.
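The three reported rates can be related to one another through a confusion matrix. The sketch below is an illustration under stated assumptions: the abstract gives only the set sizes (343 preprints, 121 of them published) and the rounded percentages, so the individual counts used here (110 true positives, 204 true negatives) are inferred from those percentages and are not reported in the source.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # published preprints that were correctly linked
    fn: int  # published preprints whose link was missed
    tn: int  # unpublished preprints correctly left unlinked
    fp: int  # unpublished preprints given a link anyway

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)


# Set sizes from the abstract: 343 preprints, 121 with a publication.
published, total = 121, 343
unpublished = total - published  # 222

# ASSUMPTION: counts inferred from the rounded rates, not given in the source.
linker = ConfusionMatrix(tp=110, fn=published - 110, tn=204, fp=unpublished - 204)

print(f"sensitivity: {linker.sensitivity:.1%}")
print(f"specificity: {linker.specificity:.1%}")
print(f"accuracy:    {linker.accuracy:.1%}")
```

With these inferred counts the three rates round to the 90.9%, 91.9% and 91.5% quoted in the abstract, which shows that the reported figures are mutually consistent.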

This preprint–publication linker currently provides day-to-day updates to the biomedical experts of the COVID-NMA initiative.

DOI : https://doi.org/10.1007/s11192-021-03900-7