(Semi)automated disambiguation of scholarly repositories

Authors  : Miriam Baglioni, Andrea Mannocci, Gina Pavone, Michele De Bonis, Paolo Manghi

The full exploitation of scholarly repositories is pivotal in modern Open Science, and scholarly repository registries are kingpins in enabling researchers and research infrastructures to list and search for suitable repositories. However, since multiple registries exist, repository managers are keen on registering multiple times the repositories they manage to maximise their traction and visibility across different research communities, disciplines, and applications.

These multiple registrations ultimately lead to information fragmentation and redundancy on the one hand and, on the other, force registries’ users to juggle multiple registries, profiles and identifiers describing the same repository. Such problems are known to registries, which claim equivalence between repository profiles whenever possible by cross-referencing their identifiers across different registries.

However, as we will see, this “claim set” is far from complete and, therefore, many replicas slip under the radar, possibly creating problems downstream.

In this work, we combine such claims to create duplicate sets and extend them with the results of an automated clustering algorithm run over repository metadata descriptions. Then we manually validate our results to produce an “as accurate as possible” de-duplicated dataset of scholarly repositories.

URL : https://arxiv.org/abs/2307.02647

“Knock knock! Who’s there?” A study on scholarly repositories’ availability

Authors : Andrea Mannocci, Miriam Baglioni, Paolo Manghi

Scholarly repositories are the cornerstone of modern open science, and their availability is vital for enacting its practices. To this end, scholarly registries such as FAIRsharing, re3data, OpenDOAR and ROAR give them presence and visibility across different research communities, disciplines, and applications by assigning an identifier and persisting their profiles with summary metadata.

Alas, like any other resource available on the Web, scholarly repositories, be they tailored for literature, software or data, are quite dynamic and can be frequently changed, moved, merged or discontinued.

Therefore, their references are prone to link rot over time, and their availability often boils down to whether the homepage URLs indicated in authoritative repository profiles within scholarly registries respond or not.

For this study, we harvested the content of four prominent scholarly registries and resolved over 13 thousand unique repository URLs. By performing a quantitative analysis on such an extensive collection of repositories, this paper aims to provide a global snapshot of their availability, which bewilderingly is far from granted.

URL : https://arxiv.org/abs/2207.12879

We Can Make a Better Use of ORCID: Five Observed Misapplications

Authors : Miriam Baglioni, Paolo Manghi, Andrea Mannocci, Alessia Bardi

Since 2012, the “Open Researcher and Contributor ID” organisation (ORCID) has been successfully running a worldwide registry, with the aim of “providing a unique, persistent identifier for individuals to use as they engage in research, scholarship, and innovation activities”.

Any service in the scholarly communication ecosystem (e.g., publishers, repositories, CRIS systems, etc.) can contribute to a non-ambiguous scholarly record by including, during metadata deposition, referrals to iDs in the ORCID registry.

The OpenAIRE Research Graph is a scholarly knowledge graph that aggregates both records from the ORCID registry and publication records with ORCID referrals from publishers and repositories worldwide to yield research impact monitoring and Open Science statistics.

Graph data analytics revealed “anomalies” due to ORCID registry “misapplications”, caused by wrong ORCID referrals and misexploitation of the ORCID registry. Albeit these affect just a minority of ORCID records, they inevitably affect the quality of the ORCID infrastructure and may fuel the rise of detractors and scepticism about the service.

In this paper, we classify and qualitatively document such misapplications, identifying five ORCID registrant-related and ORCID referral-related anomalies to raise awareness among ORCID users.

We describe the current countermeasures taken by ORCID and, where applicable, provide recommendations. Finally, we elaborate on the importance of a community-steered Open Science infrastructure and the benefits this approach has brought and may bring to ORCID.

URL : We Can Make a Better Use of ORCID: Five Observed Misapplications

DOI : http://doi.org/10.5334/dsj-2021-038