(Semi)automated disambiguation of scholarly repositories

Authors : Miriam Baglioni, Andrea Mannocci, Gina Pavone, Michele De Bonis, Paolo Manghi

The full exploitation of scholarly repositories is pivotal in modern Open Science, and scholarly repository registries are kingpins in enabling researchers and research infrastructures to list and search for suitable repositories. However, since multiple registries exist, repository managers are keen to register the repositories they manage multiple times to maximise their traction and visibility across different research communities, disciplines, and applications.

These multiple registrations ultimately lead to information fragmentation and redundancy on the one hand and, on the other, force registries’ users to juggle multiple registries, profiles and identifiers describing the same repository. Such problems are known to registries, which claim equivalence between repository profiles whenever possible by cross-referencing their identifiers across different registries.

However, as we will see, this “claim set” is far from complete and, therefore, many replicas slip under the radar, possibly creating problems downstream.

In this work, we combine such claims to create duplicate sets and extend them with the results of an automated clustering algorithm run over repository metadata descriptions. Then, we manually validate our results to produce an “as accurate as possible” de-duplicated dataset of scholarly repositories.
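Combining claims into duplicate sets amounts to computing connected components over identifiers: every cross-registry claim, and every cluster returned by the automated step, links profiles that should end up in the same set. A minimal union-find sketch, assuming claims arrive as identifier pairs and clusters as lists of identifiers; the function name `duplicate_sets` and the identifier strings are hypothetical, and neither the authors' clustering algorithm nor their manual validation is reproduced here.

```python
from collections import defaultdict


def duplicate_sets(claims, clusters=()):
    """Merge equivalence claims and clustering output into duplicate sets.

    claims:   iterable of (id_a, id_b) pairs, e.g. a registry stating that
              its profile id_a describes the same repository as id_b
    clusters: iterable of identifier groups from an automated clustering step
    Returns a list of sets, one per connected component (hypothetical format).
    """
    parent = {}

    def find(x):
        # Register unseen identifiers as their own root.
        parent.setdefault(x, x)
        # Path halving: point each visited node at its grandparent.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Merging is transitive: A~B and B~C place A, B and C in one set.
    for a, b in claims:
        union(a, b)

    # Each cluster joins all of its members to its first member.
    for cluster in clusters:
        members = list(cluster)
        if not members:
            continue
        find(members[0])  # keeps singleton clusters as their own set
        for other in members[1:]:
            union(members[0], other)

    groups = defaultdict(set)
    for identifier in list(parent):
        groups[find(identifier)].add(identifier)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical identifiers: two claims chain three profiles together,
    # while one cluster adds a pair that no registry cross-referenced.
    claims = [("re3data:r1", "fairsharing:f1"), ("fairsharing:f1", "opendoar:o1")]
    clusters = [["opendoar:o2", "roar:x2"]]
    for group in duplicate_sets(claims, clusters):
        print(sorted(group))
    # ['fairsharing:f1', 'opendoar:o1', 're3data:r1']
    # ['opendoar:o2', 'roar:x2']
```

Union-find keeps the merge close to linear in the number of claims, so the same routine can absorb further evidence (for instance, corrections from a manual review) without recomputing the groups from scratch.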

URL : https://arxiv.org/abs/2307.02647

“Knock knock! Who’s there?” A study on scholarly repositories’ availability

Authors : Andrea Mannocci, Miriam Baglioni, Paolo Manghi

Scholarly repositories are the cornerstone of modern open science, and their availability is vital for enacting its practices. To this end, scholarly registries such as FAIRsharing, re3data, OpenDOAR and ROAR give them presence and visibility across different research communities, disciplines, and applications by assigning an identifier and persisting their profiles with summary metadata.

Alas, like any other resource available on the Web, scholarly repositories, be they tailored for literature, software or data, are quite dynamic and can be frequently changed, moved, merged or discontinued.

Therefore, their references are prone to link rot over time, and their availability often boils down to whether the homepage URLs indicated in authoritative repository profiles within scholarly registries respond or not.

For this study, we harvested the content of four prominent scholarly registries and resolved over 13 thousand unique repository URLs. By performing a quantitative analysis on such an extensive collection of repositories, this paper aims to provide a global snapshot of their availability, which, bewilderingly, is far from guaranteed.
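Checking whether a profile's homepage URL still responds can be sketched with Python's standard library alone. The outcome labels, the `HEAD`-only probe, the 10-second timeout and both function names are assumptions made for illustration; they are not the classification or tooling used in the study.

```python
import socket
import urllib.error
import urllib.request
from collections import Counter


def check_availability(url, timeout=10.0):
    """Resolve one homepage URL and return a (label, detail) pair."""
    try:
        # Request() raises ValueError for strings that are not usable URLs.
        request = urllib.request.Request(
            url, method="HEAD", headers={"User-Agent": "availability-sketch/0.1"}
        )
        # urlopen follows redirects and raises HTTPError for error statuses.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return ("available", response.status)
    except ValueError as err:
        return ("invalid_url", str(err))
    except urllib.error.HTTPError as err:
        # Must precede URLError, of which HTTPError is a subclass.
        return ("http_error", err.code)
    except urllib.error.URLError as err:
        # DNS failures, refused connections, TLS errors, connect timeouts, ...
        return ("unreachable", str(err.reason))
    except (socket.timeout, TimeoutError):
        return ("timeout", None)
    except OSError as err:
        # Other low-level network errors, e.g. connection resets.
        return ("unreachable", str(err))


def availability_summary(urls, check=check_availability):
    """Count outcome labels over the unique, non-empty URLs."""
    unique = {u.strip() for u in urls if u and u.strip()}
    return Counter(check(u)[0] for u in sorted(unique))
```

De-duplicating before resolving matters because the same homepage is often listed by several registries. Passing `check` as a parameter lets the summary be exercised with a stub instead of live requests. Some servers answer `HEAD` with 405, so a real survey would likely retry with `GET` before labelling a repository unavailable.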

URL : https://arxiv.org/abs/2207.12879

We Can Make a Better Use of ORCID: Five Observed Misapplications

Authors : Miriam Baglioni, Paolo Manghi, Andrea Mannocci, Alessia Bardi

Since 2012, the “Open Researcher and Contributor ID” organisation (ORCID) has been successfully running a worldwide registry, with the aim of “providing a unique, persistent identifier for individuals to use as they engage in research, scholarship, and innovation activities”.

Any service in the scholarly communication ecosystem (e.g., publishers, repositories, CRIS systems, etc.) can contribute to a non-ambiguous scholarly record by including, during metadata deposition, referrals to iDs in the ORCID registry.
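ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm, so a depositing service can reject mistyped referrals before they enter the record. The minimal validator below assumes the hyphenated 16-character form, optionally prefixed by `https://orcid.org/`; the function names are hypothetical. A checksum only catches malformed iDs: it cannot detect a well-formed iD attached to the wrong person.

```python
import re

# Four blocks of four characters; the final character is a digit or "X".
_ORCID_PATTERN = re.compile(r"(\d{4})-(\d{4})-(\d{4})-(\d{3}[\dX])")


def orcid_check_digit(base_digits):
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(value):
    """Return True if `value` is a well-formed ORCID iD with a correct checksum.

    Accepts the bare form or the https://orcid.org/ URI form.
    """
    value = value.strip()
    prefix = "https://orcid.org/"
    if value.startswith(prefix):
        value = value[len(prefix):]
    match = _ORCID_PATTERN.fullmatch(value)
    if not match:
        return False
    digits = "".join(match.groups())  # 16 characters, check character last
    return orcid_check_digit(digits[:-1]) == digits[-1]
```

For example, `0000-0002-1825-0097`, the sample iD used in ORCID's documentation, passes, while changing its last digit to 8 makes the check fail.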

The OpenAIRE Research Graph is a scholarly knowledge graph that aggregates both records from the ORCID registry and publication records with ORCID referrals from publishers and repositories worldwide to support research impact monitoring and Open Science statistics.

Graph data analytics revealed “anomalies” due to ORCID registry “misapplications”, caused by wrong ORCID referrals and misexploitation of the ORCID registry. Although these affect only a minority of ORCID records, they inevitably affect the quality of the ORCID infrastructure and may fuel the rise of detractors and scepticism about the service.

In this paper, we classify and qualitatively document such misapplications, identifying five ORCID registrant-related and ORCID referral-related anomalies to raise awareness among ORCID users.

We describe the current countermeasures taken by ORCID and, where applicable, provide recommendations. Finally, we elaborate on the importance of a community-steered Open Science infrastructure and the benefits this approach has brought and may bring to ORCID.


DOI : http://doi.org/10.5334/dsj-2021-038

A tale of two ‘opens’: intersections between Free and Open Source Software and Open Scholarship

Authors : Jonathan Tennant, Ritwik Agarwal, Ksenija Baždarić, David Brassard, Tom Crick, Daniel Dunleavy, Thomas Evans, Nicholas Gardner, Monica Gonzalez-Marquez, Daniel Graziotin, Bastian Greshake Tzovaras, Daniel Gunnarsson, Johanna Havemann, Mohammad Hosseini, Daniel Katz, Marcel Knöchelmann, Christopher Madan, Paolo Manghi, Alberto Marocchino, Paola Masuzzo, Peter Murray-Rust, Sanjay Narayanaswamy, Gustav Nilsonne, Josmel Pacheco-Mendoza, Bart Penders, Olivier Pourret, Michael Rera, John Samuel, Tobias Steiner, Jadranka Stojanovski, Alejandro Uribe-Tirado, Rutger Vos, Simon Worthington, Tal Yarkoni

There is no clear-cut boundary between Free and Open Source Software (FOSS) and Open Scholarship, and the histories, practices, and fundamental principles of the two remain complex.

In this study, we critically appraise the intersections and differences between the two movements. Based on our thematic comparison, we draw several key conclusions.

First, there is substantial scope for new communities of practice to form within scholarly communities that place sharing and collaboration/open participation at their focus.

Second, both the principles and practices of FOSS can be more deeply ingrained within scholarship, asserting a balance between pragmatism and social ideology.

Third, at present, Open Scholarship risks being subverted and compromised by commercial players.

Fourth, the shift and acceleration towards a system of Open Scholarship will be greatly enhanced by a concurrent shift in recognising a broader range of practices and outputs beyond traditional peer review and research articles.

In order to achieve this, we propose the formulation of a new type of institutional mandate. We believe that there is a substantial need for research funders to invest in sustainable open scholarly infrastructure, and the communities that support it, to avoid the capture and enclosure of key research services that would prevent optimal researcher behaviours.

Such a shift could ultimately lead to a healthier scientific culture, and a system where competition is replaced by collaboration, resources (including time and people) are shared and acknowledged more efficiently, and research becomes inherently more rigorous, verified, and reproducible.


DOI : https://doi.org/10.31235/osf.io/2kxq8