Curated Archiving of Research Software Artifacts: Lessons Learned from the French Open Archive (HAL)

Authors : Roberto Di Cosmo, Morane Gruenpeter, Bruno Marmol, Alain Monteil, Laurent Romary, Jozefina Sadowska

Software has become an indissociable support of technical and scientific knowledge. The preservation of this universal body of knowledge is as essential as preserving research articles and data sets.

In the quest to make scientific results reproducible, and pass knowledge to future generations, we must preserve these three main pillars: research articles that describe the results, the data sets used or produced, and the software that embodies the logic of the data transformation.

The collaboration between Software Heritage (SWH), the Center for Direct Scientific Communication (CCSD) and the scientific and technical information services (IES) of the French Institute for Research in Computer Science and Automation (Inria) has resulted in a specified moderation and curation workflow for research software artifacts deposited in HAL, the French global open access repository.

The curation workflow was developed to help digital librarians and archivists handle this new and peculiar artifact – software source code. While implementing the workflow, a set of guidelines has emerged from the challenges and the solutions put in place to help all actors involved in the process.


DOI : https://doi.org/10.2218/ijdc.v15i1.698

The Heritage Data Reuse Charter: from principles to research workflows

Authors : Erzsébet Tóth-Czifra, Laurent Romary

There is a growing need to establish domain- or discipline-specific approaches to research data sharing workflows. A defining feature of data and data workflows in the arts and humanities domain is their dependence on cultural heritage sources hosted and curated in museums, libraries, galleries and archives.

A major difficulty when scholars interact with heritage data is that the nature of the cooperation between researchers and Cultural Heritage Institutions (henceforth CHIs) is often constrained by structural and legal challenges but even more by uncertainties as to the expectations of both parties.

The Heritage Data Reuse Charter aims to address these by designing a common environment that will enable all the relevant actors to work together to connect and improve access to heritage data and make transactions related to the scholarly use of cultural heritage data more visible and transparent.

As a first step, a wide range of stakeholders in the cultural heritage and research sectors agreed upon a set of generic principles, summarized in the Mission Statement of the Charter, that can serve as a baseline governing the interactions between CHIs, researchers and data centres.

This was followed by a long and thorough validation process related to these principles through surveys and workshops. As a second step, we now put forward a questionnaire template tool that helps researchers and CHIs to translate the six core principles into specific research project settings.

It contains questions about access to data, provenance information, preferred citation standards, hosting responsibilities, etc., on the basis of which the parties can arrive at mutual reuse agreements that could serve as a starting point for FAIR-by-construction data management, right from the project planning/application phase.

The questionnaire template and the resulting mutual agreements can be flexibly applied to projects of different scale and in platform-independent ways. Institutions can embed them into their own exchange protocols while researchers can add them to their Data Management Plans.

As such, they can show evidence of responsible and fair conduct regarding cultural heritage data, and of fair (but also FAIR) research data management practices that are based on partnership with the holding institution.

URL : https://halshs.archives-ouvertes.fr/halshs-02475692

Leveraging Concepts in Open Access Publications

Authors : Andrea Bertino, Luca Foppiano, Laurent Romary, Pierre Mounier

Aim

This paper addresses the integration of a Named Entity Recognition and Disambiguation (NERD) service within a group of open access (OA) publishing digital platforms and considers its potential impact on both research and scholarly publishing.

This application, called entity-fishing, was initially developed by Inria in the context of the EU FP7 project CENDARI (Lopez et al., 2014) and provides automatic entity recognition and disambiguation against Wikipedia and Wikidata. Distributed under an open-source licence, it was deployed as a web service in the DARIAH infrastructure hosted at the French Huma-Num.

Methods

In this paper, we focus on the specific issues related to its integration on five OA platforms specialized in the publication of scholarly monographs in social sciences and humanities as part of the work carried out within the EU H2020 project HIRMEOS (High Integration of Research Monographs in the European Open Science infrastructure).

Results and Discussion

In the following sections, we give a brief overview of the current status and evolution of OA publications and how HIRMEOS aims to contribute to this.

We then give a comprehensive description of the entity-fishing service, focusing on its concrete applications in real use cases together with some further possible ideas on how to exploit the generated annotations.

Conclusions

We show that entity-fishing annotations can improve both the research and the publishing process. Entity-fishing annotations can be used to achieve a better and quicker understanding of the specific and disciplinary language of certain monographs and so encourage non-specialists to use them.

In addition, a systematic implementation of the entity-fishing service can be used by publishers to generate thematic indexes within book collections to allow better cross-linking and query functions.

URL : https://hal.inria.fr/hal-01900303/