Linked Open Data and Historical Research: A Paradigm Shift

Author : Francesco Beretta

In the context of the digital transition, the Semantic Web and linked open data (LOD) play an increasingly central role, because they make it possible to build “knowledge graphs” linking all the resources of the Web.

This phenomenon challenges the historical sciences and raises the question of a paradigm shift. After clarifying what is meant by “data”, the article analyses the place data occupy in the process of knowledge production.

It presents the main components of this paradigm shift, in particular the potential of LOD and of robust semantics as vehicles for high-quality, intelligible and reusable factual information. This is followed by a presentation of the infrastructure projects carried out at the Laboratoire de recherche historique Rhône-Alpes (Larhra): symogih.org, ontome.net, geovistory.org.

Their aim is to facilitate the digital transition through tools built in coherence with the epistemology of the historical sciences, and to contribute to building a disciplinary “knowledge graph”.

URL : Linked Open Data and Historical Research: A Paradigm Shift

DOI : https://doi.org/10.4000/revuehn.3349

Semantic micro-contributions with decentralized nanopublication services

Authors : Tobias Kuhn, Ruben Taelman, Vincent Emonet, Haris Antonatos, Stian Soiland-Reyes, Michel Dumontier

While the publication of Linked Data has become increasingly common, the process tends to be relatively complicated and heavyweight. Linked Data is typically published by centralized entities in the form of larger dataset releases, which has the downside that the organization or individual responsible for the releases becomes a central bottleneck.

Moreover, certain kinds of data entries, in particular those with subjective or original content, currently do not fit into any existing dataset and are therefore more difficult to publish.

To address these problems, we present an approach that uses nanopublications and a decentralized network of services to let users directly publish small Linked Data statements through a simple, user-friendly interface called Nanobench, which is powered by semantic templates that are themselves published as nanopublications.
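To make the notion of a nanopublication more concrete, here is a minimal sketch in plain Python that builds one as a list of quads. It assumes the usual nanopublication layout, in which a head graph links an assertion graph, a provenance graph and a publication-info graph through the `np:` vocabulary (`http://www.nanopub.org/nschema#`). The `example.org` IRIs and the functions `make_nanopub` and `graphs` are hypothetical illustrations, not part of Nanobench or any nanopublication library.

```python
NP = "http://www.nanopub.org/nschema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PROV_ATTRIBUTED = "http://www.w3.org/ns/prov#wasAttributedTo"
DCT_CREATOR = "http://purl.org/dc/terms/creator"


def make_nanopub(base, subject, predicate, obj, author):
    """Build a minimal nanopublication as (subject, predicate, object, graph) quads.

    `base` is a hypothetical namespace under which the nanopublication and
    its four named graphs are minted.
    """
    np_uri = base + "np"
    head, assertion = base + "Head", base + "assertion"
    provenance, pubinfo = base + "provenance", base + "pubinfo"
    return [
        # Head graph: declares the nanopublication and links its three parts.
        (np_uri, RDF_TYPE, NP + "Nanopublication", head),
        (np_uri, NP + "hasAssertion", assertion, head),
        (np_uri, NP + "hasProvenance", provenance, head),
        (np_uri, NP + "hasPublicationInfo", pubinfo, head),
        # Assertion graph: the single small statement being published.
        (subject, predicate, obj, assertion),
        # Provenance graph: where the assertion comes from.
        (assertion, PROV_ATTRIBUTED, author, provenance),
        # Publication-info graph: metadata about the nanopublication itself.
        (np_uri, DCT_CREATOR, author, pubinfo),
    ]


def graphs(quads):
    """Group quads by their graph IRI, returning {graph: [(s, p, o), ...]}."""
    grouped = {}
    for s, p, o, g in quads:
        grouped.setdefault(g, []).append((s, p, o))
    return grouped


# Hypothetical example statement and author.
example = make_nanopub(
    "http://example.org/np1#",
    "http://example.org/gene/ABC1",
    "http://example.org/relatedTo",
    "http://example.org/disease/XYZ",
    "http://example.org/people/alice",
)
```

Keeping the assertion in its own graph is what lets a tiny statement travel with its own provenance and publication metadata instead of depending on a larger dataset release.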

The published nanopublications are cryptographically verifiable and can be queried through a redundant and decentralized network of services, based on the grlc API generator and a new quad extension of Triple Pattern Fragments.
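One way to read “cryptographically verifiable” is that a content hash computed over the published quads lets anyone check that the content has not been altered. The sketch below illustrates that idea only: it is a simplified, order-independent SHA-256 hash, not the Trusty URI algorithm that nanopublications actually rely on (whose normalization differs), and it does not model digital signatures. The function names and example quads are hypothetical.

```python
import base64
import hashlib


def content_hash(quads):
    """Hash a collection of quads independently of their order.

    Simplified illustration only; NOT the Trusty URI normalization.
    """
    # Tab-separate the terms and sort the lines so that quad order does not matter.
    lines = sorted("\t".join(quad) for quad in quads)
    digest = hashlib.sha256("\n".join(lines).encode("utf-8")).digest()
    # URL-safe Base64 without padding, so the hash could be embedded in an IRI.
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def verify(quads, expected_hash):
    """Return True if the quads still match the hash they were published with."""
    return content_hash(quads) == expected_hash


# Hypothetical published content and the hash recorded at publication time.
published = [
    ("http://example.org/a", "http://example.org/p", "http://example.org/b",
     "http://example.org/assertion"),
    ("http://example.org/assertion", "http://www.w3.org/ns/prov#wasAttributedTo",
     "http://example.org/alice", "http://example.org/provenance"),
]
published_hash = content_hash(published)
```

Because the lines are sorted before hashing, a server may return the quads in any order and they still verify, while changing any single term produces a different hash.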

We show that these two kinds of services are complementary and together allow us to query nanopublications reliably and efficiently. We also show that Nanobench makes it very easy for users to publish Linked Data statements, even for those with no prior experience in Linked Data publishing.
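A Triple Pattern Fragments server answers a single pattern per request, with wildcards, paging and a count of matches, and leaves joins to the client; a quad extension adds a fourth, graph position to the pattern. The in-memory sketch below mimics that request shape. The function name, parameters and response dictionary are hypothetical and do not reproduce the actual HTTP interface; here the count is exact, whereas real fragments may only report an estimate.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Quad = Tuple[str, str, str, str]


def quad_pattern_fragment(
    quads: Sequence[Quad],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
    g: Optional[str] = None,
    page: int = 1,
    page_size: int = 100,
) -> Dict[str, object]:
    """Return one page of quads matching a pattern; None acts as a wildcard."""
    pattern = (s, p, o, g)
    matches: List[Quad] = [
        quad for quad in quads
        if all(want is None or want == have for want, have in zip(pattern, quad))
    ]
    start = (page - 1) * page_size
    return {
        "count": len(matches),  # fragment metadata: number of matching quads
        "data": matches[start:start + page_size],
        "has_next_page": start + page_size < len(matches),
    }


# Hypothetical store spread over two named graphs.
store = [
    ("http://example.org/a", "http://example.org/p", "http://example.org/b", "http://example.org/G1"),
    ("http://example.org/a", "http://example.org/q", "http://example.org/c", "http://example.org/G1"),
    ("http://example.org/d", "http://example.org/p", "http://example.org/b", "http://example.org/G2"),
]
```

Keeping the server side this simple is the design choice that makes such interfaces cheap to host redundantly; more expressive queries can be delegated to a complementary service.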

URL : Semantic micro-contributions with decentralized nanopublication services

DOI : https://doi.org/10.7717/peerj-cs.387

OpenCitations, an infrastructure organization for open scholarship

Authors : Silvio Peroni, David Shotton

OpenCitations is an infrastructure organization for open scholarship dedicated to the publication of open citation data as Linked Open Data using Semantic Web technologies, thereby providing a disruptive alternative to traditional proprietary citation indexes.

Open citation data are valuable for bibliometric analysis, increasing the reproducibility of large-scale analyses by enabling publication of the source data. Following brief introductions to the development and benefits of open scholarship and to Semantic Web technologies, this paper describes OpenCitations and its data sets, tools, services, and activities.
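As a small illustration of why open source data help reproducibility, the sketch below computes incoming citation counts from a list of DOI-to-DOI citation pairs; anyone holding the same open list obtains the same counts. DOIs are case-insensitive, so they are lower-cased before duplicate links are removed. The function name and the example DOIs are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def citation_counts(citations: Iterable[Tuple[str, str]]) -> Counter:
    """Count incoming citations per cited DOI from (citing DOI, cited DOI) pairs."""
    # Normalize case (DOIs are case-insensitive), then use a set so that
    # each distinct citing-cited link is counted only once.
    links = {(citing.lower(), cited.lower()) for citing, cited in citations}
    return Counter(cited for _citing, cited in links)


# Hypothetical citation links; the first two are the same link in different case.
pairs = [
    ("10.1000/X1", "10.1000/y"),
    ("10.1000/x1", "10.1000/Y"),
    ("10.1000/x2", "10.1000/y"),
    ("10.1000/x2", "10.1000/z"),
]
counts = citation_counts(pairs)
```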

These include the OpenCitations Data Model; the SPAR (Semantic Publishing and Referencing) Ontologies; OpenCitations’ open software of generic applicability for searching, browsing, and providing REST APIs over Resource Description Framework (RDF) triplestores; Open Citation Identifiers (OCIs) and the OpenCitations OCI Resolution Service; the OpenCitations Corpus (OCC), a database of open downloadable bibliographic and citation data made available in RDF under a Creative Commons public domain dedication; and the OpenCitations Indexes of open citation data, the first and largest of which is COCI, the OpenCitations Index of Crossref Open DOI-to-DOI Citations, which currently contains over 624 million bibliographic citations and is receiving considerable use by the scholarly community.
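Two of the components listed above can be sketched briefly. An Open Citation Identifier has the form `oci:` followed by two numeric parts joined by a hyphen, identifying the citing and the cited entity; each part begins with a supplier prefix, and decoding the digits back into DOIs is left out of this sketch. A citation itself can be expressed in RDF with the SPAR CiTO property `cito:cites`. The helper names below are hypothetical, and the DOIs are assumed not to contain characters that would need escaping inside an IRI.

```python
import re
from typing import Tuple

OCI_PATTERN = re.compile(r"^oci:(\d+)-(\d+)$")
CITO_CITES = "http://purl.org/spar/cito/cites"


def parse_oci(oci: str) -> Tuple[str, str]:
    """Split an OCI into its (citing, cited) numeric parts."""
    match = OCI_PATTERN.match(oci)
    if match is None:
        raise ValueError(f"not an OCI: {oci!r}")
    return match.group(1), match.group(2)


def citation_triple(citing_doi: str, cited_doi: str) -> str:
    """Express 'citing cites cited' as one N-Triples line using CiTO."""
    return f"<https://doi.org/{citing_doi}> <{CITO_CITES}> <https://doi.org/{cited_doi}> ."
```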

URL : OpenCitations, an infrastructure organization for open scholarship

DOI : https://doi.org/10.1162/qss_a_00023

Crafting Linked Open Data to Enhance the Discoverability of Institutional Repositories on the Web

Authors : Qiang Jin, Jane Sandberg

Institutional repositories are archives for collecting and disseminating digital copies of the intellectual output of institutions. Linked open data aims to expose and connect pieces of data, information, and knowledge on the Semantic Web.

This paper studies how BIBFRAME 2.0 can be used to describe objects in institutional repositories, with the goal of bringing together efforts within two communities devoted to openness.

We examine a sample of mappings and conversions from Dublin Core to the BIBFRAME 2.0 ontology to assess whether BIBFRAME 2.0 can increase the visibility of local digital collections on the Web.
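To give a sense of what such a conversion involves, here is a deliberately small sketch that maps the Dublin Core title and creator of a repository record onto a BIBFRAME 2.0 Work/Instance pair and serializes the result as N-Triples. It covers only two elements and is not the mapping studied in the paper; the function names, the record and the `example.org` base IRI are hypothetical. Unlike flat Dublin Core, BIBFRAME routes titles through a `bf:Title` node and creators through a `bf:Contribution` with a `bf:agent`.

```python
from typing import Dict, List, Tuple

BF = "http://id.loc.gov/ontologies/bibframe/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

Triple = Tuple[str, str, str]


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def dc_to_bibframe(base: str, dc: Dict[str, List[str]]) -> List[Triple]:
    """Map Dublin Core title/creator values onto a BIBFRAME Work/Instance pair."""
    work, instance = base + "#Work", base + "#Instance"
    triples: List[Triple] = [
        (work, RDF_TYPE, BF + "Work"),
        (instance, RDF_TYPE, BF + "Instance"),
        (instance, BF + "instanceOf", work),
    ]
    # dc:title -> bf:title pointing to a bf:Title node that carries bf:mainTitle.
    for i, title in enumerate(dc.get("title", [])):
        node = f"_:title{i}"
        triples += [
            (instance, BF + "title", node),
            (node, RDF_TYPE, BF + "Title"),
            (node, BF + "mainTitle", literal(title)),
        ]
    # dc:creator -> bf:contribution -> bf:Contribution -> bf:agent (a labelled bf:Agent).
    for i, name in enumerate(dc.get("creator", [])):
        contribution, agent = f"_:contribution{i}", f"_:agent{i}"
        triples += [
            (work, BF + "contribution", contribution),
            (contribution, RDF_TYPE, BF + "Contribution"),
            (contribution, BF + "agent", agent),
            (agent, RDF_TYPE, BF + "Agent"),
            (agent, RDFS_LABEL, literal(name)),
        ]
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize triples; IRIs get angle brackets, blank nodes and literals pass through."""
    def term(t: str) -> str:
        return t if t.startswith(("_:", '"')) else f"<{t}>"
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


# Hypothetical repository record.
record = {"title": ["Linked data in repositories"], "creator": ["Doe, Jane"]}
print(to_ntriples(dc_to_bibframe("http://example.org/ir/42", record)))
```

A real conversion would also need roles, dates, subjects and identifiers, which is where most of the mapping decisions lie.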

URL : http://qqml-journal.net/index.php/qqml/article/view/505

Evaluation of a novel cloud-based software platform for structured experiment design and linked data analytics

Authors : Hannes Juergens, Matthijs Niemeijer, Laura D. Jennings-Antipov, Robert Mans, Jack More, Antonius J. A. van Maris, Jack T. Pronk, Timothy S. Gardner

Open data in science requires precise definition of experimental procedures used in data generation, but traditional practices for sharing protocols and data cannot provide the required data contextualization.

Here, we explore the implementation, in an academic research setting, of a novel cloud-based software system designed to address this challenge. The software supports systematic definition of experimental procedures as visual processes, acquisition and analysis of primary data, and linking of data and procedures in machine-computable form.

The software was tested on a set of quantitative microbial-physiology experiments. Though time-intensive, definition of experimental procedures in the software enabled much more precise, unambiguous definitions of experiments than conventional protocols.

Once defined, processes were easily reusable and composable into more complex experimental flows. Automatic coupling of process definitions to experimental data enables immediate identification of correlations between procedural details, intended and unintended experimental perturbations, and experimental outcomes.
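The idea of linking procedure definitions to data can be sketched with a few data classes: reusable process steps carry their parameters, flows are composed from steps, and each run records the flow it followed together with its outcome. That link is what allows a procedural parameter to be correlated with outcomes directly. Everything here (`Process`, `Flow`, `Run`, the parameter names and the numbers) is hypothetical and does not describe the software evaluated in the paper; the Pearson correlation assumes at least two runs with varying values.

```python
import math
import statistics
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Process:
    """A reusable, parameterised procedure step (hypothetical model)."""
    name: str
    parameters: Dict[str, float] = field(default_factory=dict)


@dataclass
class Flow:
    """An experimental flow made of ordered process steps."""
    steps: List[Process]

    def then(self, other: "Flow") -> "Flow":
        # Composition: reuse existing flows to build a longer one.
        return Flow(self.steps + other.steps)


@dataclass
class Run:
    """One execution of a flow, linked to the outcome it produced."""
    flow: Flow
    outcome: float


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def parameter_outcome_correlation(runs: List[Run], step: str, parameter: str) -> float:
    """Correlate one parameter of a named step with the outcome of each run."""
    xs, ys = [], []
    for run in runs:
        for process in run.flow.steps:
            if process.name == step and parameter in process.parameters:
                xs.append(process.parameters[parameter])
                ys.append(run.outcome)
                break  # use the first matching step of each run
    return pearson(xs, ys)


# Hypothetical runs: a shared preparation flow composed with a growth step.
prep = Flow([Process("inoculate", {"volume_ml": 1.0})])
runs = [
    Run(prep.then(Flow([Process("grow", {"temperature_c": t})])), y)
    for t, y in [(28.0, 0.30), (30.0, 0.35), (32.0, 0.40)]
]
r = parameter_outcome_correlation(runs, "grow", "temperature_c")
```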

Software-based experiment descriptions could ultimately replace terse and ambiguous ‘Materials and Methods’ sections in scientific journals, thus promoting reproducibility and reusability of published studies.

URL : Evaluation of a novel cloud-based software platform for structured experiment design and linked data analytics

DOI : https://doi.org/10.1038/sdata.2018.195

What Legal Choices for Cultural and Scientific Mediation in the Digital Environment?

Author : Lionel Maurel

The legal dimension is not necessarily the first one that comes to mind when considering the “digital challenges for the scientific and cultural mediation of the past”.

Yet, just as much as technology, law has today become an essential factor of interoperability in the digital environment. Any cultural or scientific project that produces data and/or content must consider the legal conditions under which these objects will be made available; otherwise these questions will arise after the fact, often causing difficulties and deadlocks because they were not sufficiently anticipated.

This legal dimension is becoming ever more important for cultural institutions (archives, libraries, museums, etc.) and for research teams as the Linked Open Data (LOD) approach spreads and confronts project leaders with choices that are often complex.

Opening data indeed requires choosing among several licences from the existing range of contractual tools and applying them to different objects, bearing in mind that their effects vary considerably and are not neutral for downstream reusers.

The visibility of projects, their capacity to build relationships with other initiatives, and even the forms of mediation that can be offered to different audiences all depend in part on the decisions taken about the terms of use of the data and content.

This article aims to describe the basic principles on which these choices can be made soundly. In particular, it seeks to show that choosing openness through appropriate licences is an asset for developing mediation around research data.

URL : https://hal.archives-ouvertes.fr/hal-01577998/

Semantic representation and enrichment of information retrieval experimental data

Authors : Gianmaria Silvello, Georgeta Bordea, Nicola Ferro, Paul Buitelaar, Toine Bogers

Experimental evaluation carried out in international large-scale campaigns is a fundamental pillar of the scientific and technological advancement of information retrieval (IR) systems.

Such evaluation activities produce a large quantity of scientific and experimental data, which are the foundation for all the subsequent scientific production and development of new systems.

In this work, we discuss how to semantically annotate and interlink these data, with the goal of enhancing their interpretation, sharing, and reuse. We discuss the underlying evaluation workflow and propose a Resource Description Framework (RDF) model for the relevant parts of that workflow.
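To illustrate what exposing evaluation data as RDF can look like, the sketch below describes one retrieval run and its per-topic measure values as N-Triples, with one node per (run, topic, measure) observation linked back to the run, the topic and the measure. The `example.org` vocabulary (`partOf`, `ofRun`, `onTopic`, `measure`, `value`) is hypothetical and is not the model proposed in the paper; identifiers are assumed to be IRI-safe.

```python
from typing import Dict, Tuple

EX = "http://example.org/ir-eval/"  # hypothetical vocabulary, not the paper's RDF model
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def run_to_ntriples(campaign: str, run_id: str, scores: Dict[Tuple[str, str], float]) -> str:
    """Describe one IR run and its per-topic measure values as N-Triples.

    `scores` maps (topic id, measure name) to a value.
    """
    run = f"<{EX}run/{run_id}>"
    lines = [f"{run} <{EX}partOf> <{EX}campaign/{campaign}> ."]
    for (topic, measure), value in sorted(scores.items()):
        # One node per observation, linked back to its run, topic and measure.
        obs = f"<{EX}run/{run_id}/{topic}/{measure}>"
        lines += [
            f"{obs} <{EX}ofRun> {run} .",
            f"{obs} <{EX}onTopic> <{EX}topic/{topic}> .",
            f"{obs} <{EX}measure> <{EX}measure/{measure}> .",
            f'{obs} <{EX}value> "{value!r}"^^<{XSD_DOUBLE}> .',
        ]
    return "\n".join(lines)


# Hypothetical run with average-precision values on two topics.
nt = run_to_ntriples("campaign1", "runA", {("t1", "AP"): 0.42, ("t2", "AP"): 0.35})
```

Giving each observation its own IRI is what makes individual results linkable, for example from a topic description or an external resource.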

We use expertise retrieval as a case study to demonstrate the benefits of our semantic representation approach. We employ this model as a means for exposing experimental data as linked open data (LOD) on the Web and as a basis for enriching and automatically connecting this data with expertise topics and expert profiles.

In this context, a topic-centric approach for expert search is proposed, addressing the extraction of expertise topics, their semantic grounding with the LOD cloud, and their connection to IR experimental data.
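A deliberately simple, hypothetical stand-in for topic-centric expert finding: each expert profile is a weighted set of expertise topics (for instance, topics extracted from their publications), and experts are ranked by the share of their profile weight that falls on the query topics. This is not the method evaluated in the paper; the function name, profiles and weights are invented for illustration.

```python
from typing import Dict, List, Set, Tuple


def rank_experts(
    profiles: Dict[str, Dict[str, float]], query_topics: Set[str]
) -> List[Tuple[str, float]]:
    """Rank experts by the fraction of their topic weight matching the query."""
    scores: Dict[str, float] = {}
    for expert, topics in profiles.items():
        total = sum(topics.values())
        if total == 0:
            continue  # empty profile: nothing to match
        score = sum(w for topic, w in topics.items() if topic in query_topics) / total
        if score > 0:
            scores[expert] = score
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical expertise profiles (topic -> weight).
profiles = {
    "alice": {"retrieval evaluation": 3.0, "linked data": 1.0},
    "bob": {"linked data": 2.0},
    "carol": {"query expansion": 1.0},
}
ranking = rank_experts(profiles, {"linked data"})
```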

Several methods for expert profiling and expert finding are analysed and evaluated. Our results show that it is possible to construct expert profiles starting from automatically extracted expertise topics and that topic-centric approaches outperform state-of-the-art language modelling approaches for expert finding.

URL : https://aran.library.nuigalway.ie/handle/10379/5862