Interoperability and FAIRness through a novel combination of Web technologies

Authors: Mark D. Wilkinson, Ruben Verborgh, Luiz Olavo Bonino da Silva Santos, Tim Clark, Morris A. Swertz, Fleur D.L. Kelpin, Alasdair J.G. Gray, Erik A. Schultes, Erik M. van Mulligen, Paolo Ciccarese, Arnold Kuzniar, Anand Gavai, Mark Thompson, Rajaram Kaliyaperumal, Jerven T. Bolleman, Michel Dumontier

Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT).

These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not.

The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task that does not scale.

Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings.

We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and can therefore represent an exemplar implementation of those principles.

The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.
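The FAIR Principles invoked above hinge on machine-actionable metadata: globally unique identifiers, rich descriptions, and retrieval over standard Web protocols. As an illustrative sketch only (the record, identifiers, and URLs below are hypothetical and are not taken from the paper), a minimal JSON-LD metadata record for a dataset might look like:

```python
import json

# A minimal, hypothetical JSON-LD record illustrating FAIR-style
# machine-actionable metadata: a globally unique identifier, typed
# properties from a shared vocabulary (schema.org), and an explicit license.
record = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    # F1: a globally unique, resolvable identifier (hypothetical)
    "@id": "https://example.org/dataset/42",
    # F2: a rich, human-readable description
    "name": "Example species-diversity observations",
    # R1.1: a clear, machine-readable usage license
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "distribution": {
        "@type": "DataDownload",
        # A1: data retrievable by a standard protocol (HTTP)
        "contentUrl": "https://example.org/dataset/42/data.csv",
        "encodingFormat": "text/csv",
    },
}

serialized = json.dumps(record, indent=2)
print(serialized)
```

Such a record can be embedded in a dataset landing page or served through HTTP content negotiation, which is the kind of off-the-shelf Web technology the abstract refers to.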

URL : Interoperability and FAIRness through a novel combination of Web technologies


Building a Disciplinary, World‐Wide Data Infrastructure

Authors: Françoise Genova, Christophe Arviset, Bridget M. Almas, Laura Bartolo, Daan Broeder, Emily Law, Brian McMahon

Sharing scientific data with the objective of making it discoverable, accessible, reusable, and interoperable requires work and presents challenges that are being addressed at the disciplinary level, in particular defining how the data should be formatted and described.

This paper represents the Proceedings of a session held at SciDataCon 2016 (Denver, 12–13 September 2016). It explores how a range of disciplines, namely materials science, crystallography, astronomy, earth sciences, humanities and linguistics, are organized at the international level to address those challenges.

The disciplinary culture with respect to data sharing, the science drivers, organization, lessons learnt, and the elements of the data infrastructure which are or could be shared with others are briefly described. Commonalities and differences are assessed.

Common key elements for success are identified: data sharing should be science driven; defining the disciplinary part of the interdisciplinary standards is mandatory but challenging; sharing of applications should accompany data sharing. Incentives such as journal and funding agency requirements are also similar.

For all, social aspects are more challenging than technological ones. Governance is more diverse, often specific to how each discipline is organized. Being problem‐driven is also a key factor of success for building bridges to enable interdisciplinary research.

Several international data organizations such as CODATA, RDA and WDS can facilitate the establishment of disciplinary interoperability frameworks. As a spin‐off of the session, a RDA Disciplinary Interoperability Interest Group is proposed to bring together representatives across disciplines to better organize and drive the discussion for prioritizing, harmonizing and efficiently articulating disciplinary needs.

URL : Building a Disciplinary, World‐Wide Data Infrastructure



Scientific data from and for the citizen

Authors: Sven Schade, Chrisa Tsinaraki, Elena Roglia

Powered by advances in technology, today's Citizen Science projects cover a wide range of thematic areas and are carried out from local to global levels. This wealth of activities creates an abundance of data, for example, in the form of observations submitted by mobile phones, readings from low-cost sensors, or more general information about people's activities.

The management and possible sharing of this data has become a research topic in its own right. We conducted a survey in the summer of 2015 in order to collectively analyze the state of play in Citizen Science.

This paper summarizes our main findings related to data access, standardization and data preservation. We provide examples of good practices in each of these areas and outline actions to address identified challenges.


Interopérabilité et logiques organisationnelles. Ce qu’ouvrir ses données veut dire

Authors: Marie Després-Lonnet, Béatrice Micheau, Marie Destandau

In the context of opening up public data, we are supporting three institutions that manage music-related collections through this threefold technical, organizational, and political transition.

The objective is to design an "ontology" that will underpin the description of music. Our collaboration with the experts made it possible to grasp the tensions this project generates, despite the collective will to arrive at a shared model.

We were thus able to show that each institution brings a situated perspective to music as a social practice and to the objects and documents it holds. The search for a common model that could be applied globally requires each institution to consider its data and the associated concepts in a more global way, and to partly call its own practices into question.

Our study shows that, to avoid ending up with a completely abstract model, modeling should be seen as a form of discourse in continuity with the writing practices of our cultural heritage: living writings, made of constant negotiation between norms and improvisation, between organizational necessities and adaptation to ad hoc constraints, whose many traces we keep encountering and which are so much material for our research on the anthropology of knowledge.

The research presented in this article was partially funded by the project ANR-2014-CE24-0020 "DOREMUS".


Towards certified open data in digital service ecosystems

Authors : Anne Immonen, Eila Ovaska, Tuomas Paaso

The opportunities of open data have been recently recognized among companies in different domains. Digital service providers have increasingly been interested in the possibilities of innovating new ideas and services around open data.

Digital service ecosystems provide several advantages for service developers, enabling the service co-innovation and co-creation among ecosystem members utilizing and sharing common assets and knowledge.

The utilization of open data in digital services requires new innovation practices, service development models, and a collaboration environment. These can be provided by the ecosystem. However, since open data can be almost anything and originate from different kinds of data sources, the quality of data becomes the key issue.

The new challenge for service providers is how to guarantee the quality of open data. In the ecosystems, uncertain data quality poses major challenges. The main contribution of this paper is the concept of the Evolvable Open Data based digital service Ecosystem (EODE), which defines the kinds of knowledge and services that are required for validating open data in digital service ecosystems.

Thus, the EODE provides business potential for open data and digital service providers, as well as other actors around open data. The ecosystem capability model, knowledge management models, and the taxonomy of services to support the open data quality certification are described.

Data quality certification confirms that the open data is trustworthy and that its quality is good enough to be accepted for use by the ecosystem's services. The paper also describes the five-phase open data quality certification process, by which open data is brought into the ecosystem and certified for use by the digital service ecosystem members, using the knowledge models and support services of the ecosystem.

The initial experiences of the still ongoing validation steps are summarized, and the concept limitations and future development targets are identified.
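The abstract does not detail the five certification phases, but the general pattern of staged quality validation can be sketched. The sketch below is purely hypothetical: the phase names and checks are invented for illustration and are not the EODE's own.

```python
from typing import Callable

# Hypothetical quality-check phases; names and rules are illustrative only.
def has_required_fields(record: dict) -> bool:
    """Structural check: the record declares the fields we expect."""
    return all(k in record for k in ("id", "source", "values"))

def values_in_range(record: dict) -> bool:
    """Plausibility check: all measurements are numeric and within bounds."""
    return all(isinstance(v, (int, float)) and 0 <= v <= 100
               for v in record["values"])

def source_is_known(record: dict) -> bool:
    """Provenance check: the record comes from a registered source."""
    return record["source"] in {"sensor-a", "sensor-b"}

PHASES: list[Callable[[dict], bool]] = [
    has_required_fields, values_in_range, source_is_known,
]

def certify(record: dict) -> bool:
    """Run every phase in order; reject on the first failure."""
    return all(phase(record) for phase in PHASES)

ok = certify({"id": 1, "source": "sensor-a", "values": [10, 55]})
bad = certify({"id": 2, "source": "unknown", "values": [10]})
```

A real certification process would of course also involve the ecosystem's knowledge models and support services described in the paper; the point here is only the staged, fail-fast structure.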

URL : Towards certified open data in digital service ecosystems

DOI : 10.1007/s11219-017-9378-2

Le défi de l’interopérabilité entre plates-formes pour la construction de savoirs augmentés en sciences humaines et sociales

Authors: Camille Prime-Claverie, Annaïg Mahé

In the digital era, the research sector generates a proliferation of digitized content, and ensuring better access to research results is an objective that might seem easily achievable.

Yet for a decade the scholarly communication sector has been undergoing profound changes, which make it difficult for all the actors involved to position themselves in this new context.

Information is scattered across several platforms created under the impetus of different types of actors, whose positions and interests sometimes diverge.

In this largely distributed environment, achieving interoperability becomes a major issue for better access to scientific and technical information, also enabling data to circulate and be enriched.

This contribution addresses the circulation and sharing of scholarly literature in the humanities and social sciences in France, based on data harvestable via the OAI-PMH protocol.

It seeks to highlight what constitutes opportunities or obstacles for the reuse, editorialization, and construction of augmented knowledge in this field.

The study focuses on five French platforms providing scholarly documents in the humanities and social sciences, and on a service provider offering enrichment features.
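OAI-PMH, the harvesting protocol mentioned above, returns XML responses that typically wrap Dublin Core metadata. As a small illustrative sketch (the response below is a fabricated sample, not data from the platforms studied), harvested records can be parsed with the Python standard library:

```python
import xml.etree.ElementTree as ET

# A fabricated OAI-PMH ListRecords fragment with one Dublin Core record,
# used here only to illustrate the response structure.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Un article en SHS</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

# Namespace map for the OAI-PMH envelope and Dublin Core elements.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

root = ET.fromstring(SAMPLE)
# Collect every Dublin Core title found in the harvested records.
titles = [t.text for t in root.findall(".//dc:title", NS)]
```

In practice a harvester would fetch such responses from a repository's OAI-PMH endpoint (e.g. with a `verb=ListRecords&metadataPrefix=oai_dc` query) and follow `resumptionToken` elements to page through the full set.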