Facilitating and Improving Environmental Research Data Repository Interoperability

Authors : Corinna Gries, Amber Budden, Christine Laney, Margaret O’Brien, Mark Servilla, Wade Sheldon, Kristin Vanderbilt, David Vieglais

Environmental research data repositories provide much-needed services for data preservation and dissemination to diverse communities with domain-specific or programmatic data needs and standards.

Because they developed independently, these repositories serve their communities well but rely on different technologies, data models, and ontologies. The effectiveness and efficiency of their services could therefore be vastly improved if repositories worked together, adhering to a shared community platform focused on implementing agreed-upon standards and best practices for the curation and dissemination of data.

Such a community platform drives forward the convergence of technologies and practices that will advance cross-domain interoperability. It will also facilitate contributions from investigators through standardized and streamlined workflows and provide increased visibility for the role of data managers and the curation services provided by data repositories, beyond preservation infrastructure.

Ten specific recommendations for such standardization are outlined, without prescribing priority or technical implementation. Although the recommendations are for repositories to implement, they were chosen specifically with the data provider/data curator and the synthesis scientist in mind.

DOI : https://doi.org/10.5334/dsj-2018-022

Interoperability and FAIRness through a novel combination of Web technologies

Authors : Mark D. Wilkinson, Ruben Verborgh, Luiz Olavo Bonino da Silva Santos, Tim Clark, Morris A. Swertz, Fleur D.L. Kelpin, Alasdair J.G. Gray, Erik A. Schultes, Erik M. van Mulligen, Paolo Ciccarese, Arnold Kuzniar, Anand Gavai, Mark Thompson, Rajaram Kaliyaperumal, Jerven T. Bolleman, Michel Dumontier

Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT).

These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not.

The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task that does not scale.

Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings.

We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and can therefore represent an exemplar implementation of those principles.

The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.

DOI : https://doi.org/10.7717/peerj-cs.110

Building a Disciplinary, World‐Wide Data Infrastructure

Authors : Françoise Genova, Christophe Arviset, Bridget M. Almas, Laura Bartolo, Daan Broeder, Emily Law, Brian McMahon

Sharing scientific data so that it is discoverable, accessible, reusable, and interoperable requires work and presents challenges that are being addressed at the disciplinary level, in particular to define how data should be formatted and described.

This paper represents the proceedings of a session held at SciDataCon 2016 (Denver, 12–13 September 2016). It explores how a range of disciplines, namely materials science, crystallography, astronomy, earth sciences, humanities and linguistics, organize themselves at the international level to address those challenges.

The disciplinary culture with respect to data sharing, the science drivers, organization, lessons learnt, and the elements of the data infrastructure which are or could be shared with others are briefly described. Commonalities and differences are assessed.

Common key elements for success are identified: data sharing should be science driven; defining the disciplinary part of the interdisciplinary standards is mandatory but challenging; sharing of applications should accompany data sharing. Incentives such as journal and funding agency requirements are also similar.

For all, social aspects are more challenging than technological ones. Governance is more diverse, often specific to the discipline organization. Being problem‐driven is also a key factor of success for building bridges to enable interdisciplinary research.

Several international data organizations such as CODATA, RDA and WDS can facilitate the establishment of disciplinary interoperability frameworks. As a spin-off of the session, an RDA Disciplinary Interoperability Interest Group is proposed to bring together representatives across disciplines to better organize and drive the discussion for prioritizing, harmonizing and efficiently articulating disciplinary needs.

DOI : https://doi.org/10.5334/dsj-2017-016


Scientific data from and for the citizen

Authors : Sven Schade, Chrisa Tsinaraki, Elena Roglia

Powered by advances of technology, today’s Citizen Science projects cover a wide range of thematic areas and are carried out from local to global levels. This wealth of activities creates an abundance of data, for example, in the forms of observations submitted by mobile phones; readings of low-cost sensors; or more general information about peoples’ activities.

The management and possible sharing of this data has become a research topic in its own right. We conducted a survey in the summer of 2015 in order to collectively analyze the state of play in Citizen Science.

This paper summarizes our main findings related to data access, standardization and data preservation. We provide examples of good practices in each of these areas and outline actions to address identified challenges.

URL : http://firstmonday.org/ojs/index.php/fm/article/view/7842

Interopérabilité et logiques organisationnelles. Ce qu’ouvrir ses données veut dire (Interoperability and organizational logics: what opening one’s data means)

Authors : Marie Després-Lonnet, Béatrice Micheau, Marie Destandau

In the context of opening up public data, we support three institutions that manage music-related collections through this threefold technical, organizational, and political evolution.

The objective is to design an “ontology” to support the description of music. Our collaboration with the experts revealed the tensions this project generates, despite the collective will to arrive at a shared model.

We were thus able to show that each institution takes a situated view of music as a social practice and of the objects and documents it holds. The search for a common, globally applicable model requires each institution to consider its data and the associated concepts more broadly and to partly question its ways of working.

Our study shows that, to avoid ending up with an entirely abstract model, modeling should be seen as a form of discourse that continues the writing traditions of our cultural heritage: living writings, made of constant negotiation between norms and improvisation, between organizational necessities and adaptation to ad hoc constraints, whose many traces we keep encountering and which provide so much material for our research on the anthropology of knowledge.

The research presented in this article was partially funded by the project ANR-2014-CE24-0020 “DOREMUS”.

URL : http://www.revue-cossi.info/numeros/n-2-2017-bricolages-improvisations-et-resilience-organisationnelle-face-aux-risques-informationnels-et-communicationnels/663-2-2017-revue-despres-lonnet-micheau-destandau

Towards certified open data in digital service ecosystems

Authors : Anne Immonen, Eila Ovaska, Tuomas Paaso

The opportunities of open data have recently been recognized among companies in different domains. Digital service providers are increasingly interested in the possibilities of innovating new ideas and services around open data.

Digital service ecosystems provide several advantages for service developers, enabling service co-innovation and co-creation among ecosystem members who utilize and share common assets and knowledge.

The utilization of open data in digital services requires new innovation practices, service development models, and a collaboration environment. These can be provided by the ecosystem. However, since open data can be almost anything and originate from different kinds of data sources, the quality of data becomes the key issue.

The new challenge for service providers is how to guarantee the quality of open data. In the ecosystems, uncertain data quality poses major challenges. The main contribution of this paper is the concept of the Evolvable Open Data based digital service Ecosystem (EODE), which defines the kinds of knowledge and services that are required for validating open data in digital service ecosystems.

Thus, the EODE provides business potential for open data and digital service providers, as well as for other actors around open data. The paper describes the ecosystem capability model, the knowledge management models, and the taxonomy of services that support open data quality certification.

Data quality certification confirms that the open data is trustworthy and that its quality is good enough to be accepted for use by the ecosystem’s services. The paper also describes the five-phase open data quality certification process, through which open data is brought into the ecosystem and certified for use by ecosystem members, using the ecosystem’s knowledge models and support services.

The initial experiences from the still-ongoing validation steps are summarized, and the concept’s limitations and future development targets are identified.

DOI : https://doi.org/10.1007/s11219-017-9378-2