De l’open data à l’open science : retour réflexif sur les méthodes et pratiques d’une recherche sur les données géographiques

Auteurs/Authors : Nathalie Pinède, Matthieu Noucher, Françoise Gourmelon, Karel Soumagnac-Colin

Nous mobilisons ici l’expérience d’un projet de recherche en cours pour analyser la façon dont les nouveaux terrains d’expérimentations sur le web, modifient les conditions de la pratique scientifique, des objets aux méthodes, de l’open data à l’open science.

La massification des données géographiques disponibles sur le web reconfigure les dynamiques de recherche selon trois axes de transformation : les objets, les méthodes et les pratiques de recherche. Tout d’abord, nous soulignerons comment les enjeux de pouvoir autour de la cartographie se sont déplacés avec l’avènement du web et de l’open data.

Nous développerons ensuite les impacts en matière de méthodologie de recherche dans un contexte d’approche interdisciplinaire. Enfin, nous montrerons comment ce projet de recherche s’inscrit dans une démarche de type open science.


Recommended versus Certified Repositories: Mind the Gap

Authors : Sean Edward Husen, Zoë G. de Wilde, Anita de Waard, Helena Cousijn

Researchers are increasingly required to make research data publicly available in data repositories. Although several organisations propose criteria to recommend and evaluate the quality of data repositories, there is no consensus of what constitutes a good data repository.

In this paper, we investigate, first, which data repositories are recommended by various stakeholders (publishers, funders, and community organizations) and second, which repositories are certified by a number of organisations.

We then compare these two lists of repositories, and the criteria for recommendation and certification. We find that criteria used by organisations recommending and certifying repositories are similar, although the certification criteria are generally more detailed.

We distil the lists of criteria into seven main categories: “Mission”, “Community/Recognition”, “Legal and Contractual Compliance”, “Access/Accessibility”, “Technical Structure/Interface”, “Retrievability” and “Preservation”.

Although the criteria are similar, the lists of repositories that are recommended by the various agencies are very different. Out of all of the recommended repositories, less than 6% obtained certification.

As certification is becoming more important, steps should be taken to decrease this gap between recommended and certified repositories, and ensure that certification standards become applicable, and applied, to the repositories which researchers are currently using.

URL : Recommended versus Certified Repositories: Mind the Gap


How to share data for collaboration

Authors : Shannon E Ellis, Jeffrey T Leek

Within the statistics community, a number of guiding principles for sharing data have emerged; however, these principles are not always made clear to collaborators generating the data. To bridge this divide, we have established a set of guidelines for sharing data.

In these, we highlight the need to provide raw data to the statistician, the importance of consistent formatting, and the necessity of including all essential experimental information and pre-processing steps carried out to the statistician. With these guidelines we hope to avoid errors and delays in data analysis.

URL : How to share data for collaboration



Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data

Authors : Julie A. McMurry, Nick Juty, Niklas Blomberg, Tony Burdett, Tom Conlin, Nathalie Conte, Mélanie Courtot, John Deck, Michel Dumontier, Donal K. Fellows, Alejandra Gonzalez-Beltran, Philipp Gormanns, Jeffrey Grethe, Janna Hastings, Jean-Karim Hériché, Henning Hermjakob, Jon C. Ison, Rafael C. Jimenez, Simon Jupp, John Kunze, Camille Laibe, Nicolas Le Novère, James Malone, Maria Jesus Martin, Johanna R. McEntyre, Chris Morris, Juha Muilu, Wolfgang Müller, Philippe Rocca-Serra, Susanna-Assunta Sansone, Murat Sariyar, Jacky L. Snoep, Stian Soiland-Reyes, Natalie J. Stanford, Neil Swainston, Nicole Washington, Alan R. Williams, Sarala M. Wimalaratne, Lilly M. Winfree, Katherine Wolstencroft, Carole Goble, Christopher J. Mungall, Melissa A. Haendel, Helen Parkinson

In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure.

Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers.

We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability.

We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.

URL : Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data


The legal and policy framework for scientific data sharing, mining and reuse

Author : Mélanie Dulong de Rosnay

Text and Data Mining, the automatic processing of large amounts of scientific articles and datasets, is an essential practice for contemporary researchers. Some publishers are challenging it as a lawful activity and the topic is being discussed during European copyright law reform process.

In order to better understand the underlying debate and contribute to the policy discussion, this article first examines the legal status of data access and reuse and licensing policies. It then presents available options supporting the exercise of Text and Data Mining: publication under open licenses, open access legislations and a recognition of the legitimacy of the activity.

For that purpose, the paper analyses the scientific rational for sharing and its legal and technical challenges and opportunities. In particular, it surveys existing open access and open data legislations and discusses implementation in European and Latin America jurisdictions.

Framing Text and Data mining as an exception to copyright could be problematic as it de facto denies that this activity is part of a positive right to read and should not require additional permission nor licensing.

It is crucial in licenses and legislations to provide a correct definition of what is Open Access, and to address the question of pre-existing copyright agreements. Also, providing implementation means and technical support is key. Otherwise, legislations could remain declarations of good principles if repositories are acting as empty shells.


Scientific data from and for the citizen

Authors : Sven Schade, Chrisa Tsinaraki, Elena Roglia

Powered by advances of technology, today’s Citizen Science projects cover a wide range of thematic areas and are carried out from local to global levels. This wealth of activities creates an abundance of data, for example, in the forms of observations submitted by mobile phones; readings of low-cost sensors; or more general information about peoples’ activities.

The management and possible sharing of this data has become a research topic in its own right. We conducted a survey in the summer of 2015 in order to collectively analyze the state of play in Citizen Science.

This paper summarizes our main findings related to data access, standardization and data preservation. We provide examples of good practices in each of these areas and outline actions to address identified challenges.


Towards a paradigm for open and free sharing of scientific data on global change science in China

Authors : Changhui Peng, Xinzhang Song, Hong Jiang, Qiuan Zhu, Huai Chen, Jing M. Chen, Peng Gong, Chang Jie, Wenhua Xiang, Guirui Yu, Xiaolu Zhou

Despite great progress in data sharing that has been made in China in recent decades, cultural, policy, and technological challenges have prevented Chinese researchers from maximizing the availability of their data to the global change science community.

To achieve full and open exchange and sharing of scientific data, Chinese research funding agencies need to recognize that preservation of, and access to, digital data are central to their mission, and must support these tasks accordingly.

The Chinese government also needs to develop better mechanisms, incentives, and rewards, while scientists need to change their behavior and culture to recognize the need to maximize the usefulness of their data to society as well as to other researchers.

The Chinese research community and individual researchers should think globally and act personally to promote a paradigm of open, free, and timely data sharing, and to increase the effectiveness of knowledge development.

URL : Towards a paradigm for open and free sharing of scientific data on global change science in China