Standardising and harmonising research data policy in scholarly publishing

Authors : Iain Hrynaszkiewicz, Aliaksandr Birukou, Mathias Astell, Sowmya Swaminathan, Amye Kenall, Varsha Khodiyar

To address the complexities researchers face during publication, and to realise the potential community-wide benefits of wider adoption of clear data policies, the publisher Springer Nature has developed a standardised, common framework for the research data policies of all its journals. An expert working group was convened to audit the journals published by Springer Nature and identify common features of their existing research data policies.

The group then consulted approximately 30 in-house editors covering all research disciplines. It also consulted academic editors, librarians, and funders; these consultations informed the development of the framework and the creation of supporting resources.

Four types of data policy were defined, in recognition that some journals and research communities are readier than others to adopt strong data policies. As of January 2017, more than 700 journals had adopted a standard policy, and this number is growing weekly. To enable potential standardisation and harmonisation of data policy across funders, institutions, repositories, societies and other publishers, the policy framework was made available under a Creative Commons license.

However, the framework requires wider debate with these stakeholders and an Interest Group within the Research Data Alliance (RDA) has been formed to initiate this process.

This paper was presented at the 12th International Digital Curation Conference, Edinburgh, UK on 22 February 2017 and will be submitted to the International Journal of Digital Curation.

URL : Standardising and harmonising research data policy in scholarly publishing

DOI : https://doi.org/10.1101/122929

Recommended versus Certified Repositories: Mind the Gap

Authors : Sean Edward Husen, Zoë G. de Wilde, Anita de Waard, Helena Cousijn

Researchers are increasingly required to make research data publicly available in data repositories. Although several organisations propose criteria to recommend and evaluate the quality of data repositories, there is no consensus on what constitutes a good data repository.

In this paper, we investigate, first, which data repositories are recommended by various stakeholders (publishers, funders, and community organisations) and, second, which repositories are certified by a number of organisations.

We then compare these two lists of repositories, and the criteria for recommendation and certification. We find that criteria used by organisations recommending and certifying repositories are similar, although the certification criteria are generally more detailed.

We distil the lists of criteria into seven main categories: “Mission”, “Community/Recognition”, “Legal and Contractual Compliance”, “Access/Accessibility”, “Technical Structure/Interface”, “Retrievability” and “Preservation”.

Although the criteria are similar, the lists of repositories recommended by the various agencies are very different. Of all the recommended repositories, fewer than 6% obtained certification.

As certification is becoming more important, steps should be taken to decrease this gap between recommended and certified repositories, and ensure that certification standards become applicable, and applied, to the repositories which researchers are currently using.

URL : Recommended versus Certified Repositories: Mind the Gap

DOI: https://doi.org/10.5334/dsj-2017-042

Quels choix juridiques pour la médiation culturelle et scientifique dans l’environnement numérique ?

Author : Lionel Maurel

The legal dimension is not necessarily the first that comes to mind when considering the “digital challenges for the scientific and cultural mediation of the past”.

Yet, just as much as technology, law has today become an essential factor of interoperability in the digital environment. Any cultural or scientific project producing data and/or content must consider the legal conditions under which these objects are made available, lest these questions arise only after the fact, often causing difficulties and blockages for not having been sufficiently anticipated.

This legal dimension is nevertheless increasingly important for cultural institutions (archives, libraries, museums, etc.), as well as for research teams, as the Linked Open Data (LOD) approach develops and confronts project leaders with often complex choices.

Opening up data indeed requires being able to choose among several licenses from the panel of existing contractual tools and to apply them to different objects, knowing that their effects vary significantly and are not neutral for downstream reusers.

The visibility of projects, their capacity to build relationships with other initiatives, and the very forms of mediation that can be deployed for different audiences all follow in part from the decisions made about the conditions of use of the data and content.

This article aims to describe the basic principles on which these choices can be made under good conditions. In particular, it will show that choosing openness through appropriate licenses is an asset for developing mediation around research data.

URL : https://hal.archives-ouvertes.fr/hal-01577998/


What do data curators care about? Data quality, user trust, and the data reuse plan

Author : Frank Andreas Sposito

Data curation is often defined as the practice of maintaining, preserving, and enhancing research data for long-term value and reusability. The role of data reuse in the data curation lifecycle is critical: increased reuse is the core justification for the often sizable expenditures necessary to build data management infrastructures and user services.

Yet recent studies have shown that data are being shared and reused through open data repositories at much lower levels than expected. These studies underscore a fundamental and often overlooked challenge in research data management that invites deeper examination of the roles and responsibilities of data curators.

This presentation will identify data quality and user trust as key barriers to data reuse, and propose a framework for implementing reuser-centric strategies to increase data reuse.

Using the concept of a “data reuse plan”, it will highlight repository-based approaches to improving data quality and user trust, and address critical areas for innovation for data curators working in the absence of repository support.

URL : What do data curators care about? Data quality, user trust, and the data reuse plan

Alternative location : http://library.ifla.org/id/eprint/1797


How to share data for collaboration

Authors : Shannon E Ellis, Jeffrey T Leek

Within the statistics community, a number of guiding principles for sharing data have emerged; however, these principles are not always made clear to collaborators generating the data. To bridge this divide, we have established a set of guidelines for sharing data.

In these, we highlight the need to provide raw data to the statistician, the importance of consistent formatting, and the necessity of communicating to the statistician all essential experimental information and any pre-processing steps carried out. With these guidelines, we hope to avoid errors and delays in data analysis.
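The guidelines above can be pictured as a simple hand-off package: raw data, a consistently formatted processed table, a codebook describing each variable, and the processing steps that link them. The sketch below illustrates this idea; the file names and layout are illustrative assumptions, not prescribed by the paper.

```python
# Minimal sketch of a data hand-off package for a statistician:
# raw data, a tidy processed table, a codebook, and the processing steps.
import csv
import json
from pathlib import Path


def write_handoff(outdir, raw_rows, tidy_rows, codebook, steps):
    """Write the four hand-off artifacts and return the file names created."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    # 1. Raw data, untouched, so every downstream result can be re-derived.
    with open(out / "raw_data.csv", "w", newline="") as f:
        csv.writer(f).writerows(raw_rows)
    # 2. Processed data with consistent formatting:
    #    one variable per column, one observation per row.
    with open(out / "tidy_data.csv", "w", newline="") as f:
        csv.writer(f).writerows(tidy_rows)
    # 3. Codebook: the meaning and units of each variable.
    (out / "codebook.json").write_text(json.dumps(codebook, indent=2))
    # 4. The exact pre-processing steps that turn raw into processed data.
    (out / "steps.txt").write_text("\n".join(steps))
    return sorted(p.name for p in out.iterdir())
```

A collaborator would call `write_handoff` once per shared dataset, so the statistician receives the raw data and everything needed to audit or redo the pre-processing.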

URL : How to share data for collaboration

DOI : https://doi.org/10.7287/peerj.preprints.3139v1


Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data

Authors : Julie A. McMurry, Nick Juty, Niklas Blomberg, Tony Burdett, Tom Conlin, Nathalie Conte, Mélanie Courtot, John Deck, Michel Dumontier, Donal K. Fellows, Alejandra Gonzalez-Beltran, Philipp Gormanns, Jeffrey Grethe, Janna Hastings, Jean-Karim Hériché, Henning Hermjakob, Jon C. Ison, Rafael C. Jimenez, Simon Jupp, John Kunze, Camille Laibe, Nicolas Le Novère, James Malone, Maria Jesus Martin, Johanna R. McEntyre, Chris Morris, Juha Muilu, Wolfgang Müller, Philippe Rocca-Serra, Susanna-Assunta Sansone, Murat Sariyar, Jacky L. Snoep, Stian Soiland-Reyes, Natalie J. Stanford, Neil Swainston, Nicole Washington, Alan R. Williams, Sarala M. Wimalaratne, Lilly M. Winfree, Katherine Wolstencroft, Carole Goble, Christopher J. Mungall, Melissa A. Haendel, Helen Parkinson

In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure.

Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers.

We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability.

We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.
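One recurring theme in this space is the compact identifier pattern (`prefix:accession`) resolved through a meta-resolver such as identifiers.org. As a minimal sketch of that idea, the snippet below validates an accession against a per-prefix pattern before building a resolvable URL; the small registry and its regular expressions are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: validating CURIE-style compact identifiers
# (prefix:accession) and turning them into resolvable web URLs.
import re

# Illustrative registry of prefixes and accession patterns (assumed, not
# exhaustive; real registries such as identifiers.org hold many more).
REGISTRY = {
    "pubmed": r"^\d+$",
    "uniprot": r"^[A-Z][0-9][A-Z0-9]{3}[0-9]$",
}


def to_resolvable_url(curie, resolver="https://identifiers.org"):
    """Validate a compact identifier and build a resolver URL for it."""
    prefix, _, accession = curie.partition(":")
    pattern = REGISTRY.get(prefix.lower())
    if pattern is None:
        raise ValueError(f"unknown prefix: {prefix!r}")
    if not re.match(pattern, accession):
        raise ValueError(f"accession {accession!r} invalid for {prefix!r}")
    return f"{resolver}/{prefix.lower()}:{accession}"
```

Validating before resolving catches malformed references at the point of citation, which is where the paper's concern with persistence and resolvability bites hardest.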

URL : Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data

DOI : https://doi.org/10.1371/journal.pbio.2001414

The legal and policy framework for scientific data sharing, mining and reuse

Author : Mélanie Dulong de Rosnay

Text and Data Mining, the automatic processing of large amounts of scientific articles and datasets, is an essential practice for contemporary researchers. Some publishers are challenging its status as a lawful activity, and the topic is being discussed during the European copyright law reform process.

In order to better understand the underlying debate and contribute to the policy discussion, this article first examines the legal status of data access and reuse and of licensing policies. It then presents the available options supporting the exercise of Text and Data Mining: publication under open licenses, open access legislation, and recognition of the legitimacy of the activity.

For that purpose, the paper analyses the scientific rationale for sharing and its legal and technical challenges and opportunities. In particular, it surveys existing open access and open data legislation and discusses implementation in European and Latin American jurisdictions.

Framing Text and Data Mining as an exception to copyright could be problematic, as it de facto denies that this activity is part of a positive right to read and should not require additional permission or licensing.

It is crucial for licenses and legislation to provide a correct definition of Open Access and to address the question of pre-existing copyright agreements. Providing implementation means and technical support is also key; otherwise, legislation could remain a declaration of good principles, with repositories acting as empty shells.

URL : https://books.openedition.org/editionsmsh/9082