The legal and policy framework for scientific data sharing, mining and reuse

Author : Mélanie Dulong de Rosnay

Text and Data Mining, the automatic processing of large amounts of scientific articles and datasets, is an essential practice for contemporary researchers. Some publishers are challenging it as a lawful activity and the topic is being discussed during European copyright law reform process.

In order to better understand the underlying debate and contribute to the policy discussion, this article first examines the legal status of data access and reuse and licensing policies. It then presents available options supporting the exercise of Text and Data Mining: publication under open licenses, open access legislations and a recognition of the legitimacy of the activity.

For that purpose, the paper analyses the scientific rational for sharing and its legal and technical challenges and opportunities. In particular, it surveys existing open access and open data legislations and discusses implementation in European and Latin America jurisdictions.

Framing Text and Data mining as an exception to copyright could be problematic as it de facto denies that this activity is part of a positive right to read and should not require additional permission nor licensing.

It is crucial in licenses and legislations to provide a correct definition of what is Open Access, and to address the question of pre-existing copyright agreements. Also, providing implementation means and technical support is key. Otherwise, legislations could remain declarations of good principles if repositories are acting as empty shells.


Scientific data from and for the citizen

Authors : Sven Schade, Chrisa Tsinaraki, Elena Roglia

Powered by advances of technology, today’s Citizen Science projects cover a wide range of thematic areas and are carried out from local to global levels. This wealth of activities creates an abundance of data, for example, in the forms of observations submitted by mobile phones; readings of low-cost sensors; or more general information about peoples’ activities.

The management and possible sharing of this data has become a research topic in its own right. We conducted a survey in the summer of 2015 in order to collectively analyze the state of play in Citizen Science.

This paper summarizes our main findings related to data access, standardization and data preservation. We provide examples of good practices in each of these areas and outline actions to address identified challenges.


Journal Data Sharing Policies and Statistical Reporting Inconsistencies in Psychology

Authors : Michele Nuijten, Jeroen Borghuis, Coosje Veldkamp, Linda Alvarez, Marcel van Assen, Jelte Wicherts

In this paper, we present three studies that investigate the relation between data sharing and statistical reporting inconsistencies. Previous research found that reluctance to share data was related to a higher prevalence of statistical errors, often in the direction of statistical significance (Wicherts, Bakker, & Molenaar, 2011).

We therefore hypothesized that journal policies about data sharing and data sharing itself would reduce these inconsistencies. In Study 1, we compared the prevalence of reporting inconsistencies in two similar journals on decision making with different data sharing policies.

In Study 2, we compared reporting inconsistencies in articles published in PLOS (with a data sharing policy) and Frontiers in Psychology (without a data sharing policy). In Study 3, we looked at papers published in the journal Psychological Science to check whether papers with or without an Open Practice Badge differed in the prevalence of reporting errors.

Overall, we found no relationship between data sharing and reporting inconsistencies. We did find that journal policies on data sharing are extremely effective in promoting data sharing.

We argue that open data is essential in improving the quality of psychological science, and we discuss ways to detect and reduce reporting inconsistencies in the literature.


Understanding Perspectives on Sharing Neutron Data at Oak Ridge National Laboratory

Authors : Devan Ray Donaldson, Shawn Martin, Thomas Proffen

Even though the importance of sharing data is frequently discussed, data sharing appears to be limited to a few fields, and practices within those fields are not well understood. This study examines perspectives on sharing neutron data collected at Oak Ridge National Laboratory’s neutron sources.

Operation at user facilities has traditionally focused on making data accessible to those who create them. The recent emphasis on open data is shifting the focus to ensure that the data produced are reusable by others.

This mixed methods research study included a series of surveys and focus group interviews in which 13 data consumers, data managers, and data producers answered questions about their perspectives on sharing neutron data.

Data consumers reported interest in reusing neutron data for comparison/verification of results against their own measurements and testing new theories using existing data. They also stressed the importance of establishing context for data, including how data are produced, how samples are prepared, units of measurement, and how temperatures are determined.

Data managers expressed reservations about reusing others’ data because they were not always sure if they could trust whether the people responsible for interpreting data did so correctly.

Data producers described concerns about their data being misused, competing with other users, and over-reliance on data producers to understand data. We present the Consumers Managers Producers (CMP) Model for understanding the interplay of each group regarding data sharing.

We conclude with policy and system recommendations and discuss directions for future research.

URL : Understanding Perspectives on Sharing Neutron Data at Oak Ridge National Laboratory


A reputation economy: how individual reward considerations trump systemic arguments for open access to data

Authors : Benedikt Fecher, Sascha Friesike, Marcel Hebing, Stephanie Linek

Open access to research data has been described as a driver of innovation and a potential cure for the reproducibility crisis in many academic fields. Against this backdrop, policy makers are increasingly advocating for making research data and supporting material openly available online.

Despite its potential to further scientific progress, widespread data sharing in small science is still an ideal practised in moderation. In this article, we explore the question of what drives open access to research data using a survey among 1564 mainly German researchers across all disciplines.

We show that, regardless of their disciplinary background, researchers recognize the benefits of open access to research data for both their own research and scientific progress as a whole. Nonetheless, most researchers share their data only selectively.

We show that individual reward considerations conflict with widespread data sharing. Based on our results, we present policy implications that are in line with both individual reward considerations and scientific progress.

URL : A reputation economy: how individual reward considerations trump systemic arguments for open access to data

DOI : 10.1057/palcomms.2017.51

Advancing research data publishing practices for the social sciences: from archive activity to empowering researchers

Authors : Veerle Van den Eynden, Louise Corti

Sharing and publishing social science research data have a long history in the UK, through long-standing agreements with government agencies for sharing survey data and the data policy, infrastructure, and data services supported by the Economic and Social Research Council.

The UK Data Service and its predecessors developed data management, documentation, and publishing procedures and protocols that stand today as robust templates for data publishing.

As the ESRC research data policy requires grant holders to submit their research data to the UK Data Service after a grant ends, setting standards and promoting them has been essential in raising the quality of the resulting research data being published. In the past, received data were all processed, documented, and published for reuse in-house.

Recent investments have focused on guiding and training researchers in good data management practices and skills for creating shareable data, as well as a self-publishing repository system, ReShare. ReShare also receives data sets described in published data papers and achieves scientific quality assurance through peer review of submitted data sets before publication.

Social science data are reused for research, to inform policy, in teaching and for methods learning. Over a 10 years period, responsive developments in system workflows, access control options, persistent identifiers, templates, and checks, together with targeted guidance for researchers, have helped raise the standard of self-publishing social science data.

Lessons learned and developments in shifting publishing social science data from an archivist responsibility to a researcher process are showcased, as inspiration for institutions setting up a data repository.

URL : Advancing research data publishing practices for the social sciences: from archive activity to empowering researchers

DOI : doi:10.1007/s00799-016-0177-3

Open data, [open] access: linking data sharing and article sharing in the Earth Sciences

Author : Samantha Teplitzky


The norms of a research community influence practice, and norms of openness and sharing can be shaped to encourage researchers who share in one aspect of their research cycle to share in another.

Different sets of mandates have evolved to require that research data be made public, but not necessarily articles resulting from that collected data. In this paper, I ask to what extent publications in the Earth Sciences are more likely to be open access (in all of its definitions) when researchers open their data through the Pangaea repository.


Citations from Pangaea data sets were studied to determine the level of open access for each article.


This study finds that the proportion of gold open access articles linked to the repository increased 25% from 2010 to 2015 and 75% of articles were available from multiple open sources.


The context for increased preference for gold open access is considered and future work linking researchers’ decisions to open their work to the adoption of open access mandates is proposed.

URL : Open data, [open] access: linking data sharing and article sharing in the Earth Sciences