Data Policy Recommendations for Biodiversity Data. EU BON Project Report

There is a strong need for a comprehensive, coherent, and consistent data policy in Europe to increase interoperability of data and to make its reuse both easy and legal. Existing individual recommendations and guidelines on different topics need to be processed, structured, and unified. Within the context of the EU BON project, a team from the EU BON partners Museum für Naturkunde Berlin, Plazi, and Pensoft has prepared this report to be used as part of the Data Publishing Guidelines and Recommendations in the EU BON Biodiversity Portal.

The document deals with four issues: (i) mobilizing biodiversity data, (ii) removing legal obstacles, (iii) changing attitudes, and (iv) data policy recommendations. It is addressed to legislators, researchers, research institutions, data aggregators, funders, and publishers.

URL : Data Policy Recommendations for Biodiversity Data. EU BON Project Report

DOI : http://dx.doi.org/10.3897/rio.2.e8458

Identifying and Improving Dataset References in Social Sciences Full Texts

Scientific full-text papers are usually stored separately from their underlying research datasets. Authors typically refer to datasets by mentioning them, for example by title and year of publication. However, in most cases explicit links that would give readers direct access to the referenced datasets are missing.

Manually detecting references to datasets in papers is time-consuming and requires an expert in the domain of the paper. In order to make explicit all links to datasets in papers that have already been published, we suggest and evaluate a semi-automatic approach for finding references to datasets in social sciences papers.

Our approach does not need a corpus of papers (no cold start problem) and it performs well on a small test corpus (gold standard). Our approach achieved an F-measure of 0.84 for identifying references in full texts and an F-measure of 0.83 for finding correct matches of detected references in the da|ra dataset registry.
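For readers unfamiliar with the metric, the following is a minimal sketch of how an F-measure is computed from precision and recall. The abstract reports only the final scores, so the counts below are hypothetical and chosen purely for illustration; they are not the authors' data, and the function names are assumptions of this sketch.

```python
def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> tuple[float, float]:
    """Return (precision, recall) from raw counts; 0.0 where undefined."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 8 dataset references found correctly,
# 2 spurious detections, 2 references missed.
p, r = precision_recall(true_pos=8, false_pos=2, false_neg=2)
f1 = f_measure(p, r)
```

With beta set to 1, precision and recall are weighted equally; the abstract does not state which weighting the authors used.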

URL : http://arxiv.org/abs/1603.01774v1

Data trajectories: tracking reuse of published data for transitive credit attribution

The ability to measure the use and impact of published data sets is key to the success of the open data / open science paradigm. A direct measure of impact would require tracking data (re)use in the wild, which is difficult to achieve. It is therefore commonly replaced by simpler metrics based on data download and citation counts.

In this paper we describe a scenario where it is possible to track the trajectory of a dataset after its publication, and we show how this enables the design of accurate models for ascribing credit to data originators. A Data Trajectory (DT) is a graph that encodes knowledge of how, by whom, and in which context data has been re-used, possibly after several generations.

We provide a theoretical model of DTs that is grounded in the W3C PROV data model for provenance, and we show how DTs can be used to automatically propagate a fraction of the credit associated with transitively derived datasets, back to original data contributors. We also show this model of transitive credit in action by means of a Data Reuse Simulator.
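The idea of propagating a fraction of credit back along derivation links can be sketched as follows. This is not the authors' model: the `derived_from` mapping (loosely mirroring PROV's derivation relation), the fixed `pass_back` fraction, and the equal split among source datasets are all assumptions made here for illustration, and the graph is assumed to be acyclic.

```python
from collections import defaultdict


def propagate_credit(derived_from, initial_credit, pass_back=0.5):
    """Distribute credit backwards along derivation links.

    derived_from:   dict mapping a dataset to the list of datasets it was
                    derived from (assumed acyclic).
    initial_credit: dict mapping a dataset to the credit it earned directly.
    pass_back:      assumed fraction of received credit that a derived
                    dataset hands on to its sources, split equally.
    """
    totals = defaultdict(float)

    def give(dataset, amount):
        parents = derived_from.get(dataset, [])
        if not parents:
            # An original dataset keeps everything it receives.
            totals[dataset] += amount
            return
        totals[dataset] += amount * (1 - pass_back)
        share = amount * pass_back / len(parents)
        for parent in parents:
            give(parent, share)

    for dataset, amount in initial_credit.items():
        give(dataset, amount)
    return dict(totals)


# Hypothetical chain: C was derived from B, which was derived from A.
derived_from = {"C": ["B"], "B": ["A"]}
credit = propagate_credit(derived_from, {"C": 1.0})
```

In this toy chain, one unit of credit earned by C is shared across all three generations, and the total amount of credit is conserved.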

Our hope is that, in the longer term, credit models based on direct measures of data reuse will provide further incentives for data publication. We conclude by outlining a research agenda to address the hard questions of creating, collecting, and using DTs systematically across a large number of data reuse instances in the wild.

URL : Data trajectories: tracking reuse of published data for transitive credit attribution

Alternative location : http://homepages.cs.ncl.ac.uk/paolo.missier/doc/DT.pdf

Access to and Preservation of Scientific Information in Europe

Executive summary

An important aspect of open science is a move towards open access to publicly funded research results, including scientific publications as well as research data. Based on the structure of Commission Recommendation C(2012) 4890 final and its associated reporting mechanism (the National Points of Reference for scientific information), this report provides an overview of access to and preservation of scientific information in the EU Member States as well as Norway and Turkey. It is based on self-reporting by the participating states as well as cross-referencing with other relevant documents and further desk research.

Concerning open access to scientific peer-reviewed publications, most EU Member States reported a national preference for one of the two types of open access, either the Green (self-archiving) or the Gold (open access publishing) model. Preference for the Green model is found in Belgium, Cyprus, Denmark, Estonia, Greece, Ireland, Lithuania, Malta, Norway, Portugal, Slovakia and Spain. Those expressing a preference for the Gold model are Hungary, the Netherlands, Romania, Sweden and the United Kingdom.

Other Member States, such as Germany, France, Croatia, Italy, Luxembourg, Poland and Finland, support both models equally. However, the expressed preferences for one of the two models do not describe pure models in which only one route is followed. Instead, one model generally predominates while the other remains available, so a mixture of both routes results.

While few Member States have a national law requiring open access to publications, a mandate put in place by law is not necessarily stronger or more effective than a mandate put in place by a single institution or funder. For example, an open access mandate is strong when it ties open access to possible withdrawal of funds in the case of non-compliance, or to the evaluation of researchers’ careers.

Overall, policies on open access to research data are less developed across EU countries than policies and strategies on open access to research publications. However, individual Member State feedback shows a general acknowledgement of the importance of open research data and of policies, strategies and actions addressed at fostering the collection, curation, preservation and re-use of research data. Based on the self-reporting of the EU Member States and participating associated countries, the following classification is proposed.

  • Very little or no open access to research data policies in place and no plan for a more developed policy in the near future: Cyprus, Latvia, Luxembourg, Malta, Poland.
  • Very little or no open access to research data policies in place, but some plans in place or under development: Austria, Belgium, Croatia, Czech Republic, Estonia, Hungary, Italy, Portugal, Romania, Slovakia, Sweden, Turkey.
  • Open access policies/institutional strategies or subject-based initiatives for research data already in place: Denmark, Finland, France, Germany, Ireland, Lithuania, the Netherlands, Norway, Slovenia, the United Kingdom.

Concerning the curation and preservation of scientific information (another issue covered by the 2012 Recommendation), institutional repositories are very well developed in most Member States although some NPR reports stress that, in many cases, institutional repositories are not certified to properly guarantee the long-term preservation of scientific information.

NPR reports also show that many Member States have made a clear effort to become more efficient and transparent regarding scientific information and research activities in general. This being said, some Member States underline research information purposes rather than the objective of open access to research results, with most CRIS systems containing meta-data and not necessarily full results.

Nevertheless, countries from the latest wave of EU enlargement tend to focus their efforts on developing centralised national repositories for preservation, to be connected to the existing national CRIS systems and to be interoperable across the EU with, for example, OpenAIRE protocols.

Many Member States have devised global policies and strategies for developing e-infrastructures in a comprehensive way. Such strategies often contain specific chapters or sections addressing scientific information, research and innovation, covering storage and high-performance computing capabilities as well as the appropriate dissemination, access and visibility of research results. As is the case in other areas, the stage of e-infrastructure development varies greatly among Member States, and it is worth noting differences in funding capabilities in this area. The support provided by EU-funded projects and initiatives is of significant importance here.

Concerning participation in multi-stakeholder dialogues and activities, several countries have set up national coordination bodies or networks (Belgium, Denmark, Germany, Italy, Austria, Poland, Portugal). Other countries rely on a university or a university library (or an association of libraries) to coordinate national stakeholders (Czech Republic, Lithuania, Luxembourg, Malta) or on their research promotion agency/research councils (Cyprus, Sweden, the United Kingdom) or their academy of science (Slovakia).

Specific events, such as open-access workshops or activities during the annual open-access week, have also been identified as a way to galvanise stakeholder interaction at the national level (Czech Republic, Croatia, Italy, Romania). In addition to EU fora (such as ERA, ERAC, the NPR, the Digital ERA Forum and the E-IRG), EU-funded projects such as OpenAIRE, FOSTER and PASTEUR4OA as well as PEER, Dariah and Serscida were mentioned as important support mechanisms. Furthermore, Belgium and the Netherlands have established bilateral cooperation, and among the Nordic states (Denmark, Finland, Iceland, Norway and Sweden) part of the dialogue on open science is conducted within the framework of the NordForsk organisation.

URL : Access to and Preservation of Scientific Information in Europe

Alternative location : http://ec.europa.eu/research/openscience/pdf/openaccess/npr_report.pdf

Open Source Archaeology: Ethics and Practice

‘Open Source Archaeology: Ethics and Practice’ brings together authors and researchers in the field of open-source archaeology, defined as encompassing the ethical imperative for open public access to the results of publicly-funded research; practical solutions to open-data projects; open-source software applications in archaeology; public information sharing projects in archaeology; open-GIS; and the open-context system of data management and sharing.

This edited volume is designed to discuss important issues around open access to data and software in academic and commercial archaeology, as well as to summarise the current state of both theoretical engagement and technological development in the field of open archaeology.

URL : http://www.degruyter.com/view/product/460080

Research data explored: an extended analysis of citations and altmetrics

In this study, we explore the citedness of research data, its distribution over time and its relation to the availability of a digital object identifier (DOI) in the Thomson Reuters database Data Citation Index (DCI).

We investigate if cited research data “impacts” the (social) web, reflected by altmetrics scores, and if there is any relationship between the number of citations and the sum of altmetrics scores from various social media platforms.

Three tools are used to collect altmetrics scores, namely PlumX, ImpactStory, and Altmetric.com, and the corresponding results are compared. We found that out of the three altmetrics tools, PlumX has the best coverage. Our experiments revealed that research data remain mostly uncited (about 85 %), although there has been an increase in citing data sets published since 2008.

The percentage of cited research data with a DOI in DCI has decreased in recent years. Only nine repositories are responsible for research data with DOIs and two or more citations. The number of cited research data with altmetrics “foot-prints” is even lower (4–9 %) but shows a higher coverage of research data from the last decade. In our study, we also found no correlation between the number of citations and the total number of altmetrics scores.

Yet, certain data types (i.e. survey, aggregate data, and sequence data) are more often cited and also receive higher altmetrics scores. Additionally, we performed citation and altmetric analyses of all research data published between 2011 and 2013 in four different disciplines covered by the DCI.

In general, these results correspond very well with the ones obtained for research data cited at least twice and also show low numbers in citations and in altmetrics. Finally, we observed that there are disciplinary differences in the availability and extent of altmetrics scores.

URL : http://link.springer.com/article/10.1007/s11192-016-1887-4

Open Data in Global Environmental Research: The Belmont Forum’s Open Data Survey

This paper presents the findings of the Belmont Forum’s survey on Open Data, which targeted the global environmental research and data infrastructure community. It highlights users’ perceptions of the term “open data”, expectations of infrastructure functionalities, and barriers and enablers for the sharing of data. Respondents pointed out a wide range of good practice examples, which demonstrates a substantial uptake of data sharing through e-infrastructures and a further need for enhancement and consolidation. Among all policy responses, funder policies seem to be the most important motivator. This supports the conclusion that stronger mandates will strengthen the case for data sharing.

URL : Open Data in Global Environmental Research: The Belmont Forum’s Open Data Survey

DOI : 10.1371/journal.pone.0146695