Data trajectories: tracking reuse of published data for transitive credit attribution

The ability to measure the use and impact of published data sets is key to the success of the open data / open science paradigm. A direct measure of impact would require tracking data (re)use in the wild, which is difficult to achieve. Direct measurement is therefore commonly replaced by simpler metrics based on data download and citation counts.

In this paper we describe a scenario where it is possible to track the trajectory of a dataset after its publication, and we show how this enables the design of accurate models for ascribing credit to data originators. A Data Trajectory (DT) is a graph that encodes knowledge of how, by whom, and in which context data has been re-used, possibly after several generations.

We provide a theoretical model of DTs that is grounded in the W3C PROV data model for provenance, and we show how DTs can be used to automatically propagate a fraction of the credit associated with transitively derived datasets, back to original data contributors. We also show this model of transitive credit in action by means of a Data Reuse Simulator.
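
To make the idea of transitive credit concrete, here is a minimal sketch in Python. It is not the paper's model or its Data Reuse Simulator: the function name `propagate_credit`, the dictionary layout, and the fixed per-edge fractions are all assumptions made for illustration, loosely inspired by PROV's `wasDerivedFrom` relation. Each dataset keeps its own direct credit and additionally receives a fraction of the total credit of every dataset derived from it, so credit flows back across several generations.

```python
from collections import defaultdict


def propagate_credit(derived_from, direct_credit):
    """Return total credit per dataset (hypothetical illustration).

    derived_from:  {derived_dataset: {source_dataset: fraction}}, where
                   fraction is the share of the derived dataset's total
                   credit passed back to that source (an assumed parameter).
    direct_credit: {dataset: credit earned directly, e.g. from citations}.
    """
    # Reverse index: source -> [(derived dataset, fraction), ...]
    derivations = defaultdict(list)
    for derived, sources in derived_from.items():
        for source, fraction in sources.items():
            derivations[source].append((derived, fraction))

    total = {}

    def credit(node, visiting=()):
        if node in total:
            return total[node]
        if node in visiting:
            raise ValueError(f"derivation cycle through {node!r}")
        # Credit inherited from every dataset derived from this one.
        inherited = sum(
            fraction * credit(derived, visiting + (node,))
            for derived, fraction in derivations[node]
        )
        total[node] = direct_credit.get(node, 0.0) + inherited
        return total[node]

    nodes = set(direct_credit) | set(derived_from) | set(derivations)
    for node in nodes:
        credit(node)
    return total


# Toy trajectory: C was derived from B, which was derived from A.
toy_derivations = {"B": {"A": 0.5}, "C": {"B": 0.5}}
toy_direct = {"A": 0.0, "B": 2.0, "C": 4.0}
toy_totals = propagate_credit(toy_derivations, toy_direct)
```

In this toy trajectory, the originator A earns no direct credit but still receives a share of the credit accrued two generations downstream. The fractions here are arbitrary; how they should be chosen is exactly the kind of modelling question the paper addresses.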

Our hope is that, in the longer term, credit models based on direct measures of data reuse will provide further incentives for data publication. We conclude by outlining a research agenda to address the hard questions of creating, collecting, and using DTs systematically across a large number of data reuse instances in the wild.

URL : Data trajectories: tracking reuse of published data for transitive credit attribution

Alternative location : http://homepages.cs.ncl.ac.uk/paolo.missier/doc/DT.pdf

Access to and Preservation of Scientific Information in Europe

Executive summary

An important aspect of open science is a move towards open access to publicly funded research results, including scientific publications as well as research data. Based on the structure of Commission Recommendation C(2012) 4890 final and its associated reporting mechanism (the National Points of Reference for scientific information), this report provides an overview of access to and preservation of scientific information in the EU Member States as well as Norway and Turkey. It is based on self-reporting by the participating states as well as cross-referencing with other relevant documents and further desk research.

Concerning open access to scientific peer-reviewed publications, most EU Member States reported a national preference for one of the two types of open access, either the Green (self-archiving) or the Gold (open access publishing) model. Preference for the Green model is found in Belgium, Cyprus, Denmark, Estonia, Greece, Ireland, Lithuania, Malta, Norway, Portugal, Slovakia and Spain. Those expressing a preference for the Gold model are Hungary, the Netherlands, Romania, Sweden and the United Kingdom.

Other Member States support both models equally, such as Germany, France, Croatia, Italy, Luxembourg, Poland and Finland. However, the expressed preferences for one of the two models are not pure models in which only one route is followed. Instead, there is generally a system of predominance of one model with the possibility of using the other model, so a mixture of both routes results.

While few Member States have a national law requiring open access to publications, a mandate put in place by law is not necessarily stronger or more effective than a mandate put in place by a single institution or funder. For example, an open access mandate is strong when it ties open access to possible withdrawal of funds in the case of non-compliance, or to the evaluation of researchers’ careers.

Overall, policies on open access to research data are less developed across EU countries than policies and strategies on open access to research publications. However, individual Member State feedback shows a general acknowledgement of the importance of open research data and of policies, strategies and actions addressed at fostering the collection, curation, preservation and re-use of research data. Based on the self-reporting of the EU Member States and participating associated countries, the following classification is proposed.

  • Very little or no open access to research data policies in place and no plan for a more developed policy in the near future: Cyprus, Latvia, Luxembourg, Malta, Poland.
  • Very little or no open access to research data policies in place, but some plans in place or under development: Austria, Belgium, Croatia, Czech Republic, Estonia, Hungary, Italy, Portugal, Romania, Slovakia, Sweden, Turkey.
  • Open access policies/institutional strategies or subject-based initiatives for research data already in place: Denmark, Finland, France, Germany, Ireland, Lithuania, the Netherlands, Norway, Slovenia, the United Kingdom.

Concerning the curation and preservation of scientific information (another issue covered by the 2012 Recommendation), institutional repositories are very well developed in most Member States although some NPR reports stress that, in many cases, institutional repositories are not certified to properly guarantee the long-term preservation of scientific information.

NPR reports also show that many Member States have made a clear effort to become more efficient and transparent regarding scientific information and research activities in general. This being said, some Member States underline research information purposes rather than the objective of open access to research results, with most CRIS systems containing meta-data and not necessarily full results.

Nevertheless, the countries of the latest wave of EU enlargement tend to focus their efforts on developing centralised national repositories for preservation, to be connected to the existing national CRIS systems and to be interoperable across the EU with, for example, OpenAIRE protocols.

Many Member States have devised global policies and strategies for developing e-infrastructures in a comprehensive way. Such strategies often contain specific chapters or sections addressing scientific information, research and innovation, covering storage and high-performance computing capabilities as well as the appropriate dissemination, access and visibility of research results. As is the case in other areas, the stage of e-infrastructure development varies greatly among Member States, and it is worth noting differences in funding capabilities in this area. The support provided by EU-funded projects and initiatives is of significant importance here.

Concerning participation in multi-stakeholder dialogues and activities, several countries have set up national coordination bodies or networks (Belgium, Denmark, Germany, Italy, Austria, Poland, Portugal). Other countries rely on a university or a university library (or an association of libraries) to coordinate national stakeholders (Czech Republic, Lithuania, Luxembourg, Malta) or on their research promotion agency/research councils (Cyprus, Sweden, the United Kingdom) or their academy of science (Slovakia).

Specific events, such as open-access workshops or activities during the annual open-access week, have also been identified as a way to galvanise stakeholder interaction at the national level (Czech Republic, Croatia, Italy, Romania). In addition to EU fora (such as ERA, ERAC, the NPR, the Digital ERA Forum and the E-IRG), EU-funded projects such as OpenAIRE, FOSTER and PASTEUR4OA as well as PEER, Dariah and Serscida were mentioned as important support mechanisms. Furthermore, Belgium and the Netherlands have established bilateral cooperation, and among the Nordic states (Denmark, Finland, Iceland, Norway and Sweden) part of the dialogue on open science is conducted within the framework of the NordForsk organisation.

URL : Access to and Preservation of Scientific Information in Europe

Alternative location : http://ec.europa.eu/research/openscience/pdf/openaccess/npr_report.pdf

Open Source Archaeology: Ethics and Practice

‘Open Source Archaeology: Ethics and Practice’ brings together authors and researchers in the field of open-source archaeology, defined as encompassing the ethical imperative for open public access to the results of publicly-funded research; practical solutions to open-data projects; open-source software applications in archaeology; public information sharing projects in archaeology; open-GIS; and the open-context system of data management and sharing.

This edited volume is designed to discuss important issues around open access to data and software in academic and commercial archaeology, as well as to summarise the current state of both theoretical engagement and technological development in the field of open-archaeology.

URL : http://www.degruyter.com/view/product/460080

Research data explored: an extended analysis of citations and altmetrics

In this study, we explore the citedness of research data, its distribution over time and its relation to the availability of a digital object identifier (DOI) in the Thomson Reuters database Data Citation Index (DCI).

We investigate if cited research data “impacts” the (social) web, reflected by altmetrics scores, and if there is any relationship between the number of citations and the sum of altmetrics scores from various social media platforms.

Three tools are used to collect altmetrics scores, namely PlumX, ImpactStory, and Altmetric.com, and the corresponding results are compared. We found that out of the three altmetrics tools, PlumX has the best coverage. Our experiments revealed that research data remain mostly uncited (about 85 %), although there has been an increase in citing data sets published since 2008.

The percentage of cited research data with a DOI in DCI has decreased in recent years. Only nine repositories are responsible for research data with DOIs and two or more citations. The number of cited research data with altmetrics “footprints” is even lower (4–9 %) but shows a higher coverage of research data from the last decade. In our study, we also found no correlation between the number of citations and the total number of altmetrics scores.

Yet, certain data types (i.e. survey, aggregate data, and sequence data) are more often cited and also receive higher altmetrics scores. Additionally, we performed citation and altmetric analyses of all research data published between 2011 and 2013 in four different disciplines covered by the DCI.

In general, these results correspond very well with the ones obtained for research data cited at least twice and also show low numbers in citations and in altmetrics. Finally, we observed that there are disciplinary differences in the availability and extent of altmetrics scores.

URL : http://link.springer.com/article/10.1007/s11192-016-1887-4

Open Data in Global Environmental Research: The Belmont Forum’s Open Data Survey

This paper presents the findings of the Belmont Forum’s survey on Open Data, which targeted the global environmental research and data infrastructure community. It highlights users’ perceptions of the term “open data”, expectations of infrastructure functionalities, and barriers and enablers for the sharing of data. Respondents pointed out a wide range of good practice examples, which demonstrates a substantial uptake of data sharing through e-infrastructures and a further need for enhancement and consolidation. Among all policy responses, funder policies seem to be the most important motivator. This supports the conclusion that stronger mandates will strengthen the case for data sharing.

URL : Open Data in Global Environmental Research: The Belmont Forum’s Open Data Survey

DOI : 10.1371/journal.pone.0146695

Enabling Open Science: Wikidata for Research

Wiki4R will create an innovative virtual research environment (VRE) for Open Science at scale, engaging both professional researchers and citizen data scientists in new and potentially transformative forms of collaboration. It is based on the realizations that (1) the structured parts of the Web itself can be regarded as a VRE, (2) such environments depend on communities, (3) closed environments are limited in their capacity to nurture thriving communities.

Wiki4R will therefore integrate Wikidata, the multilingual semantic backbone behind Wikipedia, into existing research processes to enable transdisciplinary research and reduce fragmentation of research in and outside Europe. By establishing a central shared information node, research data can be linked and annotated into knowledge. Despite occasional uses of Wikipedia or Wikidata in research, significant barriers to broader adoption in the sciences or digital humanities exist, including lack of integration into existing research processes and inadequate handling of provenance.

The proposed actions include providing best practices and tools for semantic mapping, adoption of citation and author identifiers, interoperability layers for integration with existing research environments, and the development of policies for information quality and interchange. The effectiveness of the actions will be tested in pilot use cases.

Unforeseen barriers will be investigated and documented. We will promote the adoption of Wiki4R by making it easy to use and integrate, demonstrate the applicability in selected research domains, and provide diverse training opportunities.

Wiki4R leverages the expertise gained in Europe through the Wikidata and DBpedia projects to further strengthen the established virtual community of 14000 people. As a result of increased interaction between professional science and citizens, it will provide an improved basis for Responsible Research and Innovation and Open Science in the European Research Area.

URL : Enabling Open Science: Wikidata for Research

Alternative location : http://rio.pensoft.net/articles.php?id=7573

Assessing Research Data Management Practices of Faculty at Carnegie Mellon University

INTRODUCTION

Recent changes to requirements for research data management by federal granting agencies and by other funding institutions have resulted in the emergence of institutional support for these requirements. At CMU, we sought to formalize assessment of research data management practices of researchers at the institution by launching a faculty survey and conducting a number of interviews with researchers.

METHODS

We submitted a survey on research data management practices to a sample of faculty, including questions about data production, documentation, management, and sharing practices. The survey was coupled with in-depth interviews with a subset of faculty. We also estimated the amount of research data produced by faculty.

RESULTS

Survey and interview results suggest a moderate level of awareness of the regulatory environment around research data management. Results also present a clear picture of the types and quantities of data being produced at CMU and how these differ among research domains. Researchers identified a number of services that they would find valuable, including assistance with data management planning and backup/storage services. We attempt to estimate the amount of data produced and shared by researchers at CMU.

DISCUSSION

Results suggest that researchers may need and are amenable to assistance with research data management. Our estimates of the amount of data produced and shared have implications for decisions about data storage and preservation.

CONCLUSION

Our survey and interview results have offered significant guidance for building a suite of services for our institution.

URL : Assessing Research Data Management Practices of Faculty at Carnegie Mellon University

DOI : http://doi.org/10.7710/2162-3309.1258