Sharing data increases citations

Authors: Thea Marie Drachen, Ole Ellegaard, Asger Væring Larsen, Søren Bertil Fabricius Dorch

This paper presents some indications of a citation advantage related to sharing data, using astrophysics as a case. Through bibliometric analyses we find a citation advantage for astrophysical papers in core journals.

The advantage arises when indexed papers are associated with data through bibliographical links: these papers receive, on average, significantly more citations per paper per year than papers not associated with links to data.
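The measure described above, citations per paper per year compared between papers with and without data links, can be sketched as follows. This is only an illustration of the arithmetic, not the authors' bibliometric method: the record fields (`year`, `citations`, `has_data_link`), the `census_year` parameter, and the sample records are all assumptions made up for the example.

```python
from statistics import mean


def citations_per_year(citations, pub_year, census_year):
    """Citations divided by the number of years since publication."""
    years = max(census_year - pub_year, 1)  # treat very recent papers as one year old
    return citations / years


def mean_rate_by_group(papers, census_year):
    """Average citations per paper per year, split by whether a paper links to data."""
    groups = {True: [], False: []}
    for paper in papers:
        rate = citations_per_year(paper["citations"], paper["year"], census_year)
        groups[paper["has_data_link"]].append(rate)
    return {linked: mean(rates) for linked, rates in groups.items() if rates}


# Made-up records for illustration only; these are not data from the paper.
papers = [
    {"year": 2005, "citations": 40, "has_data_link": True},
    {"year": 2007, "citations": 24, "has_data_link": True},
    {"year": 2005, "citations": 20, "has_data_link": False},
    {"year": 2007, "citations": 8, "has_data_link": False},
]

rates = mean_rate_by_group(papers, census_year=2015)
print(rates)
```

Normalising by years since publication keeps older papers from dominating the comparison simply because they have had longer to accumulate citations.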


Discovery and Reuse of Open Datasets: An Exploratory Study

Authors : Sara Mannheimer, Leila Belle Sterman, Susan Borda


This article analyzes twenty cited or downloaded datasets and the repositories that house them, in order to produce insights that can be used by academic libraries to encourage discovery and reuse of research data in institutional repositories.


Using Thomson Reuters’ Data Citation Index and repository download statistics, we identified twenty cited/downloaded datasets. We documented the characteristics of the cited/downloaded datasets and their corresponding repositories in a self-designed rubric.

The rubric includes six major categories: basic information; funding agency and journal information; linking and sharing; factors to encourage reuse; repository characteristics; and data description.


Our small-scale study suggests that cited/downloaded datasets generally comply with basic recommendations for facilitating reuse: data are documented well; formatted for use with a variety of software; and shared in established, open access repositories.

Three significant factors also appear to contribute to dataset discovery: publishing in discipline-specific repositories; indexing in more than one location on the web; and using persistent identifiers.

The cited/downloaded datasets in our analysis came from a few specific disciplines, and tended to be funded by agencies with data publication mandates.


The results of this exploratory research provide insights that can inform academic librarians as they work to encourage discovery and reuse of institutional datasets.

Our analysis also suggests areas in which academic librarians can target open data advocacy in their communities in order to begin to build open data success stories that will fuel future advocacy efforts.



The Journal Article as a Means to Share Data: a Content Analysis of Supplementary Materials from Two Disciplines

Authors : Jeremy Kenyon, Nancy Sprague, Edward Flathers


The practice of publishing supplementary materials with journal articles is becoming increasingly prevalent across the sciences.

We sought to understand better the content of these materials by investigating the differences between the supplementary materials published by authors in the geosciences and plant sciences.


We conducted a random stratified sampling of four articles from each of 30 journals published in 2013. In total, we examined 297 supplementary data files for a range of different factors.
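A stratified draw of this kind, with each journal treated as a stratum and four articles taken from each, gives 4 × 30 = 120 articles. The sketch below shows one way such a draw could be made. It assumes journals are the strata; the journal names, article identifiers, frame size of 25 articles per journal, and the `seed` parameter are hypothetical and not taken from the study.

```python
import random


def stratified_sample(articles_by_journal, per_journal=4, seed=0):
    """Draw the same number of articles at random from every journal (stratum)."""
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    sample = {}
    for journal, articles in articles_by_journal.items():
        k = min(per_journal, len(articles))  # guard against very small journals
        sample[journal] = rng.sample(articles, k)
    return sample


# Hypothetical sampling frame: 30 journals with 25 articles each.
frame = {
    f"journal_{j:02d}": [f"j{j:02d}_article_{a}" for a in range(25)]
    for j in range(30)
}

sample = stratified_sample(frame)
total = sum(len(articles) for articles in sample.values())
print(total)
```

Sampling within each journal, rather than from the pooled list of articles, guarantees that every journal contributes equally regardless of how many articles it published.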


We identified many similarities between the practices of authors in the two fields, including the formats used (Word documents, Excel spreadsheets, PDFs) and the small size of the files.

We also identified differences in the content of the supplementary materials: the geology materials contained more maps and machine-readable data, while the plant science materials included much more tabular data and multimedia content.


Our results suggest that the data shared through supplementary files in these fields may not lend itself to reuse. Code and related scripts are not often shared, nor is much ‘raw’ data. Instead, the files often contain summary data, modified for human reading and use.


Given these and other differences, our results suggest implications for publishers, librarians, and authors, and may require shifts in behavior if effective data sharing is to be realized.



Research data management in social sciences and humanities: A survey at the University of Lille (France)

Authors : Joachim Schöpfel, Hélène Prost

The paper presents results from a campus-wide survey at the University of Lille (France) on research data management in social sciences and humanities.

The survey received 270 responses, equivalent to 15% of the whole sample of scientists, scholars, PhD students, administrative and technical staff (research management, technical support services); all disciplines were represented.

The responses show a wide variety of practice and usage. The results are discussed regarding job status and disciplines and compared to other surveys. Four groups can be distinguished, i.e. pioneers (20-25%), motivated (25-30%), unaware (30%) and reluctant (5-10%).

Finally, the next steps to improve the research data management on the campus are presented.



New Horizons for a Data-Driven Economy : A Roadmap for Usage and Exploitation of Big Data in Europe

Editors : José María Cavanillas, Edward Curry, Wolfgang Wahlster

In this book readers will find technological discussions on the existing and emerging technologies across the different stages of the big data value chain. They will learn about legal aspects of big data, the social impact, and about education needs and requirements.

And they will discover the business perspective and how big data technology can be exploited to deliver value within different sectors of the economy.




Research Data in Current Research Information Systems

Authors : Joachim Schöpfel, Hélène Prost, Violaine Rebouillat

The paper provides an overview of recent research and publications on the integration of research data in Current Research Information Systems (CRIS) and addresses three related issues, i.e. the object of evaluation, identifier schemes and conservation.

Our focus is on social sciences and humanities. As research data gradually become a crucial topic of scientific communication and evaluation, current research information systems must be able to consider and manage the great variety and granularity levels of data as sources and results of scientific research.

More empirical and, above all, conceptual work is needed to increase our understanding of the reality of research data and of the way they can and should be used for the needs and objectives of research evaluation.

The paper contributes to the debate on the evaluation of research data, especially in the environment of open science and open data, and will be helpful in implementing CRIS and research data policies.


Using the Semantic Web for Rapid Integration of WikiPathways with Other Biological Online Data Resources

Authors : Andra Waagmeester, Martina Kutmon, Anders Riutta, Ryan Miller, Egon L. Willighagen, Chris T. Evelo, Alexander R. Pico

The diversity of online resources storing biological data in different formats poses a challenge for bioinformaticians seeking to integrate and analyse their biological data.

The semantic web provides a standard to facilitate knowledge integration using statements built as triples describing a relation between two objects. WikiPathways, an online collaborative pathway resource, is now available in the semantic web through a SPARQL endpoint.

Having biological pathways in the semantic web allows rapid integration with data from other resources that contain information about elements present in pathways using SPARQL queries.
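To make the idea of triples and cross-resource integration concrete, here is a minimal plain-Python sketch of pattern matching over subject-predicate-object statements. It is not SPARQL and does not use the WikiPathways vocabularies: the `ex:` identifiers, the predicate names, and the helper functions `match` and `drugs_for_pathway` are all hypothetical, chosen only to show how statements from different sources can be joined on a shared identifier.

```python
# Each statement is a (subject, predicate, object) triple.
# All identifiers below are invented for illustration.
triples = {
    ("ex:pathway1", "rdf:type", "ex:Pathway"),
    ("ex:geneA", "ex:isPartOf", "ex:pathway1"),
    ("ex:geneB", "ex:isPartOf", "ex:pathway1"),
    # Statements imagined as coming from a second resource:
    ("ex:geneA", "ex:encodes", "ex:proteinX"),
    ("ex:proteinX", "ex:targetedBy", "ex:drug1"),
}


def match(pattern, store):
    """Yield every triple matching the pattern; None acts as a wildcard."""
    for triple in store:
        if all(p is None or p == value for p, value in zip(pattern, triple)):
            yield triple


def drugs_for_pathway(pathway, store):
    """Chain three patterns: pathway -> genes -> proteins -> drugs."""
    drugs = set()
    for gene, _, _ in match((None, "ex:isPartOf", pathway), store):
        for _, _, protein in match((gene, "ex:encodes", None), store):
            for _, _, drug in match((protein, "ex:targetedBy", None), store):
                drugs.add(drug)
    return drugs


result = drugs_for_pathway("ex:pathway1", triples)
print(result)
```

A SPARQL query expresses the same chain declaratively as a set of triple patterns sharing variables, and the endpoint performs the joins; the nested loops here only mimic that behaviour on a toy store.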

In order to convert WikiPathways content into meaningful triples we developed two new vocabularies that capture the graphical representation and the pathway logic, respectively. Each gene, protein, and metabolite in a given pathway is defined with a standard set of identifiers to support linking to several other biological resources in the semantic web.

WikiPathways triples were loaded into the Open PHACTS discovery platform and are available through its Web API, to be used in various tools for drug development.

We combined various semantic web resources with the newly converted WikiPathways content using a variety of SPARQL query types and third-party resources, such as the Open PHACTS API. The ability to use pathway information to form new links across diverse biological data highlights the utility of integrating WikiPathways in the semantic web.
