Semantic representation and enrichment of information retrieval experimental data

Authors : Gianmaria Silvello, Georgeta Bordea, Nicola Ferro, Paul Buitelaar, Toine Bogers

Experimental evaluation carried out in international large-scale campaigns is a fundamental pillar of the scientific and technological advancement of information retrieval (IR) systems.

Such evaluation activities produce a large quantity of scientific and experimental data, which are the foundation for all the subsequent scientific production and development of new systems.

In this work, we discuss how to semantically annotate and interlink these data, with the goal of enhancing their interpretation, sharing, and reuse. We discuss the underlying evaluation workflow and propose a Resource Description Framework (RDF) model for those workflow parts.

We use expertise retrieval as a case study to demonstrate the benefits of our semantic representation approach. We employ this model as a means for exposing experimental data as linked open data (LOD) on the Web and as a basis for enriching and automatically connecting these data with expertise topics and expert profiles.
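As a rough illustration of what exposing experimental data as RDF triples can look like, here is a minimal sketch in plain Python. It is not the authors' model: the `http://example.org/ir-eval/` namespace, the `Run`, `submittedTo`, and `hasExpertiseTopic` terms, and the helper names are all hypothetical, chosen only to show the subject-predicate-object shape and an N-Triples serialization.

```python
# Hypothetical namespace and vocabulary, invented for illustration only;
# they are NOT taken from the paper's RDF model.
EX = "http://example.org/ir-eval/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value):
    """Wrap an IRI string in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Format a plain string literal, escaping backslashes, quotes, and line breaks."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def describe_run(run_id, campaign_id, topic_label):
    """Return three triples linking a run to a campaign and an expertise topic."""
    run = iri(EX + "run/" + run_id)
    campaign = iri(EX + "campaign/" + campaign_id)
    return [
        (run, iri(RDF_TYPE), iri(EX + "Run")),
        (run, iri(EX + "submittedTo"), campaign),
        (run, iri(EX + "hasExpertiseTopic"), literal(topic_label)),
    ]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = describe_run("run-001", "campaign-2012", "information retrieval")
print(to_ntriples(triples))
```

In practice, a dedicated RDF library and an established vocabulary would likely replace the hand-rolled helpers; the sketch only shows how a run, its campaign, and an expertise topic might be connected as linked data.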

In this context, a topic-centric approach for expert search is proposed, addressing the extraction of expertise topics, their semantic grounding with the LOD cloud, and their connection to IR experimental data.

Several methods for expert profiling and expert finding are analysed and evaluated. Our results show that it is possible to construct expert profiles starting from automatically extracted expertise topics and that topic-centric approaches outperform state-of-the-art language modelling approaches for expert finding.

URL : https://aran.library.nuigalway.ie/handle/10379/5862

Advancing research data publishing practices for the social sciences: from archive activity to empowering researchers

Authors : Veerle Van den Eynden, Louise Corti

Sharing and publishing social science research data have a long history in the UK, through long-standing agreements with government agencies for sharing survey data and the data policy, infrastructure, and data services supported by the Economic and Social Research Council.

The UK Data Service and its predecessors developed data management, documentation, and publishing procedures and protocols that stand today as robust templates for data publishing.

As the ESRC research data policy requires grant holders to submit their research data to the UK Data Service after a grant ends, setting standards and promoting them has been essential in raising the quality of the resulting research data being published. In the past, received data were all processed, documented, and published for reuse in-house.

Recent investments have focused on guiding and training researchers in good data management practices and skills for creating shareable data, as well as on a self-publishing repository system, ReShare. ReShare also receives data sets described in published data papers and achieves scientific quality assurance through peer review of submitted data sets before publication.

Social science data are reused for research, to inform policy, in teaching, and for methods learning. Over a 10-year period, responsive developments in system workflows, access control options, persistent identifiers, templates, and checks, together with targeted guidance for researchers, have helped raise the standard of self-publishing social science data.

Lessons learned and developments in shifting publishing social science data from an archivist responsibility to a researcher process are showcased, as inspiration for institutions setting up a data repository.

DOI : http://doi.org/10.1007/s00799-016-0177-3

Open data, [open] access: linking data sharing and article sharing in the Earth Sciences

Author : Samantha Teplitzky

Introduction

The norms of a research community influence practice, and norms of openness and sharing can be shaped to encourage researchers who share in one aspect of their research cycle to share in another.

Different sets of mandates have evolved to require that research data be made public, but not necessarily articles resulting from that collected data. In this paper, I ask to what extent publications in the Earth Sciences are more likely to be open access (in all of its definitions) when researchers open their data through the Pangaea repository.

Methods

Citations from Pangaea data sets were studied to determine the level of open access for each article.

Results

This study finds that the proportion of gold open access articles linked to the repository increased 25% from 2010 to 2015, and that 75% of articles were available from multiple open sources.

Discussion

The context for increased preference for gold open access is considered and future work linking researchers’ decisions to open their work to the adoption of open access mandates is proposed.

DOI : http://doi.org/10.7710/2162-3309.2150

What incentives increase data sharing in health and medical research? A systematic review

Authors : Anisa Rowhani-Farid, Michelle Allen, Adrian G. Barnett

Background

The foundation of health and medical research is data. Data sharing facilitates the progress of research and strengthens science. Data sharing in research is widely discussed in the literature; however, there are seemingly no evidence-based incentives that promote data sharing.

Methods

A systematic review (registration: doi.org/10.17605/OSF.IO/6PZ5E) of the health and medical research literature was used to uncover any evidence-based incentives, with pre- and post-empirical data that examined data sharing rates.

We were also interested in quantifying and classifying the number of opinion pieces on the importance of incentives, the number of observational studies that analysed data sharing rates and practices, and strategies aimed at increasing data sharing rates.

Results

Only one incentive (using open data badges) has been tested in health and medical research that examined data sharing rates. The number of opinion pieces (n = 85) outweighed the number of articles testing strategies (n = 76), and the number of observational studies exceeded them both (n = 106).

Conclusions

Given that data is the foundation of evidence-based health and medical research, it is paradoxical that there is only one evidence-based incentive to promote data sharing. More well-designed studies are needed in order to increase the currently low rates of data sharing.

URL : What incentives increase data sharing in health and medical research? A systematic review

Alternative location : http://researchintegrityjournal.biomedcentral.com/articles/10.1186/s41073-017-0028-9

Data Management: New Tools, New Organization, and New Skills in a French Research Institute

Authors : Caroline Martin, Colette Cadiou, Emmanuelle Jannès-Ober

In the context of e-science and open access, the visibility and impact of scientific results and data have become important aspects of spreading information to users and to society in general.

The objective of this general economic trend is to feed the innovation process and create economic value. In our institute, the French National Research Institute of Science and Technology for Environment and Agriculture (Irstea), the department in charge of scientific and technical information, with the help of other professionals (scientists, IT professionals, ethics advisors…), has recently developed services that address researchers' data management needs, in order to meet European recommendations for open data.

This situation has required reviewing the different workflows between databases and questioning the organizational aspects between skills, occupations, and departments in the institute.

In fact, data management requires all professionals and researchers to assess their ways of working together.

DOI : http://doi.org/10.18352/lq.10196

Perseids: Experimenting with Infrastructure for Creating and Sharing Research Data in the Digital Humanities

Author : Bridget Almas

The Perseids project provides a platform for creating, publishing, and sharing research data, in the form of textual transcriptions, annotations and analyses. An offshoot and collaborator of the Perseus Digital Library (PDL), Perseids is also an experiment in reusing and extending existing infrastructure, tools, and services.

This paper discusses infrastructure in the domain of digital humanities (DH). It outlines some general approaches to facilitating data sharing in this domain, and the specific choices we made in developing Perseids to serve that goal.

It concludes by identifying lessons we have learned about sustainability in the process of building Perseids, noting some critical gaps in infrastructure for the digital humanities, and suggesting some implications for the wider community.

DOI : http://doi.org/10.5334/dsj-2017-019

What Constitutes Peer Review of Data: A survey of published peer review guidelines

Author : Todd A Carpenter

Since a number of journals specifically focus on the review and publication of data sets, reviewing their policies seems an appropriate place to start in assessing what existing practice looks like in the ‘real world’ of reviewing and publishing data.

This article outlines a study of the publicly available peer review policies of 39 scientific publications that publish data papers to discern which criteria are most and least frequently referenced. It also compares current practice with proposed criteria published in 2012.

URL : What Constitutes Peer Review of Data: A survey of published peer review guidelines

Alternative location : https://arxiv.org/abs/1704.02236