Knowledge discovery through text-based similarity searches for astronomy literature

Author : Wolfgang Kerzendorf

The increase in the number of researchers coupled with the ease of publishing and distribution of scientific papers (due to technological advancements) has resulted in a dramatic increase in astronomy literature.

This has likely led to the predicament that the body of the literature is too large for traditional human consumption and that related and crucial knowledge is not discovered by researchers. In addition to the increased production of astronomical literature, recent decades have also brought several advancements in computer linguistics.

In particular, machine-aided processing of the literature might make it possible to convert this stream of papers into a coherent knowledge set. In this paper, we present the application of computer linguistics techniques to astronomy literature.

In particular, we developed a tool that will find similar articles purely based on text content given an input paper.

We find that our technique performs robustly in comparison with other tools that recommend articles given a reference paper (known as recommender systems). Our novel tool shows the great power of combining computer linguistics with astronomy literature and suggests that additional research in this direction will likely produce even better tools to help researchers cope with the vast amounts of knowledge being produced.
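As a rough illustration of what a purely text-based similarity search can look like, the sketch below ranks documents by cosine similarity of TF-IDF vectors. This is an assumption made for illustration only: the abstract does not state which technique the author's tool uses, and the function names (`tokenize`, `tfidf_vectors`, `cosine`, `most_similar`) and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed inverse document frequency keeps every weight positive.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_index, docs, top_k=3):
    """Indices of the top_k documents most similar to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine(vectors[query_index], vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    scores.sort(reverse=True)
    return [i for _, i in scores[:top_k]]


# Hypothetical abstracts, used only to exercise the functions.
sample_docs = [
    "supernova explosion spectra of type Ia supernova",
    "spectra of a type Ia supernova remnant",
    "galaxy cluster weak lensing mass",
]
nearest = most_similar(0, sample_docs, top_k=1)
```

In a real setting the input would be full article text rather than short strings, and the ranking quality would depend heavily on tokenization and weighting choices that this sketch leaves out.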

URL : https://arxiv.org/abs/1705.05840

Is there agreement on the prestige of scholarly book publishers in the Humanities? DELPHI over survey results

Authors : Elea Giménez-Toledo, Jorge Mañana-Rodríguez

Although evaluation systems and the categorizations they use play an important role in supporting assessment processes, they are frequently criticized. Considering acceptance by the scientific community essential for using rankings or categorizations in research evaluation, this paper aims to test the results of a ranking of scholarly book publishers’ prestige, the Scholarly Publishers Indicators (SPI hereafter).

SPI is a public, survey-based ranking of scholarly publishers’ prestige (among other indicators). The latest version of the ranking (2014) was based on an expert consultation with a large number of respondents.

In order to validate and refine the results for Humanities’ fields as proposed by the assessment agencies, a Delphi technique was applied with a panel of randomly selected experts over the initial rankings.

The results show an equalizing effect of the technique over the initial rankings as well as a high degree of concordance between its theoretical aim (consensus among experts) and its empirical results (summarized with Gini Index).
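For readers unfamiliar with the Gini Index mentioned above, the following sketch computes it for a list of non-negative scores: 0 means all values are equal, and values closer to 1 mean the total is concentrated in a few items. The function name `gini` and the two score lists are hypothetical and are not taken from the SPI data; the abstract does not describe how the authors applied the index.

```python
def gini(values):
    """Gini index of a list of non-negative numbers (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values, with ranks starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


# Hypothetical prestige scores before and after a consensus round.
scores_before = [0, 0, 0, 1]   # all prestige concentrated in one publisher
scores_after = [1, 1, 1, 1]    # prestige spread evenly

gini_before = gini(scores_before)
gini_after = gini(scores_after)
```

Under this reading, an "equalizing effect" would show up as a lower index after the Delphi rounds than before them.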

The resulting categorization is understood to be more conclusive and more likely to be accepted by those under evaluation.

URL : https://arxiv.org/abs/1705.04517

Opening Up Communication: Assessing Open Access Practices in the Communication Studies Discipline

Author : Teresa Auch Schultz

Introduction

Open access (OA) citation effect studies have looked at a number of disciplines but not yet the field of communication studies. This study researched how communication studies fare with the open access citation effect, as well as whether researchers follow their journal deposit policies.

Method

The study tracked 920 articles published in 2011 and 2012 from 10 journals and then searched for citations and an OA version using the program Publish or Perish. Deposit policies of each of the journals were gathered from SHERPA/RoMEO and used to evaluate OA versions.

Results

From the sample, 42 percent had OA versions available. Of those OA articles, 363 appeared to violate publisher deposit policies by depositing the version of record, but the study failed to identify post-print versions for 87 percent of the total sample for the journals that allowed it.

All articles with an OA version had a median of 17 citations, compared to only nine citations for non-OA articles.

Discussion & Conclusion

The citation averages, which are statistically significant, show a positive correlation between OA and the number of citations.

The study also shows communication studies researchers are taking part in open access but perhaps without the full understanding of their publisher’s policies.

DOI : http://doi.org/10.7710/2162-3309.2131

Social Science Data Repositories in Data Deluge: A Case Study at ICPSR Workflow and Practices

Authors : Wei Jeng, Daqing He, Yu Chi

Design/methodology/approach

We conducted two focus group sessions and one individual interview with eight employees at the world’s largest social science data repository, the Inter-university Consortium for Political and Social Research (ICPSR).

By examining their current actions (activities regarding their work responsibilities) and IT practices, we studied the barriers and challenges of archiving and curating qualitative data at ICPSR.

Purpose

Due to the recent surge of interest in the age of the data deluge, the importance of researching data infrastructures is increasing. The Open Archival Information System (OAIS) model has been widely adopted as a framework for creating and maintaining digital repositories.

Considering that OAIS is a reference model that requires customization for actual practice, this study examines how the current practices in a data repository map to the OAIS environment and functional components.

Findings

We observed that the OAIS model is robust and reliable in actual service processes for data curation and data archives. In addition, a data repository’s workflow resembles digital archives or even digital libraries.

On the other hand, we find that (1) the cost of preventing disclosure risk and (2) a lack of agreement on standards for text data files are the most apparent obstacles for data curation professionals handling qualitative data, and that (3) the maturation of data metrics seems to be a promising solution to several challenges in social science data sharing.

Original value

We evaluated the gap between a research data repository’s current practices and the adoption of the OAIS model. We also identified answers to questions such as how current technological infrastructure in a leading data repository such as ICPSR supports their daily operations, what the ideal technologies in those data repositories would be, and the associated challenges that accompany these ideal technologies.

Most importantly, we helped to prioritize challenges and barriers from the data curator’s perspective, and contributed implications for data sharing and reuse in the social sciences.

URL : http://d-scholarship.pitt.edu/31876/

Current Status of Chinese Open Access Institutional Repositories: A Case Study

Authors : K. C. Das, Kunwar Singh

The present study attempts to determine the current status of open access institutional repositories (IRs) in China based on four key constraints: number of IRs, types, subjects and contents, and software used.

To fulfill the specified objectives, the open access institutional repositories in China were identified from the Directory of Open Access Repositories (OpenDOAR) database, and the data were collected and analysed for the necessary information.

The study highlights the current status of open access institutional repositories in China and their contribution to a global knowledge base.

URL : Current Status of Chinese Open Access Institutional Repositories: A Case Study

Semantic representation and enrichment of information retrieval experimental data

Authors : Gianmaria Silvello, Georgeta Bordea, Nicola Ferro, Paul Buitelaar, Toine Bogers

Experimental evaluation carried out in international large-scale campaigns is a fundamental pillar of the scientific and technological advancement of information retrieval (IR) systems.

Such evaluation activities produce a large quantity of scientific and experimental data, which are the foundation for all the subsequent scientific production and development of new systems.

In this work, we discuss how to semantically annotate and interlink this data, with the goal of enhancing their interpretation, sharing, and reuse. We discuss the underlying evaluation workflow and propose a resource description framework model for those workflow parts.

We use expertise retrieval as a case study to demonstrate the benefits of our semantic representation approach. We employ this model as a means for exposing experimental data as linked open data (LOD) on the Web and as a basis for enriching and automatically connecting this data with expertise topics and expert profiles.

In this context, a topic-centric approach for expert search is proposed, addressing the extraction of expertise topics, their semantic grounding with the LOD cloud, and their connection to IR experimental data.

Several methods for expert profiling and expert finding are analysed and evaluated. Our results show that it is possible to construct expert profiles starting from automatically extracted expertise topics and that topic-centric approaches outperform state-of-the-art language modelling approaches for expert finding.

URL : https://aran.library.nuigalway.ie/handle/10379/5862

Experiences in integrated data and research object publishing using GigaDB

Authors : Scott C Edmunds, Peter Li, Christopher I Hunter, Si Zhe Xiao, Robert L Davidson, Nicole Nogoy, Laurie Goodman

In the era of computation and data-driven research, traditional methods of disseminating research are no longer fit-for-purpose. New approaches for disseminating data, methods and results are required to maximize knowledge discovery.

The “long tail” of small, unstructured datasets is well catered for by a number of general-purpose repositories, but there has been less support for “big data”. Outlined here are our experiences in attempting to tackle the gaps in publishing large-scale, computationally intensive research.

GigaScience is an open-access, open-data journal aiming to revolutionize large-scale biological data dissemination, organization and re-use. Through use of the data handling infrastructure of the genomics centre BGI, GigaScience links standard manuscript publication with an integrated database (GigaDB) that hosts all associated data, and provides additional data analysis tools and computing resources.

Furthermore, the supporting workflows and methods are also integrated to make published articles more transparent and open. GigaDB has released many new and previously unpublished datasets and data types, including urgently needed data to tackle infectious disease outbreaks, cancer and the growing food crisis.

Other “executable” research objects, such as workflows, virtual machines and software from several GigaScience articles have been archived and shared in reproducible, transparent and usable formats.

With data citation producing evidence of, and credit for, its use in the wider research community, GigaScience demonstrates a move towards more executable publications. Here data analyses can be reproduced and built upon by users without coding backgrounds or heavy computational infrastructure in a more democratized manner.

DOI : http://link.springer.com/article/10.1007/s00799-016-0174-6