Science Concierge: A Fast Content-Based Recommendation System for Scientific Publications

Authors : Titipat Achakulvisut, Daniel E. Acuna, Tulakan Ruangrong, Konrad Kording

Finding relevant publications is important for scientists who have to cope with an exponentially increasing volume of scholarly material. Algorithms can help with this task, as they do for music, movie, and product recommendations.

However, we know little about the performance of these algorithms with scholarly material. Here, we develop an algorithm, and an accompanying Python library, that implements a recommendation system based on the content of articles.

The design principles are to adapt to new content, provide near-real-time suggestions, and be open source. We tested the library on 15K posters from the Society for Neuroscience Conference 2015.

Human-curated topics are used to cross-validate parameters in the algorithm and produce a similarity metric that maximally correlates with human judgments. We show that our algorithm significantly outperforms suggestions based on keywords.

The work presented here promises to make the exploration of scholarly material faster and more accurate.
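As a rough illustration of what a content-based recommender does, the sketch below ranks documents by TF-IDF cosine similarity using only the Python standard library. It is not the Science Concierge API and not necessarily the authors' method; the function names (`tokenize`, `tfidf_vectors`, `cosine`, `recommend`) and the three sample texts are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep purely alphabetic whitespace-separated tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency (an assumption of this sketch).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(docs, query_index, top_n=3):
    """Return indices of the top_n documents most similar to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    query = vectors[query_index]
    scored = [(cosine(query, v), i) for i, v in enumerate(vectors) if i != query_index]
    # Highest similarity first; ties broken by document index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:top_n]]


# Hypothetical sample texts, invented for illustration only.
abstracts = [
    "neurons in visual cortex respond to oriented edges",
    "visual cortex neurons encode orientation of edges",
    "self citation rates differ by gender in academia",
]
print(recommend(abstracts, 0, top_n=1))
```

Here the second text shares four terms with the first while the third shares only one, so the second is ranked as the closest match.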

DOI : http://dx.doi.org/10.1371/journal.pone.0158423

Men set their own cites high: Gender and self-citation across fields and over time

Authors : Molly M. King, Carl T. Bergstrom, Shelley J. Correll, Jennifer Jacquet, Jevin D. West

How common is self-citation in scholarly publication, and does the practice vary by gender? Using novel methods and a dataset of 1.5 million research papers in the scholarly database JSTOR published between 1779 and 2011, we find that nearly 10% of references are self-citations by a paper’s authors.

We further find that from 1779 to 2011, men cite their own papers 56% more than women do. In the last two decades of our data, men self-cite 70% more than women. Women are also more than ten percentage points more likely than men to not cite their own previous work at all.

Despite increased representation of women in academia, this gender gap in self-citation rates has remained stable over the last 50 years. We break down self-citation patterns by academic field and number of authors, and comment on potential mechanisms behind these observations.

These findings have important implications for scholarly visibility and likely consequences for academic careers.

URL : https://arxiv.org/abs/1607.00376

Open access publishing trend analysis: statistics beyond the perception

Authors : Elisabetta Poltronieri, Elena Bravo, Moreno Curti, Maurizio Ferri, Cristina Mancini

Introduction

The purpose of this analysis was twofold: to track the number of open access journals acquiring impact factor, and to investigate the distribution of subject categories pertaining to these journals. As a case study, journals in which the researchers of the National Institute of Health (Istituto Superiore di Sanità) in Italy have published were surveyed.

Method

Data were collected by searching open access journals listed in the Directory of Open Access Journals, which were then compared with those having an impact factor as tracked by the Journal Citation Reports for the years 2010-2012. Journal Citation Reports subject categories were matched with Medical Subject Headings to provide a larger content classification.

Analysis

A survey was performed to determine the Directory journals matching the Journal Citation Reports list, and their inclusion in a given subject area.

Results

In the years 2010-2012, an increase in the number of journals was observed for Journal Citation Reports (+4.93%) and for the Directory (+18.51%). The discipline showing the highest increment was medicine (315 occurrences, 26%).

Conclusions

From 2010 to 2012, the number of open access journals with impact factor has gradually risen, with a prevalence for journals relating to medicine and biological science disciplines, suggesting that authors prefer to publish more than before in open access journals.

URL : http://www.informationr.net/ir/21-2/paper712.html

Le catalogue des bibliothèques et ses données à l’heure du web [The library catalogue and its data in the age of the web]

Author : Raphaëlle Lapôtre

This article sets out to describe the logic of the web and of the web of data in light of the teachings of Michel Foucault, as found notably in Les Mots et les Choses (1966).

First, data on the web play the role that money played in the seventeenth century: at once a representation of wealth, a substitute in deferred exchanges, and a measure of value, in this case the value of the attention that web actors give them.

With regard to the management of attention, two economic visions confront each other on the web: one, rather utilitarian, seeks to define value from the standpoint of human subjectivity and need; the other, rather physiocratic, seeks to transform the abundance of information by dividing it up and synthesizing it.

The web of data, for its part, reflects these two logics within the very language used to express it: RDF reproduces in its own way the attribution that is the principle of the hypertext link, while ontologies offer a classification of the world and of the data that represent it.

In a way, the epistemological logic of big data somewhat upsets the representational logic of the web, since its fundamental principle is no longer analysis or critique but the search for correlation, juxtaposition, and commentary.

URL : https://halshs.archives-ouvertes.fr/halshs-01331753v1

Can Scientific Impact Be Predicted?

Authors : Yuxiao Dong, Reid A. Johnson, Nitesh V. Chawla

A widely used measure of scientific impact is citations. However, due to their heavy-tailed distribution, citations are fundamentally difficult to predict.

Instead, to characterize scientific impact, we address two analogous questions asked by many scientific researchers: “How will my h-index evolve over time, and which of my previously or newly published papers will contribute to it?” To answer these questions, we perform two related tasks. First, we develop a model to predict authors’ future h-indices based on their current scientific impact. Second, we examine the factors that drive papers (either previously or newly published) to increase their authors’ predicted future h-indices.
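For readers unfamiliar with the metric, the h-index is the largest h such that an author has h papers with at least h citations each. The sketch below computes it from a list of citation counts; the function name `h_index` and the sample counts are illustrative assumptions and are not taken from the paper.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for one author's five papers.
print(h_index([10, 8, 5, 4, 3]))  # prints 4
```

A paper "contributes" to the h-index in this sense when it is among the h most-cited papers that satisfy the threshold.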

By leveraging relevant factors, we can predict an author’s h-index in five years with an R² value of 0.92 and whether a previously (newly) published paper will contribute to this future h-index with an F1 score of 0.99 (0.77).
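The two evaluation metrics mentioned here have standard textbook definitions, sketched below in plain Python. This assumes the usual formulas (coefficient of determination for regression, harmonic mean of precision and recall for binary classification); the abstract does not state exactly how the authors computed them, and the function names and toy inputs are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (otherwise SS_tot is zero).
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def f1_score(y_true, y_pred):
    """F1 for binary labels (1 = positive): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy inputs invented for illustration.
print(r_squared([1, 2, 3], [1, 2, 3]))       # perfect regression fit
print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))  # one hit, one miss, one false alarm
```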

We find that topical authority and publication venue are crucial to these effective predictions, while topic popularity is surprisingly inconsequential. Further, we develop an online tool that allows users to generate informed h-index predictions.

Our work demonstrates the predictability of scientific impact, and can help scholars to effectively leverage their position of “standing on the shoulders of giants.”

URL : https://arxiv.org/abs/1606.05905

Obstacles to Scholarly Publishing in the Social Sciences and Humanities: A Case Study of Vietnamese Scholars

Authors : Phuong Dzung Pho, Thi Minh Phuong Tran

Publishing scientific research is very important in contributing to the knowledge of a discipline and in sharing research findings among scientists. Based on the quantity and quality of publications, one can evaluate the research capacity of a researcher or the research performance of a university or a country.

However, the number of quality publications in Vietnam is very low compared with that of other countries in the region and worldwide, especially in the fields of social sciences and humanities.

Employing both quantitative and qualitative approaches, the current study investigates university lecturers’ attitudes towards research and publication and the obstacles to local and international publication at one of the main universities in social sciences and humanities in Vietnam.

The study found the main barriers to publication are funding and time for research and publication, among many other obstacles. From the analysis of the data, the study would also argue that lecturers’ obstacles to publication may vary across faculties (or disciplines), ages, qualifications, education, research and publication experience.

The findings in this study may be applied to other institutions in Vietnam or in other countries where English is used as a foreign language.

DOI : http://dx.doi.org/10.3390/publications4030019

Scientific notations for the digital era

Author : Konrad Hinsen

Computers have profoundly changed the way scientific research is done. Whereas the importance of computers as research tools is evident to everyone, the impact of the digital revolution on the representation of scientific knowledge is not yet widely recognized.

An ever-increasing part of today’s scientific knowledge is expressed, published, and archived exclusively in the form of software and electronic datasets. In this essay, I compare these digital scientific notations to the traditional scientific notations that have been used for centuries, showing how the digital notations optimized for computerized processing are often an obstacle to scientific communication and to creative work by human scientists.

I analyze the causes and propose guidelines for the design of more human-friendly digital scientific notations.

URL : https://arxiv.org/abs/1605.02960