Dataset search: a survey

Authors : Adriane Chapman, Elena Simperl, Laura Koesten, George Konstantinidis, Luis-Daniel Ibáñez, Emilia Kacprzak, Paul Groth

Generating value from data requires the ability to find, access and make sense of datasets. There are many efforts underway to encourage data sharing and reuse, from scientific publishers asking authors to submit data alongside manuscripts to data marketplaces, open data portals and data communities.

Google recently beta-released a search service for datasets, which allows users to discover data stored in various online repositories via keyword queries. These developments foreshadow an emerging research field around dataset search or retrieval that broadly encompasses frameworks, methods and tools that help match a user data need against a collection of datasets.

Here, we survey the state of the art of research and commercial systems and discuss what makes dataset search a field in its own right, with unique challenges and open questions.

We look at approaches and implementations from the related areas that dataset search draws upon, including information retrieval, databases, and entity-centric and tabular search, in order to identify possible paths to tackle these questions, as well as immediate next steps that will take the field forward.

DOI : https://doi.org/10.1007/s00778-019-00564-x

A Crisis in “Open Access”: Should Communication Scholarly Outputs Take 77 Years to Become Open Access?

Authors : Abbas Ghanbari Baghestan, Hadi Khaniki, Abdolhosein Kalantari, Mehrnoosh Akhtari-Zavare, Elaheh Farahmand, Ezhar Tamam, Nader Ale Ebrahim, Havva Sabani, Mahmoud Danaee

This study diachronically investigates the trend of “open access” in the Web of Science (WoS) category of “communication.” To evaluate the trend, data were collected from 184 categories of WoS from 1980 to 2017.

A total of 87,997,893 documents were obtained, of which 95,304 (0.10%) were in the category of “communication.” On average, 4.24% of the documents in all 184 categories were open access, while in communication the figure was 3.29%, which ranked communication 116th out of 184.

An Open Access Index (OAI) was developed to predict the trend of open access in communication. Based on the OAI, communication needs 77 years to fully reach open access, which undeniably can be considered a “crisis in scientific publishing” in this field.

Given this stunning information, it is time for a global call for “open access” by communication scholars across the world. Future research should investigate whether the current business models of publications in communication scholarship are encouraging open access or pose unnecessary restrictions on knowledge development.

DOI : https://doi.org/10.1177/2158244019871044

Evolution of an Institutional Repository: A Case History from Nebraska

Author : Paul Royster

The 13-year history of the institutional repository (IR) at the University of Nebraska–Lincoln is recounted with emphasis on local conditions, administrative support, recruitment practices, and management philosophy.

Practices included offering new services, hosting materials outside the conventional tenure stream, using student employees, and providing user analytics on global dissemination. Acquiring trust of faculty depositors enhanced recruitment and extra-library support.

The evolution of policies on open access, copyright, metadata, and third-party vendors is discussed, with statistics illustrating the growth, contents, and outreach of the repository over time.

A final section discusses future directions for scholarly communications and IRs in particular.

URL : https://digitalcommons.unl.edu/libraryscience/382/

Open science and modified funding lotteries can impede the natural selection of bad science

Authors : Paul E. Smaldino, Matthew A. Turner, Pablo A. Contreras Kallens

Assessing scientists using exploitable metrics can lead to the degradation of research methods even without any strategic behaviour on the part of individuals, via ‘the natural selection of bad science.’

Institutional incentives to maximize metrics like publication quantity and impact drive this dynamic. Removing these incentives is necessary, but institutional change is slow.

However, recent developments suggest possible solutions with more rapid onsets. These include what we call open science improvements, which can reduce publication bias and improve the efficacy of peer review. In addition, there have been increasing calls for funders to move away from prestige- or innovation-based approaches in favour of lotteries.

We investigated whether such changes are likely to improve the reproducibility of science even in the presence of persistent incentives for publication quantity through computational modelling.

We found that modified lotteries, which allocate funding randomly among proposals that pass a threshold for methodological rigour, effectively reduce the rate of false discoveries, particularly when paired with open science improvements that increase the publication of negative results and improve the quality of peer review.

In the absence of funding that targets rigour, open science improvements can still reduce false discoveries in the published literature but are less likely to improve the overall culture of research practices that underlie those publications.
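The modified lottery described above can be illustrated with a minimal sketch. It is not the authors' computational model: the function name `modified_lottery`, the proposal identifiers, the rigour scores, and the threshold value are all hypothetical, chosen only to show the two steps of filtering by a rigour threshold and then drawing winners at random.

```python
import random
from typing import Dict, List, Optional


def modified_lottery(
    rigour_scores: Dict[str, float],
    threshold: float,
    n_awards: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Fund proposals at random among those that pass a rigour threshold.

    All names and values here are illustrative assumptions, not taken
    from the paper's model.
    """
    if rng is None:
        rng = random.Random()
    # Step 1: keep only proposals whose rigour score meets the threshold.
    eligible = sorted(
        pid for pid, score in rigour_scores.items() if score >= threshold
    )
    # Step 2: draw winners uniformly at random, without replacement.
    # If fewer proposals are eligible than awards exist, fund all of them.
    k = min(n_awards, len(eligible))
    return rng.sample(eligible, k)


# Hypothetical proposals with made-up rigour scores between 0 and 1.
proposals = {"A": 0.9, "B": 0.4, "C": 0.75, "D": 0.6, "E": 0.2}
funded = modified_lottery(proposals, threshold=0.5, n_awards=2,
                          rng=random.Random(0))
print(funded)
```

In this sketch, proposals B and E can never be funded because they fall below the threshold, while A, C and D have equal chances regardless of how far above the threshold they score.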

DOI : https://doi.org/10.1098/rsos.190194

Assessing the Quality of Scientific Papers

Authors : Roman Vainshtein, Gilad Katz, Bracha Shapira, Lior Rokach

A multitude of factors are responsible for the overall quality of scientific papers, including readability, linguistic quality, fluency, semantic complexity, and of course domain-specific technical factors.

These factors vary from one field of study to another. In this paper, we propose a measure and method for assessing the overall quality of the scientific papers in a particular field of study.

We evaluate our method in the computer science domain, but it can be applied to other technical and scientific fields. Our method is based on the corpus linguistics technique. This technique enables the extraction of required information and knowledge associated with a specific domain.

For this purpose, we have created a large corpus, consisting of papers from very high impact conferences. First, we analyze this corpus in order to extract rich domain-specific terminology and knowledge.

Then we use the acquired knowledge to estimate the quality of scientific papers by applying our proposed measure. We examine our measure on high and low scientific impact test corpora.

Our results show a significant difference in the measure scores of the high and low impact test corpora. Second, we develop a classifier based on our proposed measure and compare it to the baseline classifier.

Our results show that the classifier based on our measure outperformed the baseline classifier. Based on the presented results, the proposed measure and the technique can be used for automated assessment of scientific papers.

URL : https://arxiv.org/abs/1908.04200

Raising Visibility in the Digital Humanities Landscape: Academic Engagement and the Question of the Library’s Role

Authors : Kathleen Kasten-Mutkus, Laura Costello, Darren Chase

Academic libraries have an important role to play in supporting digital humanities projects in their communities. Librarians at Stony Brook University Libraries host Open Mic events for digital humanities researchers, teachers, and students on campus.

Inspired by a desire to better serve digital humanists with existing projects, this event was initially organized to increase the visibility of scholars and students with nascent projects and connect these digital humanists to library-supported resources and to one another.

For the Libraries, the Open Mic was an opportunity to understand the scope and practices of the digital humanities community at Stony Brook, and to identify ways to make meaningful interventions.

An open mic is a uniquely suitable event format in that it embodies a dynamic, permissive, multidisciplinary presentation space that is as much for exercising new and ongoing research (and technologies) as it is for making discoveries and connections.

The success of these events can be measured in the establishment of the University Libraries as a nexus for digital humanities work, consultations, instruction, workshops, and community on a campus without a designated digital humanities center.

The digital humanities Open Mic event at Stony Brook University locates the digital humanities within the library’s repertoire, while signaling that the library is — in a number of essential ways — open.

URL : http://www.digitalhumanities.org/dhq/vol/13/2/000420/000420.html

On a Quest for Cultural Change – Surveying Research Data Management Practices at Delft University of Technology

Authors : Heather Andrews Mancilla, Marta Teperek, Jasper van Dijck, Kees den Heijer, Robbert Eggermont, Esther Plomp, Yasemin Turkyilmaz-van der Velden, Shalini Kurapati

The Data Stewardship project is a new initiative from the Delft University of Technology (TU Delft) in the Netherlands. Its aim is to create mature working practices and policies regarding research data management across all TU Delft faculties.

The novelty of this project lies in having a dedicated person, the so-called ‘Data Steward’, embedded in each faculty to approach research data management from a more discipline-specific perspective. It is within this framework that a research data management survey was carried out at the faculties that had a Data Steward in place by July 2018.

The goal was to get an overview of the general data management practices, and to use the results as a benchmark for the project. The response rate ranged from 11% to 37%, depending on the faculty.

Overall, the results show similar trends in all faculties, and indicate a lack of awareness regarding different data management topics such as automatic data backups, data ownership, relevance of data management plans, awareness of FAIR data principles and usage of research data repositories.

The results also show great interest in data management, as more than 80% of the respondents in each faculty claimed to be interested in data management training and wished to see the summary of survey results.

Thus, the survey helped identify the topics the Data Stewardship project is currently focusing on, by carrying out awareness campaigns and providing training at both university and faculty levels.

URL : On a Quest for Cultural Change – Surveying Research Data Management Practices at Delft University of Technology