Wikipedia Text Reuse: Within and Without

Authors : Milad Alshomary, Michael Völske, Tristan Licht, Henning Wachsmuth, Benno Stein, Matthias Hagen, Martin Potthast

We study text reuse related to Wikipedia at scale by compiling the first corpus of text reuse cases within Wikipedia as well as without (i.e., reuse of Wikipedia text in a sample of the Common Crawl).

To discover reuse beyond verbatim copy and paste, we employ state-of-the-art text reuse detection technology, scaling it for the first time to process the entire Wikipedia as part of a distributed retrieval pipeline.
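To illustrate what detecting reuse "beyond verbatim copy and paste" can involve, here is a minimal sketch that compares two passages by their overlapping word n-grams (shingles). It is not the authors' pipeline: the function names, the n-gram size, and the threshold are assumptions chosen for illustration only.

```python
import re


def shingles(text, n=3):
    """Return the set of lowercase word n-grams (shingles) in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_reuse_candidate(source, target, n=3, threshold=0.3):
    """Flag a passage pair whose shingle overlap reaches the threshold.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    return jaccard(shingles(source, n), shingles(target, n)) >= threshold


original = "the quick brown fox jumps over the lazy dog"
reworded = "the quick brown fox leaps over the lazy dog"
similarity = jaccard(shingles(original), shingles(reworded))
```

A single changed word still leaves several shared shingles, so the pair is flagged even though it is not an exact copy. A pipeline at Wikipedia scale would additionally need a candidate retrieval step so that not every pair of passages has to be compared.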

We further report on a pilot analysis of the 100 million reuse cases inside, and the 1.6 million reuse cases outside Wikipedia that we discovered. Text reuse inside Wikipedia gives rise to new tasks such as article template induction, fixing quality flaws due to inconsistencies arising from asynchronous editing of reused passages, or complementing Wikipedia’s ontology.

Text reuse outside Wikipedia yields a tangible metric for the emerging field of quantifying Wikipedia’s influence on the web. To foster future research into these tasks, and for reproducibility’s sake, the Wikipedia text reuse corpus and the retrieval pipeline are made freely available.


Peer Review of Reviewers: The Author’s Perspective

Authors : Ivana Drvenica, Giangiacomo Bravo, Lucija Vejmelka, Aleksandar Dekanski, Olgica Nedić

The aim of this study was to investigate authors’ opinions on the overall quality and effectiveness of reviewers’ contributions to reviewed papers. We conducted an online survey of authors from thirteen journals that publish articles in the life, social, or technological sciences.

Responses received from 193 authors were analysed using a mixed-effects model in order to determine factors deemed the most important in the authors’ evaluation of the reviewers. Qualitative content analysis of the responses to open questions was performed as well.

The mixed-effects model revealed that the authors’ assessment of the competence of referees strongly depended on the final editorial decision and that the speed of the review process was influential as well.

In an ordinary least squares (OLS) analysis of seven questions detailing authors’ opinions, the perception of review speed remained a significant predictor of the assessment. In addition, both the perceived competence and the perceived helpfulness of the reviewers significantly and positively affected the authors’ evaluation.
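As a reminder of what an OLS fit computes, the following sketch estimates a single-predictor regression line from first principles. It is a simplification under stated assumptions: the study used several predictors, and the variable names and ratings below are made up for illustration, not taken from the survey.

```python
def fit_simple_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up illustrative ratings (hypothetical, not data from the study).
speed_ratings = [1, 2, 3, 4, 5]
overall_ratings = [1.5, 2.0, 2.5, 3.0, 3.5]
intercept, slope = fit_simple_ols(speed_ratings, overall_ratings)
```

A positive slope here would correspond to the reported direction of the effect: a better perception of review speed going along with a better overall assessment.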

New models were used to re-check the value of these two factors and it was confirmed that the assessment of the competence of reviewers strongly depended on the final editorial decision.


Ebooks in your pocket: a project to promote the digital collection of the UNIGE Library

Authors : Pablo Iriarte, Aurélie Vieux, Marc Meury

Promoting online resources, which are costly yet invisible on library shelves, is often done manually, through many time-consuming steps that require technical skills.

In 2017, the University of Geneva Library set up a working group whose objective is to harmonize the practices used to promote its digital collections, ebooks in particular.

This project led to the creation of the digital promotion application “Avalon” (Application de valorisation numérique), which simplifies the process of creating promotional materials (collecting metadata and cover images, creating shortened URLs and QR codes) while complying with the institutional graphic charter.

Access to the ebooks is simplified by scanning QR codes, a feature built into the UNIGE mobile application, and by displaying the information on an intermediate web page. Users can thus literally “put an ebook in their pocket”.

This article presents the context of the project, the methodology used, how Avalon works, and lessons learned from the project.


Integrating Data Science Tools into a Graduate Level Data Management Course

Authors : Pete E. Pascuzzi, Megan R. Sapp Nelson


This paper describes a project to revise an existing research data management (RDM) course to include instruction in computer skills with robust data science tools.


A Carnegie R1 university.

Brief Description

Graduate student researchers need training in the basic concepts of RDM. However, they generally lack experience with robust data science tools to implement these concepts holistically. Two library instructors fundamentally redesigned an existing RDM course to include instruction with such tools.

The course was divided into lecture and lab sections to accommodate the increased instructional burden. Learning objectives and assessments were designed at a higher order so that students could demonstrate not only that they understood course concepts but also that they could use their computer skills to implement them.


Twelve students completed the first iteration of the course. Feedback from these students was very positive, and they appreciated the combination of theoretical concepts, computer skills and hands-on activities. Based on student feedback, future iterations of the course will include more “flipped” content including video lectures and interactive computer tutorials to maximize active learning time in both lecture and lab.

The substance of this article is based upon poster presentations at RDAP Summit 2018.


Toward a Better Data Management Plan: The Impact of DMPs on Grant Funded Research Practices

Author : Sara Mannheimer

Data Management Plans (DMPs) are often required for grant applications. But do strong DMPs lead to better data management and sharing practices? Several recent research projects in the Library and Information Science field have investigated data management planning and practice through DMP content analysis and data-management-related interviews.

However, research hasn’t yet shown how DMPs ultimately affect data management and data sharing practices during grant-funded research. The research described in this article contributes to the existing literature by examining the impact of DMPs on grant awards and on Principal Investigators’ (PIs) data management and sharing practices.

The results of this research suggest the following key takeaways:

(1) Most PIs practice internal data management in order to prevent data loss, to facilitate sharing within the research team, and to seamlessly continue their research during personnel turnover;

(2) PIs still have room to grow in understanding specialized concepts such as metadata and policies for use and reuse;

(3) PIs may need guidance on practices that facilitate FAIR data, such as using metadata standards, assigning licenses to their data, and publishing in data repositories.

Ultimately, the results of this research can inform academic library services and support stronger, more actionable DMPs. The substance of this article is based upon a lightning talk presentation at RDAP Summit 2018.


Open Access Information Service for Researchers in Theology

Author : Marianne Dörr

Tübingen University Library offers a continuously improved, next-generation bibliographic database for theology and religious studies. The “Index theologicus” database is available worldwide in open access.

It is funded by the Deutsche Forschungsgemeinschaft (German Research Foundation) under the funding program “specialised information services”. This paper describes the background of the project and the steps the Library took to transform a legacy online content database system into one of the most important international bibliographies in theology without increasing the number of staff involved.


Research collaboration and productivity: is there correlation?

Authors : Giovanni Abramo, Ciriaco Andrea D’Angelo, Flavia Di Costa

The incidence of extramural collaboration in academic research activities is increasing as a result of various factors. These factors include policy measures aimed at fostering partnership and networking among the various components of the research system, policies which are in turn justified by the idea that knowledge sharing could increase the effectiveness of the system.

Over the last two decades, the scientific community has also stepped up activities to assess the actual impact of collaboration intensity on the performance of research systems.

This study draws on a number of empirical analyses, with the intention of measuring the effects of extramural collaboration on research performance and, indirectly, verifying the legitimacy of policies that support this type of collaboration.

The analysis focuses on the Italian academic research system. The aim of the work is to assess the level of correlation, at the institutional level, between scientific productivity and collaboration intensity as a whole, both internationally and with private organizations.

This will be carried out using a bibliometric approach, which equates collaboration with the co-authorship of scientific publications.
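The idea of equating collaboration with co-authorship and then correlating it with productivity can be sketched as follows. This is not the authors' method: the definition of collaboration intensity used here (the share of an institution's publications that involve at least one other institution), the data layout, and all names are assumptions made for illustration.

```python
from math import sqrt


def collaboration_intensity(publications, institution):
    """Share of an institution's publications co-authored with another one.

    Each publication is a set of institution names, derived from the
    affiliations in its byline (hypothetical data layout).
    """
    own = [p for p in publications if institution in p]
    if not own:
        return 0.0
    return sum(1 for p in own if len(p) > 1) / len(own)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


# Toy bylines with placeholder institution names (not real data).
pubs = [{"A"}, {"A", "B"}, {"B", "C"}, {"A", "C"}]
intensity_a = collaboration_intensity(pubs, "A")
intensity_b = collaboration_intensity(pubs, "B")
```

With one intensity value and one productivity value per institution, `pearson` would give the institution-level correlation that the study sets out to assess.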