Wikipedia Citations: A comprehensive dataset of citations with identifiers extracted from English Wikipedia

Authors : Harshdeep Singh, Robert West, Giovanni Colavizza

Wikipedia’s contents are based on reliable and published sources. To date, little is known about what sources Wikipedia relies on, in part because extracting citations and identifying cited sources is challenging. To close this gap, we release Wikipedia Citations, a comprehensive dataset of citations extracted from Wikipedia.

A total of 29.3M citations were extracted from 6.1M English Wikipedia articles as of May 2020, and classified as being to books, journal articles or Web contents. We were thus able to extract 4.0M citations to scholarly publications with known identifiers — including DOI, PMC, PMID, and ISBN — and further labeled an extra 261K citations with DOIs from Crossref.

As a result, we find that 6.7% of Wikipedia articles cite at least one journal article with an associated DOI. Scientific articles cited from Wikipedia correspond to 3.5% of all articles with a DOI currently indexed in the Web of Science. We release all our code to allow the community to extend upon our work and update the dataset in the future.
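
As a rough illustration of the identifier extraction described above, the sketch below pulls DOI, PMC, PMID and ISBN fields out of a single citation template. It is not the authors' released code: the helper name `extract_identifiers`, the regular expression, and the sample citation (including its made-up DOI) are all assumptions for illustration, and real wikitext has many template variants this does not handle.

```python
import re

# Assumption: identifiers appear as "|name=value" parameters inside a
# citation template such as {{cite journal ...}}.
FIELD_RE = re.compile(r"\|\s*(doi|pmc|pmid|isbn)\s*=\s*([^|}]+)", re.IGNORECASE)


def extract_identifiers(wikitext):
    """Return a dict mapping identifier type to its value for one template."""
    found = {}
    for name, value in FIELD_RE.findall(wikitext):
        value = value.strip()
        if value:  # skip parameters that are present but empty
            found[name.lower()] = value
    return found


# Hypothetical citation; the DOI and PMID below are invented.
sample = (
    "{{cite journal |title=Example |journal=Some Journal "
    "|doi=10.1000/xyz123 |pmid=12345678}}"
)
print(extract_identifiers(sample))
```

Related parameters such as `doi-access` are not matched, because the pattern requires the identifier name to be followed directly by `=`.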

URL : https://arxiv.org/abs/2007.07022

Open Access of COVID-19-related publications in the first quarter of 2020: a preliminary study based in PubMed

Authors : Olatz Arrizabalaga, David Otaegui, Itziar Vergara, Julio Arrizabalaga, Eva Méndez

Background

The COVID-19 outbreak has made funders, researchers and publishers agree to have research publications, as well as other research outputs, such as data, become openly available.

In this extraordinary research context of the SARS CoV-2 pandemic, publishers are announcing that their coronavirus-related articles will be made immediately accessible in appropriate open repositories, like PubMed Central, at the instigation of funders and researchers.

Methods

This work uses Unpaywall, OpenRefine and PubMed to analyse the level of openness of articles about COVID-19 published during the first quarter of 2020. It also analyses Open Access (OA) articles published about previous coronaviruses (SARS CoV-1 and MERS CoV) as a means of comparison.
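
The kind of tally this method implies can be sketched as follows. This is a minimal sketch, not the authors' workflow: it assumes each article has already been looked up and is represented as a dict with Unpaywall-style fields (`is_oa`, `oa_status`, and `best_oa_location` with a `license` entry). The function name `summarise_openness` and the four sample records are invented for illustration.

```python
from collections import Counter


def summarise_openness(records):
    """Tally OA status and licensing over Unpaywall-style records."""
    total = len(records)
    # Count records per OA status; treat a missing status as closed.
    status_counts = Counter(r.get("oa_status", "closed") for r in records)
    open_records = [r for r in records if r.get("is_oa")]
    # An open record counts as unlicensed when its best location has no license.
    unlicensed = sum(
        1
        for r in open_records
        if not (r.get("best_oa_location") or {}).get("license")
    )
    return {
        "share_open": len(open_records) / total if total else 0.0,
        "status_counts": dict(status_counts),
        "share_open_without_license": (
            unlicensed / len(open_records) if open_records else 0.0
        ),
    }


# Invented sample data, not taken from the study.
records = [
    {"is_oa": True, "oa_status": "bronze", "best_oa_location": {"license": None}},
    {"is_oa": True, "oa_status": "gold", "best_oa_location": {"license": "cc-by"}},
    {"is_oa": True, "oa_status": "bronze", "best_oa_location": {"license": None}},
    {"is_oa": False, "oa_status": "closed", "best_oa_location": None},
]
print(summarise_openness(records))
```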

Results

A total of 5,611 COVID-19-related articles from PubMed were analysed. This is a much higher number for a period of 4 months than was found for SARS CoV-1 and MERS during the first year of their first outbreaks (335 and 116 articles, respectively).

Regarding the levels of openness, 88.8% of the SARS CoV-2 papers are freely available; similar rates were found for the other coronaviruses. Deeper analysis showed that (i) 67.4% of articles belong to an undefined Bronze category; (ii) 76.4% of all OA papers don’t carry any license, followed by 10.4% which display restricted licensing. These patterns were found to be repeated in the three most frequent publishers: Elsevier, Springer and Wiley.

Conclusions

Our results suggest that, although scientific production is much higher than during previous epidemics and is open, there is a caveat to this opening, characterized by the absence of fundamental elements and values on which Open Science is based, such as licensing.

DOI : https://doi.org/10.12688/f1000research.24136.1

Striving for Modernity: Layout and Abstracts in the Biomedical Literature

Authors : Carlo Galli, Maria Teresa Colangelo, Stefano Guizzardi

Most academic journals have a fairly consistent look: they are structured similarly, their text is divided into similar sections; for example, they have an abstract at the beginning of the manuscript, and their text is usually organized in two columns.

There may be different reasons for this similarity, ranging from the need to contain publication costs by using less page space to conforming to an internationally well-accepted format that may be perceived as the hallmark of academic articles.

We surveyed 35 medical journals founded before 1960 and looked for their change in format over time and how this was experienced by and explained to readers.

We then discussed what recent research has shown about the effects of layout on reading, looking for further explanations as to why this format was so successful.

DOI : https://doi.org/10.3390/publications8030038

Open peer review: promoting transparency in open science

Authors : Dietmar Wolfram, Peiling Wang, Adam Hembree, Hyoungjoo Park

Open peer review (OPR), where review reports and reviewers’ identities are published alongside the articles, represents one of the last aspects of the open science movement to be widely embraced, although its adoption has been growing since the turn of the century.

This study provides the first comprehensive investigation of OPR adoption, its early adopters and the implementation approaches used. Current bibliographic databases do not systematically index OPR journals, nor do the OPR journals clearly state their policies on open identities and open reports.

Using various methods, we identified 617 OPR journals that published at least one article with open identities or open reports as of 2019 and analyzed their wide-ranging implementations to derive emerging OPR practices.

The findings suggest that: (1) there has been a steady growth in OPR adoption since 2001, when 38 journals initially adopted OPR, with more rapid growth since 2017; (2) OPR adoption is most prevalent in medical and scientific disciplines (79.9%); (3) five publishers are responsible for 81% of the identified OPR journals; (4) early adopter publishers have implemented OPR in different ways, resulting in different levels of transparency.

Across the variations in OPR implementations, two important factors define the degree of transparency: open identities and open reports. Open identities may include reviewer names and affiliation as well as credentials; open reports may include timestamped review histories consisting of referee reports and author rebuttals or a letter from the editor integrating reviewers’ comments.

When and where open reports can be accessed are also important factors indicating the OPR transparency level. Publishers of optional OPR journals should add metric data in their annual status reports.

DOI : https://doi.org/10.1007/s11192-020-03488-4

Open Access Uptake in Germany 2010-18: Adoption in a diverse research landscape

Authors : Anne Hobert, Najko Jahn, Philipp Mayr, Birgit Schmidt, Niels Taubert

This study investigates the development of open access (OA) to journal articles from authors affiliated with German universities and non-university research institutions in the period 2010-2018.

Beyond determining the overall share of openly available articles, a systematic classification of distinct categories of OA publishing makes it possible to identify different patterns of OA adoption.

Taking into account the particularities of the German research landscape, variations in terms of productivity, OA uptake and approaches to OA are examined at the meso-level and possible explanations are discussed.

The development of the OA uptake is analysed for the different research sectors in Germany (universities, non-university research institutes of the Helmholtz Association, Fraunhofer Society, Max Planck Society, Leibniz Association, and government research agencies).

Combining several data sources (incl. Web of Science, Unpaywall, an authority file of standardised German affiliation information, the ISSN-Gold-OA 3.0 list, and OpenDOAR), the study confirms the growth of the OA share mirroring the international trend reported in related studies.

We found that 45% of all considered articles in the observed period were openly available at the time of analysis. Our findings show that subject-specific repositories are the most prevalent OA type. However, the percentages for publication in fully OA journals and OA via institutional repositories show similarly steep increases.

Enabling data-driven decision-making regarding OA implementation in Germany at the institutional level, the results of this study furthermore can serve as a baseline to assess the impact recent transformative agreements with major publishers will likely have on scholarly communication.

DOI : https://doi.org/10.5281/zenodo.3892950

Wikimedia and universities: contributing to the global commons in the Age of Disinformation

Authors : Martin Poulter, Nick Sheppard

In its first 30 years the world wide web has revolutionized the information environment. However, its impact has been negative as well as positive, through corporate misuse of personal data and due to its potential for enabling the spread of disinformation.

As a large-scale collaborative platform funded through charitable donations, with a mission to provide universal free access to knowledge as a public good, Wikipedia is one of the most popular websites in the world.

This paper explores the role of Wikipedia in the information ecosystem, where it occupies a unique position as a bridge between informal discussion and scholarly publication.

We explore how it relates to the broader Wikimedia ecosystem, through structured data on Wikidata for instance, and openly licensed media on Wikimedia Commons.

We consider the potential benefits for universities in the areas of information literacy and research impact, and investigate the extent to which universities in the UK and their libraries are engaging strategically with Wikimedia, if at all.

URL : Wikimedia and universities: contributing to the global commons in the Age of Disinformation

DOI : Wikimedia and universities: contributing to the global commons in the Age of Disinformation

The Sci-hub Effect: Sci-hub downloads lead to more article citations

Authors : Juan C. Correa, Henry Laverde-Rojas, Fernando Marmolejo-Ramos, Julian Tejada, Štepán Bahník

Citations are often used as a metric of the impact of scientific publications. Here, we examine how the number of downloads from Sci-hub as well as various characteristics of publications and their authors predicts future citations.

Using data from 12 leading journals in economics, consumer research, neuroscience, and multidisciplinary research, we found that articles downloaded from Sci-hub were cited 1.72 times more than papers not downloaded from Sci-hub and that the number of downloads from Sci-hub was a robust predictor of future citations.

Among other characteristics of publications, the number of figures in a manuscript consistently predicts its future citations. The results suggest that limited access to publications may limit some scientific research from achieving its full impact.

URL : https://arxiv.org/abs/2006.14979