Open access improves the dissemination of science: insights from Wikipedia

Authors : Puyu Yang, Ahad Shoaib, Robert West, Giovanni Colavizza

Wikipedia is a well-known platform for disseminating knowledge, and scientific sources, such as journal articles, play a critical role in supporting its mission. The open access movement aims to make scientific knowledge openly available, and we might intuitively expect open access to help further Wikipedia’s mission. However, the extent of this relationship remains largely unknown.

To fill this gap, we analyse a large dataset of citations from the English Wikipedia and model the role of open access in Wikipedia’s citation patterns. Our findings reveal that Wikipedia cites open access articles at a higher overall rate (44.1%) than their share in the Web of Science (23.6%) and OpenAlex (22.6%). Furthermore, both accessibility (open access status) and academic impact (citation count) significantly increase the probability of an article being cited on Wikipedia.

Specifically, open access articles are cited extensively and increasingly often in Wikipedia: they are approximately 64.7% more likely to be cited than paywalled articles, after controlling for confounding factors. This open access citation effect is particularly strong for articles with high citation counts or published in recent years.

Our findings highlight the pivotal role of open access in facilitating the dissemination of scientific knowledge, thereby increasing the likelihood of open access articles reaching a more diverse audience through platforms such as Wikipedia. Simultaneously, open access articles contribute to the reliability of Wikipedia as a source by affording editors timely access to novel results.

DOI : https://doi.org/10.1007/s11192-024-05163-4

Wikipedia Citations: A comprehensive dataset of citations with identifiers extracted from English Wikipedia

Authors : Harshdeep Singh, Robert West, Giovanni Colavizza

Wikipedia’s contents are based on reliable and published sources. To date, little is known about which sources Wikipedia relies on, in part because extracting citations and identifying cited sources is challenging. To close this gap, we release Wikipedia Citations, a comprehensive dataset of citations extracted from Wikipedia.

A total of 29.3M citations were extracted from 6.1M English Wikipedia articles as of May 2020, and each was classified as citing a book, a journal article, or Web content. We were thus able to extract 4.0M citations to scholarly publications with known identifiers (including DOI, PMC, PMID, and ISBN) and further labeled an extra 261K citations with DOIs from Crossref.

As a result, we find that 6.7% of Wikipedia articles cite at least one journal article with an associated DOI. Scientific articles cited from Wikipedia correspond to 3.5% of all articles with a DOI currently indexed in the Web of Science. We release all our code to allow the community to build upon our work and update the dataset in the future.

URL : https://arxiv.org/abs/2007.07022

Why We Read Wikipedia

Authors : Philipp Singer, Florian Lemmerich, Robert West, Leila Zia, Ellery Wulczyn, Markus Strohmaier, Jure Leskovec

Wikipedia is one of the most popular sites on the Web, with millions of users relying on it to satisfy a broad range of information needs every day. Although it is crucial to understand what exactly these needs are in order to be able to meet them, little is currently known about why users visit Wikipedia.

The goal of this paper is to fill this gap by combining a survey of Wikipedia readers with a log-based analysis of user activity.

Based on an initial series of user surveys, we build a taxonomy of Wikipedia use cases along several dimensions, capturing users’ motivations to visit Wikipedia, the depth of knowledge they are seeking, and their knowledge of the topic of interest prior to visiting Wikipedia.

Then, we quantify the prevalence of these use cases via a large-scale user survey conducted on live Wikipedia with almost 30,000 responses.

Our analyses highlight the variety of factors driving users to Wikipedia, such as current events, media coverage of a topic, personal curiosity, work or school assignments, or boredom.

Finally, we match survey responses to the respondents’ digital traces in Wikipedia’s server logs, enabling the discovery of behavioral patterns associated with specific use cases.

For instance, we observe long and fast-paced page sequences across topics for users who are bored or exploring randomly, whereas those using Wikipedia for work or school spend more time on individual articles focused on topics such as science.

Our findings advance our understanding of reader motivations and behavior on Wikipedia and can have implications for developers aiming to improve Wikipedia’s user experience, editors striving to cater to their readers’ needs, third-party services (such as search engines) providing access to Wikipedia content, and researchers aiming to build tools such as recommendation engines.

URL : Why We Read Wikipedia

Alternative location : https://arxiv.org/abs/1702.05379