Science through Wikipedia: A novel representation of open knowledge through co-citation networks

Authors : Wenceslao Arroyo-Machado, Daniel Torres-Salinas, Enrique Herrera-Viedma, Esteban Romero-Frías

This study provides an overview of science from the Wikipedia perspective. A methodology has been established for the analysis of how Wikipedia editors regard science through their references to scientific papers.

The co-citation method has been adapted to this context to generate Pathfinder networks (PFNET) that highlight the most relevant scientific journals and categories, as well as their interactions, in order to find out how scientific literature is consumed through this open encyclopaedia.
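
As a rough illustration of the co-citation idea in this setting, the sketch below counts how often two journals are referenced by the same Wikipedia article. It is a minimal sketch, not the authors' pipeline: the function name `cocitation_counts` and the toy data are hypothetical, and the Pathfinder (PFNET) pruning step that the study applies to the resulting network is not implemented here.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(article_refs):
    """Count journal pairs that are cited together by the same article.

    article_refs maps an article title to the list of journals it references.
    """
    counts = Counter()
    for journals in article_refs.values():
        # Deduplicate so a journal cited twice by one article counts once,
        # and sort so each unordered pair always has the same key.
        for pair in combinations(sorted(set(journals)), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: Wikipedia article -> journals it references.
toy_refs = {
    "Article A": ["Nature", "Science", "The Lancet"],
    "Article B": ["Nature", "Science"],
    "Article C": ["The Lancet", "Nature", "Nature"],
}

pairs = cocitation_counts(toy_refs)
for (journal_a, journal_b), weight in pairs.most_common():
    print(journal_a, "/", journal_b, "co-cited", weight, "time(s)")
```

The pair counts can be read as edge weights of a journal network; a pruning method such as Pathfinder would then keep only the most salient links.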

In addition, their obsolescence has been studied through the Price index. A total of 1 433 457 references available at this http URL have been initially taken into account. After pre-processing and linking them to the data from Elsevier’s CiteScore Metrics, the sample was reduced to 847 512 references made by 193 802 Wikipedia articles to 598 746 scientific articles belonging to 14 149 journals indexed in Scopus.
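
The Price index is usually defined as the percentage of references that point to literature published within the last five years. The sketch below computes it under stated assumptions: the function name `price_index`, the inclusive five-year boundary and the toy publication years are all hypothetical choices, and the abstract does not say which reference year or window the authors used.

```python
def price_index(citing_year, cited_years, window=5):
    """Percentage of references published within `window` years of citing_year.

    The boundary is inclusive (an assumption): a reference exactly
    `window` years old still counts as recent.
    """
    if not cited_years:
        raise ValueError("cited_years must not be empty")
    recent = sum(1 for year in cited_years if 0 <= citing_year - year <= window)
    return 100.0 * recent / len(cited_years)


# Hypothetical toy data: publication years of the papers one article cites.
toy_years = [2019, 2017, 2015, 2010, 2001]
share = price_index(2020, toy_years)
print(f"Price index: {share:.1f}%")
```

A higher value means the cited literature is more recent; a lower value indicates reliance on older papers.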

Among the main results, we found a significant presence of “Medicine” and “Biochemistry, Genetics and Molecular Biology” papers, and that the most important journals are multidisciplinary in nature, which also suggests that high-impact-factor journals were more likely to be cited. Furthermore, only 13.44% of Wikipedia citations are to Open Access journals.

URL : https://arxiv.org/abs/2002.04347

Les désaccords éditoriaux dans Wikipédia comme tensions entre régimes épistémiques

Authors : Guillaume Carbou, Gilles Sahut

Despite its elaborate normative architecture, Wikipedia is the site of recurring disagreements among contributors.

Drawing on an argumentative analysis of a corpus of talk pages from articles that provoke heated debate (GMOs, September 11, etc.), the authors show that these disagreements are partly underpinned by the existence of competing “epistemic regimes” on Wikipedia.

These epistemic regimes (encyclopedist, scientific, scientistic, wiki, critical and doxic) correspond to as many divergent conceptions of what is “valid” and of the ways to arrive at it.

URL : https://journals.openedition.org/communication/10788

Wikipedia Text Reuse: Within and Without

Authors : Milad Alshomary, Michael Völske, Tristan Licht, Henning Wachsmuth, Benno Stein, Matthias Hagen, Martin Potthast

We study text reuse related to Wikipedia at scale by compiling the first corpus of text reuse cases within Wikipedia as well as without (i.e., reuse of Wikipedia text in a sample of the Common Crawl).

To discover reuse beyond verbatim copy and paste, we employ state-of-the-art text reuse detection technology, scaling it for the first time to process the entire Wikipedia as part of a distributed retrieval pipeline.
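
The abstract does not name the detection technology used. As a simple stand-in that conveys the idea of catching near-copies rather than only verbatim ones, the sketch below compares two passages by the Jaccard overlap of their word trigrams. The function names, the trigram size and the sample sentences are hypothetical, and this is not the authors' pipeline.

```python
def shingles(text, n=3):
    """Return the set of word n-grams (shingles) in a lowercased text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard_reuse(text_a, text_b, n=3):
    """Jaccard similarity of the two texts' shingle sets, between 0 and 1."""
    set_a, set_b = shingles(text_a, n), shingles(text_b, n)
    if not set_a or not set_b:
        # Texts shorter than n words have no shingles to compare.
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical sample passages: the second lightly rewrites the first.
source = "the quick brown fox jumps over the lazy dog"
rewrite = "a quick brown fox jumps over a sleeping cat"
score = jaccard_reuse(source, rewrite)
print(f"Trigram overlap: {score:.2f}")
```

A pairwise comparison like this does not scale to the whole of Wikipedia on its own, which is why the study runs detection inside a distributed retrieval pipeline.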

We further report on a pilot analysis of the 100 million reuse cases inside, and the 1.6 million reuse cases outside Wikipedia that we discovered. Text reuse inside Wikipedia gives rise to new tasks such as article template induction, fixing quality flaws due to inconsistencies arising from asynchronous editing of reused passages, or complementing Wikipedia’s ontology.

Text reuse outside Wikipedia yields a tangible metric for the emerging field of quantifying Wikipedia’s influence on the web. To foster future research into these tasks, and for reproducibility’s sake, the Wikipedia text reuse corpus and the retrieval pipeline are made freely available.

URL : https://arxiv.org/abs/1812.09221

Wikipedia: an opportunity to rethink the links between sources’ credibility, trust and authority

Authors : Gilles Sahut, André Tricot

The Web and its main tools (Google, Wikipedia, Facebook, Twitter) raise and renew fundamental questions that everyone asks almost every day: Is this information or content true? Can I trust this author or source?

These questions are not new: they arose with books, newspapers, broadcasting and television and, more fundamentally, they arise in every human interpersonal communication.

This paper focuses on two scientific problems related to this issue. The first one is theoretical: to address this issue, many concepts have been used in library and information sciences, communication and psychology.

The links between these concepts are not clear: sometimes two concepts are considered as synonymous, sometimes as very different. The second one is historical: sources like Wikipedia deeply challenge the epistemic evaluation of information sources, compared to previous modes of information production.

This paper proposes a simple, integrated model that treats the relation between a user, a document and an author as human communication. It reduces the problem to three concepts: credibility, a characteristic granted to information depending on its truth-value; trust, the ability to produce credible information; and authority, which holds when an author’s power to influence is accepted, i.e., when readers accept that the source can modify their opinion, knowledge and decisions.

The model also describes two kinds of relationships between the three concepts: an upward link and a downward link. The model is tested against findings of empirical research, on Wikipedia in particular.

URL : https://firstmonday.org/ojs/index.php/fm/article/view/7108/6555

The Evolution of the Concept of Semantic Web in the Context of Wikipedia: An Exploratory Approach to Study the Collective Conceptualization in a Digital Collaborative Environment

Authors : Luís Miguel Machado, Maria Manuel Borges, Renato Rocha Souza

Wikipedia, as a “social machine”, is a privileged place to observe the collective construction of concepts without central control. Based on Dahlberg’s theory of concept, and anchored in the pragmatism of Hjørland—in which the concepts are socially negotiated meanings—the evolution of the concept of semantic web (SW) was analyzed in the English version of Wikipedia.

An exploratory, descriptive, and qualitative study was designed, and we identified 26 different definitions (between 12 July 2001 and 31 December 2017), of which eight are of particular relevance for their duration, the last two being those recorded at the end of the analyzed period.

According to them, SW: “is an extension of the web” and “is a Web of Data”; the latter, used as a complementary definition, links to Berners-Lee’s publications. In Wikipedia, the evolution of the SW concept appears to be driven by a preference for non-technical vocabulary and by authority control exercised through debate.

As Wikipedia is a space for the collective negotiation of meanings, studying it may contribute to understanding how a community conceives of a particular concept and how that concept evolves over time.

DOI : https://doi.org/10.3390/publications6040044

La gouvernance de Wikipédia : élaboration de règles et théorie d’Ostrom

Author : Gilles Sahut

Wikipedia’s success is frequently attributed to the soundness of its governance. However, there is no scientific consensus on how to characterize it.

In this empirical study, we examine one facet of this governance within the French-language Wikipedia: the ways in which two rules related to citing sources were constructed.

They are studied through Ostrom’s theory of the commons. We show that these rules are discussed and written by a minority of particularly involved contributors. Thus, there is no “political class” in Wikipedia that is cut off from the field.

We also highlight the influence of the internal communication apparatus on this process, as well as that of the English-language Wikipedia.

Alternative location : http://journals.openedition.org/ticetsociete/2426

Science Is Shaped by Wikipedia: Evidence From a Randomized Control Trial

Authors : Neil Thompson, Douglas Hanley

“I sometimes think that general and popular treatises are almost as important for the progress of science as original work.” – Charles Darwin, 1865.

Because Wikipedia is the largest encyclopedia in the world, it is not surprising that it reflects the state of scientific knowledge. However, Wikipedia is also one of the most accessed websites in the world, including by scientists, which suggests that it also has the potential to shape science. This paper shows that it does.

Incorporating ideas into Wikipedia leads to those ideas being used more in the scientific literature. We provide correlational evidence of this across thousands of Wikipedia articles and causal evidence of it through a randomized control trial where we add new scientific content to Wikipedia.

We find that the causal impact is strong, with Wikipedia influencing roughly one in every 830 words in related scientific journal articles. We also find causal evidence that the scientific articles referenced in Wikipedia receive more citations, suggesting that Wikipedia complements the traditional journal system by pointing researchers to key underlying scientific articles.

Our findings speak not only to the influence of Wikipedia, but more broadly to the influence of repositories of scientific knowledge and the role that they play in the creation of scientific knowledge.

DOI : https://dx.doi.org/10.2139/ssrn.3039505