Research Data in Scientific Publications: A Cross-Field Analysis

Authors : Puyu Yang, Giovanni Colavizza

Data sharing is fundamental to scientific progress, enhancing transparency, reproducibility, and innovation across disciplines. Despite its growing significance, the variability of data-sharing practices across research fields remains insufficiently understood, limiting the development of effective policies and infrastructure.

This study investigates the evolving landscape of data-sharing practices, focusing specifically on the intentions behind data release, reuse, and referencing. Leveraging the PubMed open dataset, we developed a model to identify mentions of datasets in the full text of publications. Our analysis reveals that data release is the most prevalent sharing mode, particularly in fields such as Commerce, Management, and the Creative Arts.

In contrast, STEM fields, especially the Biological and Agricultural Sciences, show significantly higher rates of data reuse, whereas the humanities and social sciences have been slower to adopt these practices. Notably, dataset referencing remains low across most disciplines, suggesting that datasets are not yet fully recognized as research outputs.

A temporal analysis shows that data releases accelerated after 2012, yet challenges related to data discoverability and compatibility for reuse persist. Our findings can inform institutional and policy-level efforts to improve data-sharing practices, enhance dataset accessibility, and promote broader adoption of open science principles across research domains.

arXiv : https://arxiv.org/abs/2502.01407

Open access improves the dissemination of science: insights from Wikipedia

Authors : Puyu Yang, Ahad Shoaib, Robert West, Giovanni Colavizza

Wikipedia is a well-known platform for disseminating knowledge, and scientific sources, such as journal articles, play a critical role in supporting its mission. The open access movement aims to make scientific knowledge openly available, and we might intuitively expect open access to help further Wikipedia’s mission. However, the extent of this relationship remains largely unknown.

To fill this gap, we analyse a large dataset of citations from the English Wikipedia and model the role of open access in Wikipedia’s citation patterns. Our findings reveal that Wikipedia relies on open access articles at a higher overall rate (44.1%) compared to their availability in the Web of Science (23.6%) and OpenAlex (22.6%). Furthermore, both the accessibility (open access status) and academic impact (citation count) significantly increase the probability of an article being cited on Wikipedia.
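The gap between these shares can be made concrete with simple arithmetic. The sketch below is illustrative only: it uses just the three percentages reported above and divides Wikipedia's open access share by each baseline share. The resulting "over-representation ratio" and the helper name `overrepresentation` are assumptions introduced here for illustration, not statistics or code from the paper.

```python
# Open access (OA) shares reported in the abstract, in percent.
# The ratio computed below is a derived illustration, not a figure from the paper.
WIKIPEDIA_OA_SHARE = 44.1
BASELINE_OA_SHARE = {"Web of Science": 23.6, "OpenAlex": 22.6}


def overrepresentation(observed: float, baseline: float) -> float:
    """Hypothetical helper: OA share among Wikipedia citations divided by
    the OA share in a baseline bibliographic database."""
    return observed / baseline


if __name__ == "__main__":
    for source, share in BASELINE_OA_SHARE.items():
        ratio = overrepresentation(WIKIPEDIA_OA_SHARE, share)
        print(f"{source}: {ratio:.2f}x")
```

Running it prints `Web of Science: 1.87x` and `OpenAlex: 1.95x`, i.e. open access articles appear among Wikipedia citations at roughly twice their rate in either database. These raw ratios do not control for field, age, or citation count; the controlled estimate is the one given in the next paragraph.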

Specifically, open access articles are cited extensively, and increasingly, in Wikipedia: after controlling for confounding factors, they are approximately 64.7% more likely to be cited in Wikipedia than paywalled articles. This open access citation effect is particularly strong for articles with high citation counts or those published in recent years.

Our findings highlight the pivotal role of open access in facilitating the dissemination of scientific knowledge, thereby increasing the likelihood of open access articles reaching a more diverse audience through platforms such as Wikipedia. Simultaneously, open access articles contribute to the reliability of Wikipedia as a source by affording editors timely access to novel results.

DOI : https://doi.org/10.1007/s11192-024-05163-4