Authors : Jorge Chamorro-Padial, Francisco-Javier Rodrigo-Ginés, Rosa Rodríguez-Sánchez, R.M. Gil, Roberto García
Scientific progress depends on the accessibility and reproducibility of research outputs. Unfortunately, datasets and other referenced resources in academic publications frequently become unavailable over time, limiting reproducibility and reuse.
In this work, we quantitatively analyze the potential impact of research data unavailability by applying economic, probabilistic, and network-based models to scientific citation networks. Rather than measuring knowledge directly, we use citation-based network metrics as proxies for the dissemination and potential reuse of scientific results, and study how the absence of data-linked resources affects impact propagation and productivity-related indicators.
We further examine the resilience of citation networks under different modeling assumptions and analyze the role of highly influential nodes, or superpropagators, in amplifying the effects of dataset loss.
Our results reveal structural dependencies on vulnerable data sources and show that the magnitude of the impact depends strongly on network position and model assumptions.
These findings provide quantitative evidence of the systemic consequences of data unavailability and underline the importance of long-term data preservation and accessibility policies in scientific research.
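As a rough illustration of the network-based part of this analysis, the sketch below removes one node at a time from a toy citation graph and counts how many downstream reachability pairs disappear. It is not the authors' model: the graph, the node names, the helper functions (`downstream`, `total_reach`, `reach_loss`), and the use of plain reachability as an impact proxy are all assumptions made for illustration only.

```python
from collections import deque

def downstream(graph, source, removed=frozenset()):
    """Nodes reachable from `source` along edges, skipping removed nodes."""
    if source in removed:
        return set()
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(source)
    return seen

def total_reach(graph, removed=frozenset()):
    """Sum of downstream set sizes over all nodes (a crude impact proxy)."""
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    return sum(len(downstream(graph, n, removed)) for n in nodes)

def reach_loss(graph, node):
    """Drop in total reach when `node` becomes unavailable."""
    return total_reach(graph) - total_reach(graph, frozenset({node}))

# Hypothetical toy graph: an edge u -> v means that v cites or reuses u,
# so impact flows from u to v.
graph = {
    "dataset_A": ["paper_1", "paper_2"],
    "paper_1": ["paper_3"],
    "paper_2": ["paper_3"],
    "paper_3": ["paper_4"],
    "dataset_B": ["paper_4"],
}

# Rank nodes by the reach lost when each one is removed on its own.
all_nodes = set(graph) | {n for targets in graph.values() for n in targets}
ranking = sorted(all_nodes, key=lambda n: reach_loss(graph, n), reverse=True)
```

In this toy graph, losing `dataset_A` removes four reachability pairs while losing `dataset_B` removes only one, and the intermediate `paper_3` ranks highest because every path to `paper_4` from the `dataset_A` side passes through it. That is a small-scale instance of the dependence on network position described above.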
URL : Modeling the impact of research data unavailability on science