Science-Software Linkage: The Challenges of Traceability between Scientific Knowledge and Software Artifacts

Authors : Hideaki Hata, Jin L.C. Guo, Raula Gaikovina Kula, Christoph Treude

Although computer science papers are often accompanied by software artifacts, connecting research papers to their software artifacts and vice versa is not always trivial. First of all, there is a lack of well-accepted standards for how such links should be provided.

Furthermore, the provided links, if any, often become outdated: they are affected by link rot when pre-prints are removed, when repositories are migrated, or when papers and repositories evolve independently.

In this paper, we summarize the state of the practice of linking research papers and associated source code, highlighting the recent efforts towards creating and maintaining such links.

We also report on the results of several empirical studies focusing on the relationship between scientific papers and associated software artifacts, and we outline challenges related to traceability and opportunities for overcoming these challenges.

URL : https://arxiv.org/abs/2104.05891

GitHub Repositories with Links to Academic Papers: Open Access, Traceability, and Evolution

Authors : Supatsara Wattanakriengkrai, Bodin Chinthanet, Hideaki Hata, Raula Gaikovina Kula, Christoph Treude, Jin Guo, Kenichi Matsumoto

Traceability between published scientific breakthroughs and their implementation is essential, especially in the case of Open Source Software implements bleeding edge science into its code. However, aligning the link between GitHub repositories and academic papers can prove difficult, and the link impact remains unknown.

This paper investigates the role of academic paper references contained in these repositories. We conducted a large-scale study of 20 thousand GitHub repositories to establish prevalence of references to academic papers. We use a mixed-methods approach to identify Open Access (OA), traceability and evolutionary aspects of the links.

Although referencing a paper is not typical, we find that a vast majority of referenced academic papers are OA. In terms of traceability, our analysis revealed that machine learning is the most prevalent topic of repositories. These repositories tend to be affiliated with academic communities. More than half of the papers do not link back to any repository.

A case study of referenced arXiv paper shows that most of these papers are high-impact and influential and do align with academia, referenced by repositories written in different programming languages. From the evolutionary aspect, we find very few changes of papers being referenced and links to them.

URL : https://arxiv.org/abs/2004.00199