Authors : Supatsara Wattanakriengkrai, Bodin Chinthanet, Hideaki Hata, Raula Gaikovina Kula, Christoph Treude, Jin Guo, Kenichi Matsumoto
Traceability between published scientific breakthroughs and their implementation is essential, especially in the case of Open Source Software implements bleeding edge science into its code. However, aligning the link between GitHub repositories and academic papers can prove difficult, and the link impact remains unknown.
This paper investigates the role of academic paper references contained in these repositories. We conducted a large-scale study of 20 thousand GitHub repositories to establish prevalence of references to academic papers. We use a mixed-methods approach to identify Open Access (OA), traceability and evolutionary aspects of the links.
Although referencing a paper is not typical, we find that a vast majority of referenced academic papers are OA. In terms of traceability, our analysis revealed that machine learning is the most prevalent topic of repositories. These repositories tend to be affiliated with academic communities. More than half of the papers do not link back to any repository.
A case study of referenced arXiv paper shows that most of these papers are high-impact and influential and do align with academia, referenced by repositories written in different programming languages. From the evolutionary aspect, we find very few changes of papers being referenced and links to them.