From Code to Tenure: Valuing Research Software in Academia

Authors : Eric A. Jensen, Daniel S. Katz

Research software is a driving force in today’s academic ecosystem, with most researchers relying on it to do their work, and many writing some of their own code. Despite its importance, research software is typically not included in tenure, promotion, and recognition policies and processes.

In this article, we invite discussions on how to value research software, integrate it into academic evaluations, and ensure its sustainability. We build on discussions hosted by the US Research Software Sustainability Institute and by the international Research Software Engineering community to outline a set of possible activities aimed at elevating the role of research software in academic career paths, recognition, and beyond.

One is a study to investigate the role of software contributions in academic promotions. Another is to document and share successful academic recognition practices for research software. A third is to create guidance documents for faculty hiring and tenure evaluations. Each of these proposed activities is a building block of a larger effort to create a more equitable, transparent, and dynamic academic ecosystem.

We’ve assembled 44 such ideas as a starting point and posted them as issues in GitHub. Our aim is to encourage engagement with this effort. Readers are invited to do this by adding potential activities or commenting on existing ideas to improve them.

The issues page can also serve to inform the community of ongoing activities so that efforts aren’t duplicated. Similarly, if someone else has already made strides in a particular area, point out their work to build collective knowledge.

Finally, the issues page is also intended to allow anyone interested in collaborating on a specific activity to indicate their willingness to do so. This living list serves as a hub for collective action and thought, with the overall aim of recognizing the value of creating and contributing research software.

URL : From Code to Tenure: Valuing Research Software in Academia


The Rise of GitHub in Scholarly Publications

Authors : Emily Escamilla, Martin Klein, Talya Cooper, Vicky Rampin, Michele C. Weigle, Michael L. Nelson

The definition of scholarly content has expanded to include the data and source code that contribute to a publication. While major archiving efforts to preserve conventional scholarly content, typically in PDFs (e.g., LOCKSS, CLOCKSS, Portico), are underway, no analogous effort has yet emerged to preserve the data and code referenced in those PDFs, particularly the scholarly code hosted online on Git Hosting Platforms (GHPs).

Similarly, the Software Heritage Foundation is working to archive public source code, but there is value in archiving the issue threads, pull requests, and wikis that provide important context to the code while maintaining their original URLs. In current implementations, source code and its ephemera are not preserved, which presents a problem for scholarly projects where reproducibility matters.

To understand and quantify the scope of this issue, we analyzed the use of GHP URIs in the arXiv and PMC corpora from January 2007 to December 2021. In total, there were 253,590 URIs to GitHub, SourceForge, Bitbucket, and GitLab repositories across the 2.66 million publications in the corpora.

We found that GitHub, GitLab, SourceForge, and Bitbucket were collectively linked to 160 times in 2007 and 76,746 times in 2021. In 2021, one out of five publications in the arXiv corpus included a URI to GitHub.
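As a rough illustration of the kind of counting described above, the following minimal sketch tallies GHP URIs in a piece of text. It is not the authors' pipeline: the host table, the regular expression, and the function name `count_ghp_uris` are all assumptions made for this example, and it matches only exact hostnames (so project subdomains are ignored).

```python
import re
from collections import Counter

# Assumed mapping from hostname to platform name; illustrative only.
# Project subdomains (e.g. name.sourceforge.net) are deliberately not handled.
GHP_HOSTS = {
    "github.com": "GitHub",
    "gitlab.com": "GitLab",
    "bitbucket.org": "Bitbucket",
    "sourceforge.net": "SourceForge",
}

# Matches http(s) URIs, skips an optional "www." prefix, and captures the hostname.
URI_PATTERN = re.compile(r"https?://(?:www\.)?([a-z0-9.-]+)/\S*", re.IGNORECASE)


def count_ghp_uris(text):
    """Count URIs in `text` that point at one of the assumed GHP hosts."""
    counts = Counter()
    for match in URI_PATTERN.finditer(text):
        platform = GHP_HOSTS.get(match.group(1).lower())
        if platform is not None:
            counts[platform] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "Code is at https://github.com/example/repo and "
        "a mirror lives at https://gitlab.com/example/repo ."
    )
    print(count_ghp_uris(sample))
```

Running a function like this over the full text of each publication, grouped by publication year, would give per-platform totals comparable in spirit to those reported above.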

The complexity of GHPs like GitHub is not amenable to conventional Web archiving techniques. Therefore, the growing use of GHPs in scholarly publications points to an urgent and growing need for dedicated efforts to archive their holdings in order to preserve research code and its scholarly ephemera.


Who Writes Scholarly Code?

Authors : Sarah Nguyễn, Vicky Rampin

This paper presents original research about the behaviours, histories, demographics, and motivations of scholars who code, specifically how they interact with version control systems locally and on the Web.

By understanding patrons through multiple lenses – daily productivity habits, motivations, and scholarly needs – librarians and archivists can tailor services for software management, curation, and long-term reuse, raising the possibility of long-term reproducibility across a wide range of scholarship.

URL : Who Writes Scholarly Code?


How Long Can We Build It? Ensuring Usability of a Scientific Code Base

Authors : Klaus Rechert, Jurek Oberhauser, Rafael Gieschke

Software, and source code in particular, has become an important component of scientific publications and is therefore now subject to research data management. Maintaining source code so that it remains a usable and valuable scientific contribution is, and will remain, a huge task.

Not all code contributions can be actively maintained forever. Eventually, there will be a significant backlog of legacy source code. In this article, we analyse the requirements for applying the concept of long-term reusability to source code.

We use a simple case study to identify gaps and provide an emulator-based technical infrastructure to support automated builds of historic software in the form of source code.

URL : How Long Can We Build It? Ensuring Usability of a Scientific Code Base


Science-Software Linkage: The Challenges of Traceability between Scientific Knowledge and Software Artifacts

Authors : Hideaki Hata, Jin L.C. Guo, Raula Gaikovina Kula, Christoph Treude

Although computer science papers are often accompanied by software artifacts, connecting research papers to their software artifacts and vice versa is not always trivial. First of all, there is a lack of well-accepted standards for how such links should be provided.

Furthermore, the provided links, if any, often become outdated: they are affected by link rot when pre-prints are removed, when repositories are migrated, or when papers and repositories evolve independently.

In this paper, we summarize the state of the practice of linking research papers and associated source code, highlighting the recent efforts towards creating and maintaining such links.

We also report on the results of several empirical studies focusing on the relationship between scientific papers and associated software artifacts, and we outline challenges related to traceability and opportunities for overcoming these challenges.


Repository Approaches to Improving the Quality of Shared Data and Code

Authors : Ana Trisovic, Katherine Mika, Ceilyn Boyd, Sebastian Feger, Mercè Crosas

Sharing data and code for reuse has become increasingly important in scientific work over the past decade. However, in practice, shared data and code may be unusable, or published results obtained from them may be irreproducible.

Data repository features and services contribute significantly to the quality, longevity, and reusability of datasets.

This paper presents a combination of original and secondary data analysis studies focusing on computational reproducibility, data curation, and gamified design elements that can be employed to indicate and improve the quality of shared data and code.

The findings of these studies are sorted into three approaches that can be valuable to data repositories, archives, and other research dissemination platforms.

URL : Repository Approaches to Improving the Quality of Shared Data and Code


A Realistic Guide to Making Data Available Alongside Code to Improve Reproducibility

Authors : Nicholas J Tierney, Karthik Ram

Data makes science possible. Sharing data improves visibility, and makes the research process transparent. This increases trust in the work, and allows for independent reproduction of results.

However, a large proportion of data from published research is often only available to the original authors. Despite the obvious benefits of sharing data, and scientists’ advocating for the importance of sharing data, most advice on sharing data discusses its broader benefits, rather than the practical considerations of sharing.

This paper provides practical, actionable advice on how to actually share data alongside research. The key message is that sharing data falls on a continuum, and entering it should come with minimal barriers.