A Realistic Guide to Making Data Available Alongside Code to Improve Reproducibility

Authors : Nicholas J Tierney, Karthik Ram

Data makes science possible. Sharing data improves visibility, and makes the research process transparent. This increases trust in the work, and allows for independent reproduction of results.

However, much of the data from published research is available only to the original authors. Despite the obvious benefits of sharing data, and scientists advocating for its importance, most advice on sharing data discusses its broader benefits rather than the practical considerations of sharing.

This paper provides practical, actionable advice on how to share data alongside research. The key message is that sharing data falls on a continuum, and entering it should come with minimal barriers.

URL : https://arxiv.org/abs/2002.11626

Semantic publishing, la sémantique dans la sémiotique des codes sources d’écrits d’écran scientifiques [Semantic publishing: semantics in the semiotics of the source code of scientific screen writings]

Author : Gérald Kembellec

This article analyzes the stakes of semantic publishing in a scientific context and examines, from a semiotic perspective, the source code that serves as its vector of propagation.

It presents and discusses the various "signes passeurs" (gateway signs) that make it possible to mesh fragmentary writing into a network: RDFa, microdata, and JSON-LD, for example. Their uses are analyzed and related to the needs and goals of researchers, whether they are authors or readers.

Finally, the future of scientific semantic publishing is anticipated critically, and points of vigilance are raised, both about the governance of the authorities and schemas that underpin linked data and about the temptation to use and abuse the ancillary communication benefits, between mediation and media exposure.
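To make the JSON-LD case concrete, here is a minimal Python sketch that describes this very article as a schema.org `ScholarlyArticle` and wraps it in the `<script>` element typically embedded in a page's source. The choice of schema.org vocabulary and the function names are assumptions made for illustration; the article itself does not prescribe them.

```python
import json


def scholarly_article_jsonld(title, author, url):
    """Return a schema.org ScholarlyArticle description as a JSON-LD string.

    Hypothetical helper: the vocabulary choice is an assumption, not
    something the article prescribes.
    """
    record = {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "name": title,
        "author": {"@type": "Person", "name": author},
        "url": url,
    }
    # ensure_ascii=False keeps accented characters readable in the output.
    return json.dumps(record, ensure_ascii=False, indent=2)


def as_script_tag(jsonld):
    """Wrap a JSON-LD string in the script element used to embed it in HTML."""
    return '<script type="application/ld+json">\n' + jsonld + "\n</script>"


print(
    as_script_tag(
        scholarly_article_jsonld(
            "Semantic publishing, la sémantique dans la sémiotique des codes "
            "sources d’écrits d’écran scientifiques",
            "Gérald Kembellec",
            "https://lesenjeux.univ-grenoble-alpes.fr/2019/dossier/04-semantic-publishing-la-semantique-dans-la-semiotique-des-codes-sources-decrits-decran-scientifiques",
        )
    )
)
```

Unlike RDFa or microdata, which annotate the visible HTML elements, a JSON-LD block of this kind sits apart from the rendered text, so it can be generated from bibliographic metadata without touching the page layout.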

URL : https://lesenjeux.univ-grenoble-alpes.fr/2019/dossier/04-semantic-publishing-la-semantique-dans-la-semiotique-des-codes-sources-decrits-decran-scientifiques

From Academia to Software Development: Publication Citations in Source Code Comments

Authors : Akira Inokuchi, Yusuf Sulistyo Nugroho, Fumiaki Konishi, Hideaki Hata, Akito Monden, Kenichi Matsumoto

Academic publications have been evaluated by their impact on research communities, measured by the number of citations. In contrast, the impact of academic publications on industry has rarely been studied.

This paper investigates how academic publications contribute to software development by analyzing publication citations in source code comments in open source software repositories.

We propose an automated approach for detecting academic publications based on Named Entity Recognition, which achieves an F1 score of 0.90. We conduct a large-scale study of publication citations using 319,438,977 comments collected from 25,925 active repositories written in seven programming languages.

Our findings indicate that academic publications can be knowledge sources for software development, and that there are potential issues with that knowledge becoming obsolete.
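For a rough sense of what a citation in a source code comment looks like, the Python sketch below pulls DOI-like and arXiv-like identifiers out of comment text with regular expressions. This is a deliberately simple stand-in and not the Named Entity Recognition approach the paper proposes; the patterns, the function name, and the sample DOI are all assumptions made for illustration, and free-text citations without such identifiers would be missed.

```python
import re

# Heuristic patterns (assumptions, not taken from the paper).
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"'<>]+")
ARXIV_RE = re.compile(r"\barXiv:\s*(\d{4}\.\d{4,5})(?:v\d+)?", re.IGNORECASE)


def find_citation_hints(comment):
    """Return DOI-like and arXiv-like identifiers found in a comment string."""
    # Strip trailing punctuation that usually belongs to the sentence.
    dois = [m.group(0).rstrip(".,;)") for m in DOI_RE.finditer(comment)]
    arxiv_ids = [m.group(1) for m in ARXIV_RE.finditer(comment)]
    return {"doi": dois, "arxiv": arxiv_ids}


comments = [
    "// Detection approach described in arXiv:1910.06932.",
    "# Algorithm adapted from doi:10.1000/xyz123.",  # made-up example DOI
    "/* TODO: refactor this loop */",
]
for comment in comments:
    print(find_citation_hints(comment))
```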

URL : https://arxiv.org/abs/1910.06932

Identifiers for Digital Objects: the Case of Software Source Code Preservation

Authors : Roberto Di Cosmo, Morane Gruenpeter, Stefano Zacchiroli

In the very broad scope addressed by digital preservation initiatives, a special place belongs to the scientific and technical artifacts that we need to properly archive to enable scientific reproducibility.

For these artifacts we need identifiers that are not only unique and persistent, but also support integrity in an intrinsic way. They must provide strong guarantees that the object denoted by a given identifier will always be the same, without relying on third parties and external administrative processes.

In this article, we report on our quest for these identifiers for digital objects (IDOs), whose properties are different from, and complementary to, those of the various digital identifiers of objects (DIOs) that are in widespread use today.

We argue that both kinds of identifiers are needed and present the framework for intrinsic persistent identifiers that we have adopted in Software Heritage for preserving billions of software artifacts.
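To illustrate what "intrinsic" means here, the Python sketch below derives an identifier purely from an object's bytes, so anyone holding the object can recompute and verify it without consulting a registry. It assumes the Git-compatible blob hashing and the `swh:1:cnt:` prefix that Software Heritage content identifiers are generally documented as using; the abstract does not spell out the scheme, and the function names are hypothetical.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Hash bytes the way Git hashes a blob: SHA-1 over a header plus the content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def content_identifier(content: bytes) -> str:
    """Build a content identifier in the assumed 'swh:1:cnt:<hash>' form."""
    return "swh:1:cnt:" + git_blob_sha1(content)


original = b"print('hello')\n"
tampered = b"print('hullo')\n"

# The same bytes always give the same identifier; any change gives a new one.
print(content_identifier(original))
print(content_identifier(original) == content_identifier(original))
print(content_identifier(original) == content_identifier(tampered))
```

Because the identifier is computed from the content, integrity checking comes for free: a resolver cannot silently return a different object under the same identifier, which is the guarantee that location- or registry-based identifiers do not provide on their own.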

URL : https://hal.archives-ouvertes.fr/hal-01865790

Software Heritage: Why and How to Preserve Software Source Code

Authors : Roberto Di Cosmo, Stefano Zacchiroli

Software is now a key component present in all aspects of our society. Its preservation has attracted growing attention over the past years within the digital preservation community.

We claim that source code, the only representation of software that contains human-readable knowledge, is a precious digital object that needs special handling: it must be a first-class citizen in the preservation landscape, and we need to take action immediately, given the increasingly frequent incidents that result in permanent losses of source code collections. In this paper we present Software Heritage, an ambitious initiative to collect, preserve, and share the entire corpus of publicly accessible software source code.

We discuss the archival goals of the project, its use cases and role as a participant in the broader digital preservation ecosystem, and detail its key design decisions. We also report on the project road map and the current status of the Software Heritage archive that, as of early 2017, has collected more than 3 billion unique source code files and 700 million commits coming from more than 50 million software development projects.

URL : https://hal.archives-ouvertes.fr/hal-01590958