Theory and Practice of Data Citation

Author : Gianmaria Silvello

Citations are the cornerstone of knowledge propagation and the primary means of assessing the quality of research, as well as directing investments in science. Science is increasingly becoming “data-intensive”, where large volumes of data are collected and analyzed to discover complex patterns through simulations and experiments, and most scientific reference works have been replaced by online curated datasets.

Yet, given a dataset, there is no quantitative, consistent and established way of knowing how it has been used over time, who contributed to its curation, what results it has yielded or what value it has.

The development of a theory and practice of data citation is fundamental for considering data as first-class research objects with the same relevance and centrality as traditional scientific products. Many works in recent years have discussed data citation from different viewpoints: illustrating why data citation is needed, defining the principles and outlining recommendations for data citation systems, and providing computational methods for addressing specific issues of data citation.

The current panorama is many-faceted and an overall view that brings together diverse aspects of this topic is still missing. Therefore, this paper aims to describe the lay of the land for data citation, both from the theoretical (the why and what) and the practical (the how) angle.


Software Heritage: Why and How to Preserve Software Source Code

Authors : Roberto Di Cosmo, Stefano Zacchiroli

Software is now a key component present in all aspects of our society. Its preservation has attracted growing attention over the past years within the digital preservation community.

We claim that source code, the only representation of software that contains human-readable knowledge, is a precious digital object that needs special handling: it must be a first-class citizen in the preservation landscape, and we need to take action immediately, given the increasingly frequent incidents that result in permanent losses of source code collections. In this paper we present Software Heritage, an ambitious initiative to collect, preserve, and share the entire corpus of publicly accessible software source code.

We discuss the archival goals of the project, its use cases and role as a participant in the broader digital preservation ecosystem, and detail its key design decisions. We also report on the project road map and the current status of the Software Heritage archive that, as of early 2017, has collected more than 3 billion unique source code files and 700 million commits coming from more than 50 million software development projects.


Fast and Furious (at Publishers): The Motivations behind Crowdsourced Research Sharing

Authors : Carolyn Caffrey Gardner, Gabriel J. Gardner

Crowdsourced research sharing takes place across social media platforms, including Twitter (through hashtags such as #icanhazpdf), Reddit Scholar, and Facebook.

This study surveys users of these peer-to-peer exchanges on demographic information, frequency of use, and their motivations in both providing and obtaining scholarly information on these platforms. Respondents also provided their perspectives on the database terms of service and/or copyright violations in these exchanges.

Findings indicate that the motivations of this community are utilitarian or ideological in nature, similar to other peer-to-peer file sharing online. Implications for library services including instruction, outreach, and interlibrary loan are discussed.

Measuring Cost per Use of Library-Funded Open Access Article Processing Charges: Examination and Implications of One Method

Authors : Crystal Hampson, Elizabeth Stregger


Libraries frequently support their open access (OA) fund using money from their collections budget. Interest in assessing OA funds is emerging. Cost per use is a common method to assess library collections expenditures.

OA article processing charges (APCs) are a one-time cost for global, perpetual use. Article level metrics provide data on global, cumulative article level usage. This article examines a method and discusses the limitations and implications of using article level metrics to calculate cost per use for OA APCs.


Using different APC models from two publishers, PLOS and BioMed Central, this article presents a cost per use formula for each model.


The formula for each model is demonstrated with available data. The examples suggest a very low cost per use for OA APCs after only three years.
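The abstract does not reproduce the formulas themselves, so the following is only a minimal sketch of the general idea, assuming the simplest possible form: the one-time APC divided by cumulative article-level usage. The function name `cost_per_use` and the figures are hypothetical and are not taken from the article.

```python
def cost_per_use(total_cost: float, cumulative_uses: int) -> float:
    """Return the cost divided by cumulative uses.

    Hypothetical helper: assumes cost per use is simply the one-time
    APC divided by the article's cumulative usage count.
    """
    if cumulative_uses <= 0:
        raise ValueError("cumulative_uses must be positive")
    return total_cost / cumulative_uses


# Illustrative, made-up figures; not data from the article.
apc_paid = 1500.00            # one-time APC for a single article
views_and_downloads = 3000    # cumulative article-level usage to date

print(f"{cost_per_use(apc_paid, views_and_downloads):.2f}")  # prints 0.50
```

Because the APC is paid once while usage keeps accumulating, a ratio of this form can only fall over time, which is consistent with the low cost per use reported after three years.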


Several limitations currently exist to obtaining article level data, including the nature of open access and the accessibility of the data. OA articles’ usage levels are high and include use from altruistic access. Cost per use comparison with traditional publishing models is possible; however, comparison between different OA expenditures with very low costs per use may not be helpful.


Article level metrics can provide a means to measure cost per use of OA APCs. Libraries need increased access to article level usage data. They will also need to develop new benchmarks and expectations to evaluate APC payments, given higher usage levels for OA articles and considering altruistic access.

Open notebook science as an emerging epistemic culture within the Open Science movement

Authors : Anne Clinio, Sarita Albagli

The paper addresses the concepts and practices of “open notebook science” (Bradley, 2006) as an innovation within the contemporary Open Science movement. Our research points out that open notebook science is not an incremental improvement, but a new “literary technology” (Shapin, Schaffer, 1985) and a main element of a complex open collaboration ecosystem that fosters a new epistemic culture (Knorr-Cetina, 1999).

This innovation aims to move from a “science based on trust” to a science based on transparency and data provenance, a shift that recognizes the ability of scientists to perform experiments but, above all, values their capacity to properly document what they say they have done. The theoretical framework was built with the notion of epistemic culture (Knorr-Cetina, 1999) and the “three technologies” perspective used by Shapin and Schaffer (1985) to describe the construction by natural philosophers of the “matter of fact” as a “variety of knowledge” so powerful that it became synonymous with science itself.

Empirically, we entered the “open lab” through a netnography that led us to understand that the epistemic culture being engendered by its practitioners is based on a “matter of proof”.


Who needs access to research? Exploring the societal impact of open access

Author : ElHassan ElSabry

Studies about open access (OA) have predominantly focused on its impact on communication within the scholarly community. For example, many studies have been published on what is called the “Open Access Citation Advantage (OACA)”.

On the other hand, implications of OA in non-academic contexts (e.g. medical practice, policymaking, patient advocacy and citizen science) have been the subject of, and the basis for, much advocacy work and many funding agencies’ OA policies, but rarely the subject of original research studies.

To date, this study is the first attempt to collect and synthesize the available evidence on the societal impact of open access. It further builds on this evidence base by introducing a typology of the various science-society interfaces where demand for access to research potentially exists.

The proposed scheme is anticipated to provide guidance for future research on the issue of OA’s societal impact. The paper concludes with a discussion of the implications of non-academic usage of research on the open access debate, especially on the question of who should bear the cost of scholarly publishing.


A genealogy of open access: negotiations between openness and access to research

Author : Samuel A. Moore

Open access (OA) is a contested term with a complicated history and a variety of understandings. This rich history is routinely ignored by institutional, funder and governmental policies that instead enclose the concept and promote narrow approaches to OA.

This article presents a genealogy of the term open access, focusing on the separate histories that emphasise openness and reusability on the one hand, as borrowed from the open-source software and free culture movements, and accessibility on the other hand, as represented by proponents of institutional and subject repositories.

This genealogy is further complicated by the publishing cultures that have evolved within individual communities of practice: publishing means different things to different communities and individual approaches to OA are representative of this fact.

From analysing its historical underpinnings and subsequent development, I argue that OA is best conceived as a boundary object, a term coined by Star and Griesemer (1989) to describe concepts with a shared, flexible definition between communities of practice but a more community-specific definition within them.

Boundary objects permit working relationships between communities while allowing local use and development of the concept. This means that OA is less suitable as a policy object, because boundary objects lose their use-value when ‘enclosed’ at a general level, but should instead be treated as a community-led, grassroots endeavour.