Versioning Data Is About More than Revisions: A Conceptual Framework and Proposed Principles

Authors : Jens Klump, Lesley Wyborn, Mingfang Wu, Julia Martin, Robert R. Downs, Ari Asmi

A dataset, small or big, is often changed to correct errors, apply new algorithms, or add new data (e.g., as part of a time series), etc.

In addition, datasets might be bundled into collections, distributed in different encodings or mirrored onto different platforms. All these differences between versions of datasets need to be understood by researchers who want to cite the exact version of the dataset that was used to underpin their research.

Failing to do so reduces the reproducibility of research results. Ambiguous identification of datasets also impacts researchers and data centres who are unable to gain recognition and credit for their contributions to the collection, creation, curation and publication of individual datasets.

Although the means to identify datasets using persistent identifiers have been in place for more than a decade, systematic data versioning practices are currently not available. In this work, we analysed 39 use cases and current practices of data versioning across 33 organisations.

We noticed that the term ‘version’ was used in a very general sense, extending beyond the more common understanding of ‘version’ to refer primarily to revisions and replacements. Using concepts developed in software versioning and the Functional Requirements for Bibliographic Records (FRBR) as a conceptual framework, we developed six foundational principles for versioning of datasets: Revision, Release, Granularity, Manifestation, Provenance and Citation.

These six principles provide a high-level framework for guiding the consistent practice of data versioning and can also serve as guidance for data centres or data providers when setting up their own data revision and version protocols and procedures.

DOI : http://doi.org/10.5334/dsj-2021-012

Inferring the causal effect of journals on citations

Author : Vincent A Traag

Articles in high-impact journals are, on average, more frequently cited. But are they cited more often because those articles are somehow more “citable”? Or are they cited more often simply because they are published in a high-impact journal? Although some evidence suggests the latter, the causal relationship is not clear.

We here compare citations of preprints to citations of the published version to uncover the causal mechanism. We build on an earlier model of citation dynamics to infer the causal effect of journals on citations. We find that high-impact journals select articles that tend to attract more citations.

At the same time, we find that high-impact journals augment the citation rate of published articles. Our results yield a deeper understanding of the role of journals in the research system.

The use of journal metrics in research evaluation has been increasingly criticized in recent years and article-level citations are sometimes suggested as an alternative. Our results show that removing impact factors from evaluation does not negate the influence of journals. This insight has important implications for changing practices of research evaluation.

DOI : https://doi.org/10.1162/qss_a_00128

Open access analytics with open access repository data: A Multi-level perspective

Author : Ibraheem Mohammed Sultan Al Sadi

Within nearly two decades after the open access movement emerged, its community has drawn attention to understanding its development, coverage, obstacles and motivations. To do so, they depend on data-centric analytics of open access publishing activities, using Web information space as their data sources for these analytical activities.

Open access repositories are one such data source that nurtures open access publishing activities and are a valuable source for analytics. Therefore, the open access community utilises open access repository infrastructure to develop and operate analytics, harnessing the widely adopted Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) interoperability layer to develop value-added services with an analytics agenda.

However, this layer presents its limitations and challenges regarding the support of analytical value-added services. To address these practices, this research consolidates them into the notion of ‘open access analytics’, drawing attention to its significance and bridging it with the data analytics literature.

As part of this, an explanatory case study demonstrates how the OAI-PMH service provider approach supports open access analytics and also presents its limitations, using Registry of Open Access Repositories (ROAR) analytics as its example.

The case study reflects the limitation of open access registries to enable a single point of discovery, due to the quality of their records and the complexity of the open access repository taxonomy; the complexity of operationalising the unit of analysis in particular analytics, due to the limitations in the OAI-PMH metadata schemes; the complex and resource-intensive harvesting process, due to the large volume of data and the low quality of OAI-PMH standards adoption; and the issue of service provider suitability, due to a single point of failure.

Also, this doctoral thesis proposes the use of Open Access Analytics using Open Access Repository Data with a Social Machine (OAA-OARD-SM) as a conceptual framework to deliver open access analytics by using the open access repository infrastructure in a collaborative manner with social machines.

Furthermore, it takes advantage of the web observatory infrastructure as a form of web-based mediated technology to coordinate the open access analytics process. The conceptual framework re-frames the open access analytics process into four layers: the open access repository layer, the open access registry layer, the data analytics layer and open access analytics layer.

It also conceptualises analytics practices carried out within individual repository boundaries as core practices for the realisation of open access analytics and examines how the repository management team can participate in the open access analytics process.

To understand this, expert interviews were carried out to investigate and understand the analytics practices within the repository boundaries and the repository management teams’ interactions with analytics applications that are fed by the open access repository or used by repository management to operate open access analytics.

The interviews provide insight into the variations in the types of analytic practices and highlight the active role played by the repository management team in these practices. Thus, it provides an understanding of the analytics practices within open access repositories by classifying them into two main categories: the distributed analytical applications and locally operated analytics.

The distributed analytics applications include cross-repository OAI-based analytics, cross-repository usage data aggregators, solo-repository content-centric analytics and solo-repository centric analytics.

On the other hand, the locally operated analytics take the form of a Current Research Information System (CRIS), repository embedded functionalities and in-house developed analytics. It also classifies the repository management interactions with analytics into four roles: data analyst, administrative, data and system management, and system development and support.

Lastly, it raises concerns associated with the application of analytics on open access repositories, including data-related, cost-related and analytical concerns.

URL : http://eprints.soton.ac.uk/id/eprint/447464

Collaborative Data Literacy Education for Research Labs: A Case Study at a Large Research University

Authors : John Watts, Laura Sare, David E. Hubbard

Data literacy education for graduate students can take place in many contexts. One-shot instruction sessions and credit-bearing courses are a common mode of instruction for the graduate student audience, but both share limitations regarding best practices for adult learning theory.

This case study explores the benefits of data literacy education in a research lab setting and highlights the collaborations among data librarians, a liaison librarian, and research faculty that enable effective learning experiences in labs or other applied settings.

The authors share the design of the curriculum, facilitation of the instruction, and the assessment of student learning, as well as their approach to collaboration as an essential component of the project.

Original location : https://digitalcommons.du.edu/collaborativelibrarianship/vol12/iss3/7

Working with publication technology to make open access journals sustainable

Authors : Marcel Wrzesinski, Patrick Urs Riechert, Frédéric Dubois, Christian Katzenbach

Over the last 25 years, scholars around the world have used electronic publishing to open up their work, share it with interested publics instantly or even become publishers themselves.

This white paper explores in what ways advances in publication technology in the journal sector (e.g. the widespread use of content management and editorial systems) contribute to a more inclusive and sustainable open access ecosystem.

Drawing on a study we did in Germany in 2019-2021, and for which we tested technical solutions together with the international, peer-reviewed diamond open access journal Internet Policy Review, we present and discuss publishing solutions based on software, workflows, and collaborations with regard to their practicability and scalability.

The paper finds that scholar-led publishing is a force to be reckoned with when it comes to technical solutions tending towards increased bibliodiversity (i.e., variety of content, publication formats and publishing institutions).

DOI : http://dx.doi.org/10.5281/zenodo.4558781

Open Access and Academic Freedom: Teasing Out Some Important Nuances

Author : Rick Anderson

Discussion of the ways in which open access (OA) and academic freedom interact is fraught for a number of reasons, not least of which is the unwillingness of some participants in the discussion to acknowledge that OA might have any implications for academic freedom at all. Thus, any treatment of such implications must begin with foundational questions.

Most basic among them are: first, what do we mean by ‘open access’; second, what do we mean by ‘academic freedom’? The answers to these questions are not as obvious as one might expect (or hope), but when they are answered it becomes much easier to address a third, also very important, question: in what ways might OA and academic freedom interact?

With every new OA mandate imposed by a government agency, institution of higher education, or funding organization, careful analysis of this issue becomes more urgent. This article attempts to sort out some of these issues, controversies and confusions.

DOI : https://doi.org/10.1111/dech.12636

Openness in Big Data and Data Repositories. The Application of an Ethics Framework for Big Data in Health and Research

Authors : Vicki Xafis, Markus K. Labude

There is a growing expectation, or even requirement, for researchers to deposit a variety of research data in data repositories as a condition of funding or publication. This expectation recognizes the enormous benefits of data collected and created for research purposes being made available for secondary uses, as open science gains increasing support.

This is particularly so in the context of big data, especially where health data is involved. There are, however, also challenges relating to the collection, storage, and re-use of research data.

This paper gives a brief overview of the landscape of data sharing via data repositories and discusses some of the key ethical issues raised by the sharing of health-related research data, including expectations of privacy and confidentiality, the transparency of repository governance structures, access restrictions, as well as data ownership and the fair attribution of credit.

To consider these issues and the values that are pertinent, the paper applies the deliberative balancing approach articulated in the Ethics Framework for Big Data in Health and Research (Xafis et al. 2019) to the domain of Openness in Big Data and Data Repositories.

Please refer to that article for more information on how this framework is to be used, including a full explanation of the key values involved and the balancing approach used in the case study at the end.

DOI : https://doi.org/10.1007/s41649-019-00097-z