Digital Scholarly Journals Are Poorly Preserved: A Study of 7 Million Articles

Author : Martin Paul Eve

Introduction

Digital preservation underpins the persistence of scholarly links and citations through the digital object identifier (DOI) system. We do not currently know, at scale, the extent to which articles assigned a DOI are adequately preserved.

Methods

We construct a database of preservation information from original archival sources and then examine the preservation statuses of 7,438,037 DOIs in a random sample.

Results

Of the 7,438,037 works examined, there were 5.9 million copies spread over the archives used in this work. Furthermore, a total of 4,342,368 of the works that we studied (58.38%) were present in at least one archive. However, this left 2,056,492 works in our sample (27.64%) that are seemingly unpreserved.

The remaining 13.98% of works in the sample were excluded either for being too recent (published in the current year), not being journal articles, or having insufficient date metadata for us to identify the source.
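The shares reported above can be re-derived from the raw counts. The following is a minimal sketch in Python; the names are illustrative, and the excluded count is inferred here as the remainder of the sample rather than taken from the study.

```python
# Counts as reported in the Results above.
TOTAL = 7_438_037
PRESERVED = 4_342_368      # present in at least one archive
UNPRESERVED = 2_056_492    # seemingly unpreserved

# Assumption: every work not counted above was excluded from the analysis.
EXCLUDED = TOTAL - PRESERVED - UNPRESERVED


def share(count: int, total: int = TOTAL) -> float:
    """Return count as a percentage of total."""
    return 100 * count / total


for label, count in [
    ("preserved", PRESERVED),
    ("unpreserved", UNPRESERVED),
    ("excluded", EXCLUDED),
]:
    print(f"{label:<12} {count:>9,} {share(count):6.2f}%")
```

Because each published percentage is rounded on its own, the recomputed values may differ from the reported ones in the final digit.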

Discussion

Our study is limited by design in several ways. Among these are the facts that it uses only a subset of archives, it only tracks articles with DOIs, and it does not account for institutional repository coverage. Nonetheless, as an initial attempt to gauge the landscape, our results will still be of interest to libraries, publishers, and researchers.

Conclusion

This work reveals an alarming preservation deficit. Only 0.96% of Crossref members (n = 204) can be confirmed to digitally preserve over 75% of their content in three or more of the archives that we studied. (Note that when, in this article, we write “preserved,” we mean “that we were able to confirm as preserved,” as per the specified limitations of this study.) A slightly larger proportion, i.e., 8.5% (n = 1,797), preserved over 50% of their content in two or more archives.

However, many members, i.e., 57.7% (n = 12,257), only met the threshold of having 25% of their material in a single archive. Most worryingly, 32.9% (n = 6,982) of Crossref members seem not to have any adequate digital preservation in place, which is against the recommendations of the Digital Preservation Coalition.
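The thresholds described above can be read as a tiered classification of members. The sketch below is one possible reading, not the study's own code: the tier labels and function names are hypothetical, a member's content is represented as a list giving the number of archives holding each work, and the handling of exact boundary values (`>=`) is an assumption.

```python
from typing import Sequence

# (label, minimum share of works, minimum number of archives per work).
# Labels are hypothetical; the thresholds follow the text above.
TIERS = [
    ("tier 1", 0.75, 3),
    ("tier 2", 0.50, 2),
    ("tier 3", 0.25, 1),
]


def share_in_at_least(archive_counts: Sequence[int], k: int) -> float:
    """Fraction of a member's works that are held in k or more archives."""
    if not archive_counts:
        return 0.0
    return sum(1 for c in archive_counts if c >= k) / len(archive_counts)


def classify_member(archive_counts: Sequence[int]) -> str:
    """Return the highest tier whose threshold the member meets."""
    for label, min_share, min_archives in TIERS:
        if share_in_at_least(archive_counts, min_archives) >= min_share:
            return label
    return "unclassified"


# Each list holds one entry per work: the number of archives that hold it.
print(classify_member([3, 3, 3, 4]))  # every work in three or more archives
print(classify_member([2, 2, 0, 0]))  # half of the works in two archives
print(classify_member([0, 0, 0, 0]))  # nothing found in any archive
```

Checking the strictest tier first means each member is assigned only to the highest level it reaches.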

DOI : https://doi.org/10.31274/jlsc.16288

The status of the digital preservation policies and plans of the institutional repositories of selected public universities in Kenya

Authors : Hellen Ndegwa, Emily Bosire, Damaris Odero

Institutional repositories (IRs) have a leading role in providing long-term access to the research output of universities. This study assessed the capabilities of institutional repositories in Kenya to support long-term preservation of digital content by reviewing digital preservation policies and plans.

Data was collected through face-to-face interviews with 19 respondents from three public universities, which were selected on the basis of their registration in OpenDOAR and ROARMAP and the number of items in their repositories.

Additional data was acquired through analysis of documents such as open access policies and mandates, as well as institutional websites. Findings revealed that the organizations were poorly prepared to support long-term digital preservation.

Policies were inadequate and plans to support the implementation of the policies were lacking. The study concluded that although the IRs were to undertake digital preservation, they lacked clearly defined actions from plans and policy.

This article offers recommendations, including identifying digital preservation goals that will guide policy formulation and multi-stakeholder involvement in the policy-making process. Effort should also be made to create awareness of the relationship between digital content selection and its successful long-term preservation.

DOI : http://doi.org/10.1629/uksg.590

Evidence for Trusted Digital Repository Reviews: An Analysis of Perspectives

Author : Jonathan David Crabtree

Building trust in our research infrastructure is important for the future of the academy. Trust in research data repositories is critical as they provide the evidence for past discoveries as well as the input for future discoveries.

Archives and repositories are examining their options for trustworthy review, audit, and certification as a means to build trust within their content creator and user communities. One option these institutions have to increase and demonstrate their trustworthiness is to apply for the CoreTrustSeal.

Applicants for the CoreTrustSeal are becoming more numerous and diverse, ranging from general-purpose repositories and preservation infrastructure providers to domain repositories. This demand for certification and the subjective nature of decisions around levels of CoreTrustSeal compliance drive this dissertation.

It is a study of the review process and its veracity and consistency in determining the trustworthiness of applicant repositories. Several assumptions underlie this work. First, audits and reviews must be based on evidence supplied by the repository under scrutiny. Second, not all reviewers will approach a piece of evidence in the same fashion or give it the same weight. Third, the value and veracity of required evidence may be subject to reviewers’ diverse perspectives and diverse repository community norms.

This research used a thematic qualitative analysis approach to identify similarities and differences in CoreTrustSeal reviewers’ responses during semi-structured interviews in order to better understand potential subjective differences among respondents. The non-probabilistic sample of participants represented a balance in perspectives across three anticipated categories: administrator, archivist, and technologist.

Themes converged around several key concepts. Nearly all participants felt they were performing a peer review process and working to help the repository community and the research enterprise.

Reviewers were questioned about the various CoreTrustSeal application requirements and which ones they felt were the most important. No clear evidence emerged to indicate that variations in perspectives affected the subjective review of application evidence. The same categories of evidence were often selected and identified as being critical across all three categories (i.e., administrator, archivist, and technologist).

Many valuable suggestions from participants were recorded and can be implemented to ensure the consistency and sustainability of this trusted repository review process.

These suggestions and concepts were also very evenly distributed across the three perspectives. The balance in perspectives is potentially due to participants’ experience levels and their years of experience in various positions, holding many responsibilities, within the organizations they represented.

DOI : https://doi.org/10.17615/npck-km73

Different Preservation Levels: The Case of Scholarly Digital Editions

Authors : Elias Oltmanns, Tim Hasler, Wolfgang Peters-Kottig, Heinz-Günter Kuper

Ensuring the long-term availability of research data forms an integral part of data management services. Where OAIS-compliant digital preservation has been established in recent years, in almost all cases the services aim at the preservation of file-based objects.

In the Digital Humanities, research data is often represented in highly structured aggregations, such as Scholarly Digital Editions. Naturally, scholars would like their editions to remain functionally complete as long as possible.

Besides standard components like web servers, the presentation typically relies on project-specific code interacting with client software like web browsers. The latter in particular are subject to rapid change over time, which invariably makes such environments awkward to maintain once funding has ended.

Pragmatic approaches have to be found in order to balance the curation effort and the maintainability of access to research data over time. A sketch of four potential service levels aiming at the long-term availability of research data in the humanities is outlined: (1) Continuous Maintenance, (2) Application Conservation, (3) Application Data Preservation, and (4) Bitstream Preservation.

The first being too costly and the last hardly satisfactory in general, we suggest that the implementation of services by an infrastructure provider should concentrate on service levels 2 and 3. We explain their strengths and limitations considering the example of two Scholarly Digital Editions.

DOI : http://doi.org/10.5334/dsj-2019-051

Identifiers for Digital Objects: the Case of Software Source Code Preservation

Authors : Roberto Di Cosmo, Morane Gruenpeter, Stefano Zacchiroli

In the very broad scope addressed by digital preservation initiatives, a special place belongs to the scientific and technical artifacts that we need to properly archive to enable scientific reproducibility.

For these artifacts we need identifiers that are not only unique and persistent, but also support integrity in an intrinsic way. They must provide strong guarantees that the object denoted by a given identifier will always be the same, without relying on third parties and external administrative processes.

In this article, we report on our quest for these identifiers for digital objects (IDOs), whose properties are different from, and complementary to, those of the various digital identifiers of objects (DIOs) that are in widespread use today.

We argue that both kinds of identifiers are needed and present the framework for intrinsic persistent identifiers that we have adopted in Software Heritage for preserving billions of software artifacts.
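The idea of an intrinsic identifier can be illustrated with a short sketch: the identifier is computed from the object's bytes alone, so anyone holding the object can recompute and verify it without consulting a registry. The example below uses the Git-compatible blob hash (SHA-1 over a `blob <length>\0` header followed by the content) and the `swh:1:cnt:` prefix used for content objects in Software Heritage identifiers. The function names are hypothetical, and this is an illustration, not the Software Heritage implementation.

```python
import hashlib


def intrinsic_id(content: bytes) -> str:
    """Derive an identifier from the content itself (Git-style blob hash)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    digest = hashlib.sha1(header + content).hexdigest()
    return f"swh:1:cnt:{digest}"


def verify(content: bytes, identifier: str) -> bool:
    """Check that the content still matches the identifier it was given."""
    return intrinsic_id(content) == identifier


original = b"hello world\n"
ident = intrinsic_id(original)
print(ident)
print(verify(original, ident))           # True: the bytes are unchanged
print(verify(b"hello world!\n", ident))  # False: any change alters the hash
```

An identifier assigned by a registry, by contrast, can keep resolving even after the object behind it has changed, which is why the two kinds are described as complementary.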

URL : https://hal.archives-ouvertes.fr/hal-01865790

Challenges and opportunities in the evolving digital preservation landscape: reflections from Portico

Authors : Kate Wittenberg, Sarah Glasser, Amy Kirchhoff, Sheila Morrissey, Stephanie Orphan

There has been tremendous growth in the amount of digital content created by libraries, publishers, cultural institutions and the general public. While there are great benefits to having content available in digital form, digital objects can be extremely short-lived unless proper attention is paid to preservation.

Reflecting on our experience with the digital preservation service Portico, we provide background on Portico’s history and evolving practice of sustainable preservation of the digital artifacts of scholarly communications.

We also provide an overview of the digital preservation landscape as we see it now, with some thoughts on current requirements for preservation, and thoughts on the opportunities and challenges that lie ahead.

DOI : http://doi.org/10.1629/uksg.421

Institutional repositories as infrastructures for long-term preservation

Authors : Helena Francke, Jonas Gamalielsson, Björn Lundell

Introduction

The study describes the conditions for long-term preservation of the content of the institutional repositories of Swedish higher education institutions. It is based on an investigation of how deposited files are managed with regard to file format and how representatives of the repositories describe the functions of the repositories.

Method

The findings are based on answers to a questionnaire completed by thirty-four institutional repository representatives (97% response rate).

Analysis

Questionnaire answers were analysed through descriptive statistics and qualitative coding. The concept of information infrastructures was used to analytically discuss repository work.

Results

Visibility and access to content were considered to be the most important functions of the repositories, but long-term preservation was also considered important for publications and student theses.

Whereas a majority of repositories had some form of guidelines for which file formats were accepted, very few considered whether or not file formats constitute open standards. This can have consequences for the long-term sustainability and access of the content deposited in the repositories.

Conclusion

The study contributes to the discussion about the sustainability of research publications and data in the repositories by pointing to the potential difficulties involved for long-term preservation and access when there is little focus on and awareness of open file formats.

URL : http://www.informationr.net/ir/22-2/paper757.html