Institutional Data Repository Development, a Moving Target

Authors : Colleen Fallaw, Genevieve Schmitt, Hoa Luong, Jason Colwell, Jason Strutz

At the end of 2019, the Research Data Service (RDS) at the University of Illinois at Urbana-Champaign (UIUC) completed its fifth year as a campus-wide service. In order to gauge the effectiveness of the RDS in meeting the needs of Illinois researchers, RDS staff developed a five-year review consisting of a survey and a series of in-depth focus group interviews.

As a result, the Illinois Data Bank, our institutional data repository developed in-house by University Library IT staff, was recognized as the most useful service our unit offers. When launched in 2016, storage resources and web servers for the Illinois Data Bank and its supporting systems were hosted on-premises at UIUC.

As anticipated, researchers increasingly need to share large and complex datasets. To leverage storage that is potentially more reliable, highly available, cost-effective, and scalable, and that is accessible to computation resources, we migrated our item bitstreams and web services to the cloud. Our efforts have met with success, but also with painful bumps along the way.

This article describes how we supported data curation workflows while transitioning from on-premises to cloud resource hosting. It details our approaches to ingesting, curating, and offering access to dataset files up to 2 TB in size, which may be archive-type files (e.g., .zip or .tar) containing complex directory structures.
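The article does not specify the cloud provider or transfer mechanism, but object stores with S3-style multipart uploads cap an upload at 10,000 parts, which constrains the part size for a 2 TB bitstream. A minimal sketch of that arithmetic, under the assumption of S3-style limits (5 MiB minimum part, 5 GiB maximum part, 10,000 parts):

```python
def plan_multipart_upload(size_bytes,
                          max_parts=10_000,
                          min_part=5 * 1024**2,    # 5 MiB floor (assumed S3-style limit)
                          max_part=5 * 1024**3):   # 5 GiB ceiling (assumed S3-style limit)
    """Pick a part size and count for a multipart upload of one large bitstream."""
    if size_bytes <= min_part:
        return 1, size_bytes                       # small files need no splitting
    part = max(min_part, -(-size_bytes // max_parts))  # ceiling division
    if part > max_part:
        raise ValueError("object exceeds what these multipart limits allow")
    parts = -(-size_bytes // part)
    return parts, part

# A 2 TiB dataset file forces parts of roughly 220 MB:
parts, part_size = plan_multipart_upload(2 * 1024**4)  # → 10000 parts of 219,902,326 bytes
```

The take-away is that part size must scale with object size; a fixed small part size that works for typical deposits would silently fail at the 2 TB end of the range.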


Open access analytics with open access repository data: A Multi-level perspective

Author : Ibraheem Mohammed Sultan Al Sadi

In the nearly two decades since the open access movement emerged, its community has drawn attention to understanding its development, coverage, obstacles, and motivations. To do so, it depends on data-centric analytics of open access publishing activities, using the Web information space as the data source for these analytical activities.

Open access repositories are one such data source that nurtures open access publishing activities and are a valuable source for analytics. Therefore, the open access community utilises open access repository infrastructure to develop and operate analytics, harnessing the widely adopted Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) interoperability layer to develop value-added services with an analytics agenda.
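OAI-PMH exposes repository metadata through verbs such as ListRecords, paging through large result sets with a resumptionToken that the harvester feeds back on the next request. A minimal sketch of parsing one ListRecords response with the standard library (the sample XML is illustrative, not taken from any real repository):

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH and Dublin Core specifications.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response body."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI + "record"):
        header = rec.find(OAI + "header")
        records.append({
            "identifier": header.findtext(OAI + "identifier"),
            "titles": [t.text for t in rec.iter(DC + "title")],
        })
    token_el = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
    token = token_el.text if token_el is not None else None  # None = last page
    return records, token

SAMPLE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
```

A harvester would loop, re-requesting with `resumptionToken=<token>` until the token is absent; the thesis's point is that doing this across hundreds of repositories of uneven OAI-PMH compliance is where the cost and fragility accumulate.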

However, this layer has limitations and challenges in supporting analytical value-added services. To address them, this research consolidates these practices into the notion of 'open access analytics', drawing attention to its significance and bridging it with the data analytics literature.

As part of this, an explanatory case study demonstrates how the OAI-PMH service provider approach supports open access analytics, and also presents its limitations, using Registry of Open Access Repositories (ROAR) analytics as a case study.

The case study reflects four limitations: open access registries cannot provide a single point of discovery, owing to the quality of their records and the complexity of the open access repository taxonomy; the unit of analysis is hard to operationalise in particular analytics, owing to the limitations of the OAI-PMH metadata schemes; the harvesting process is complex and resource-intensive, owing to the large volume of data and the low quality of OAI-PMH standards adoption; and service provider suitability is an issue, owing to a single point of failure.

This doctoral thesis also proposes Open Access Analytics using Open Access Repository Data with a Social Machine (OAA-OARD-SM) as a conceptual framework for delivering open access analytics by using the open access repository infrastructure in a collaborative manner with social machines.

Furthermore, it takes advantage of the web observatory infrastructure as a form of web-based mediated technology to coordinate the open access analytics process. The conceptual framework re-frames the open access analytics process into four layers: the open access repository layer, the open access registry layer, the data analytics layer, and the open access analytics layer.

It also conceptualises analytics practices carried out within individual repository boundaries as core practices for the realisation of open access analytics and examines how the repository management team can participate in the open access analytics process.

To understand this, expert interviews were carried out to investigate and understand the analytics practices within the repository boundaries and the repository management teams’ interactions with analytics applications that are fed by the open access repository or used by repository management to operate open access analytics.

The interviews provide insight into the variations in the types of analytics practices and highlight the active role played by the repository management team in these practices. They thus provide an understanding of the analytics practices within open access repositories by classifying them into two main categories: distributed analytical applications and locally operated analytics.

The distributed analytical applications include cross-repository OAI-based analytics, cross-repository usage data aggregators, solo-repository content-centric analytics and solo-repository centric analytics.

On the other hand, the locally operated analytics take the form of Current Research Information Systems (CRIS), repository-embedded functionalities and in-house developed analytics. It also classifies the repository management interactions with analytics into four roles: data analyst, administrative, data and system management, and system development and support.

Lastly, it raises concerns associated with the application of analytics on open access repositories, including data-related, cost-related and analytical concerns.


Managing an institutional repository workflow with GitLab and a folder-based deposit system

Authors : Whitney R. Johnson-Freeman, Mark E. Phillips, Kristy K. Phillips

Institutional Repositories (IR) exist in a variety of configurations and in various states of development across the country. Each organization with an IR has a workflow that can range from explicitly documented and codified sets of software and human workflows, to ad hoc assortments of methods for working with faculty to acquire, process and load items into a repository.

The University of North Texas (UNT) Libraries has managed an IR called UNT Scholarly Works for the past decade but has until recently relied on ad hoc workflows. Over the past six months, we have worked to improve our processes in a way that is extensible and flexible while also providing a clear workflow for our staff to process submitted and harvested content.

Our approach makes use of GitLab and its associated tools to track and communicate priorities for a multi-user team processing resources. We paired this Web-based management with a folder-based system for moving the deposited resources through a sequential set of processes that are necessary to describe, upload, and preserve the resource.
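The article pairs GitLab issue tracking with a folder-per-deposit queue that advances each resource through sequential processing stages. A minimal sketch of such a queue, using numbered stage folders (the stage names here are assumptions for illustration; UNT's actual folder layout is not reproduced):

```python
import shutil
import tempfile
from pathlib import Path

# Illustrative stage names only; the real workflow's stages are described
# in the article, not copied here.
STAGES = ["01_incoming", "02_describe", "03_upload", "04_preserve", "05_done"]

def setup_queue(root: Path):
    """Create one folder per workflow stage."""
    for stage in STAGES:
        (root / stage).mkdir(parents=True, exist_ok=True)

def advance(root: Path, deposit: str) -> str:
    """Move a deposit folder to the next stage; return the new stage name."""
    for i, stage in enumerate(STAGES[:-1]):
        src = root / stage / deposit
        if src.is_dir():
            shutil.move(str(src), str(root / STAGES[i + 1] / deposit))
            return STAGES[i + 1]
    raise FileNotFoundError(f"{deposit} not found in any movable stage")

# Demo: a new deposit lands in the first stage, then advances one step.
root = Path(tempfile.mkdtemp())
setup_queue(root)
(root / "01_incoming" / "deposit-0001").mkdir()
stage = advance(root, "deposit-0001")  # → "02_describe"
```

The appeal of this pattern is that a deposit's current state is visible in any file browser, with no database required, and the GitLab issue can simply record which stage folder the deposit sits in.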

This strategy can be used in a number of different applications and can serve as a set of building blocks that can be configured in different ways. This article will discuss which components of GitLab are used together as tools for tracking deposits from faculty as they move through different steps in the workflow.

Likewise, the folder-based workflow queue will be presented and described as implemented at UNT, and examples for how we have used it in different situations will be presented.


Opening Up Open Access Institutional Repositories to Demonstrate Value: Two Universities’ Pilots on Including Metadata-Only Records

Authors: Karen Bjork, Rebel Cummings-Sauls, Ryan Otto


Institutional repository managers are continuously looking for new ways to demonstrate the value of their repositories. One way to do this is to create a more inclusive repository that provides reliable information about the research output produced by faculty affiliated with the institution.


This article details two pilot projects that evaluated how their repositories could track faculty research output through the inclusion of metadata-only (no full-text) records.

The purpose of each pilot project was to determine the feasibility and provide an assessment of the long-term impact on the repository’s mission statement, staffing, and collection development policies.


This article shares the results of the pilot projects and explores the impact for faculty and end users as well as the implications for repositories.

URL : Opening Up Open Access Institutional Repositories to Demonstrate Value: Two Universities’ Pilots on Including Metadata-Only Records


A Principled Approach to Online Publication Listings and Scientific Resource Sharing

Authors : Jacquelijn Ringersma, Karin Kastens, Ulla Tschida, Jos van Berkum

The Max Planck Institute (MPI) for Psycholinguistics has developed a service to manage and present the scholarly output of their researchers. The PubMan database manages publication metadata and full-texts of publications published by their scholars.

All relevant information regarding a researcher’s work is brought together in this database, including supplementary materials and links to the MPI database for primary research data.

The PubMan metadata is harvested into the MPI website CMS (Plone). The system developed for creating the publication lists allows the researcher to create a selection of the harvested data in a variety of formats.
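Selecting harvested records and rendering them in more than one output format can be sketched as below; the field names and formats are illustrative assumptions, not the actual PubMan/Plone data model:

```python
# Hypothetical harvested records; real PubMan metadata is richer than this.
publications = [
    {"authors": "Doe, J.", "year": 2009, "title": "Sample study",
     "journal": "Journal of Examples"},
    {"authors": "Roe, A.", "year": 2008, "title": "Another study",
     "journal": "Example Letters"},
]

def select(pubs, **criteria):
    """Pick a subset of harvested records, e.g. by year."""
    return [p for p in pubs if all(p.get(k) == v for k, v in criteria.items())]

def as_apa_like(pub):
    """Render one record as a simple author-date citation line."""
    return f"{pub['authors']} ({pub['year']}). {pub['title']}. {pub['journal']}."

def as_bibtex(pub):
    """Render the same record as a BibTeX entry."""
    key = pub["authors"].split(",")[0].lower() + str(pub["year"])
    return (f"@article{{{key},\n  author = {{{pub['authors']}}},\n"
            f"  title = {{{pub['title']}}},\n  journal = {{{pub['journal']}}},\n"
            f"  year = {{{pub['year']}}}\n}}")

listing = [as_apa_like(p) for p in select(publications, year=2009)]
```

The point of the design is the separation of selection from rendering: one harvested dataset can feed a web page, a CV, or a reference manager export without re-querying the source database.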


Developing Infrastructure to Support Closer Collaboration of Aggregators with Open Repositories

The amount of open access content stored in repositories has increased dramatically, which has created new technical and organisational challenges for bringing this content together. The COnnecting REpositories (CORE) project has been dealing with these challenges by aggregating and enriching content from hundreds of open access repositories, increasing the discoverability and reusability of millions of open access manuscripts.

As repository managers and library directors often wish to know the details of the content harvested from their repositories and keep a certain level of control over it, CORE is now facing the challenge of how to enable content providers to manage their content in the aggregation and control the harvesting process. In order to improve the quality and transparency of the aggregation process and create a two-way collaboration between the CORE project and the content providers, we propose the CORE Dashboard.


Managing open access with EPrints software: a case study

Recent additional open access (OA) requirements for publications by authors at UK higher education institutions require amendments to support mechanisms. These additional requirements arose primarily from the Research Councils UK Open Access Policy, applicable from April 2013, and the new OA policy for Research Excellence Framework eligibility published in March 2014 and applicable from April 2016.

Further provision also had to be made for compliance with the UK Charities Open Access Fund, the European Union, other funder policies, and internal reporting requirements.

In response, the University of Glasgow has enhanced its OA processes and systems. This case study charts our journey towards managing OA via our EPrints repository. The aim was to consolidate and manage OA information in one central place to increase efficiency of recording, tracking and reporting. We are delighted that considerable time savings and reduction in errors have been achieved by dispensing with spreadsheets to record decisions about OA.

URL : Managing open access with EPrints software: a case study