Data Management and Preservation Planning for Big Science :
« ‘Big Science’ – that is, science which involves large collaborations with dedicated facilities, and involving large data volumes and multinational investments – is often seen as different when it comes to data management and preservation planning. Big Science handles its data differently from other disciplines and has data management problems that are qualitatively different from other disciplines. In part, these differences arise from the quantities of data involved, but possibly more importantly from the cultural, organisational and technical distinctiveness of these academic cultures. Consequently, the data management systems are typically and rationally bespoke, but this means that the planning for data management and preservation (DMP) must also be bespoke.
These differences are such that ‘just read and implement the OAIS specification’ is reasonable Data Management and Preservation (DMP) advice, but this bald prescription can and should be usefully supported by a methodological ‘toolkit’, including overviews, case-studies and costing models to provide guidance on developing best practice in DMP policy and infrastructure for these projects, as well as considering OAIS validation, audit and cost modelling.
In this paper, we build on previous work with the LIGO collaboration to consider the role of DMP planning within these big science scenarios, and discuss how to apply current best practice. We discuss the result of the MaRDI-Gross project (Managing Research Data Infrastructures – Big Science), which has been developing a toolkit to provide guidelines on the application of best practice in DMP planning within big science projects. This is targeted primarily at projects’ engineering managers, but is also intended to help funders collaborate on DMP plans which satisfy the requirements imposed on them. »
An activity-based costing model for long-term preservation and dissemination of digital research data: the case of DANS :
« Financial sustainability is an important attribute of a trusted, reliable digital repository. The authors of this paper use the case study approach to develop an activity-based costing (ABC) model. This is used for estimating the costs of preserving digital research data and identifying options for improving and sustaining relevant activities. The model is designed in the environment of the Data Archiving and Networked Services (DANS) institute, a well-known trusted repository. The DANS–ABC model has been tested on empirical cost data from activities performed by 51 employees within the framework of over 40 different national and international projects. Costs of resources are assigned to cost objects through activities and cost drivers. The ‘euros per dataset’ unit of cost measurement is introduced to analyse the outputs of the model. Funders, managers and other decision-making stakeholders are provided with understandable information connected to the strategic goals of the organisation. This is achieved by linking the DANS–ABC model to another widely used managerial tool, the Balanced Scorecard (BSC). The DANS–ABC model supports costing of services provided by a data archive, while the combination of the DANS–ABC with a BSC identifies areas in the digital preservation process where efficiency improvements are possible. »
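The two-stage allocation that the abstract describes (resources to activities, then activities to cost objects via cost drivers) can be illustrated with a minimal sketch. This is generic activity-based costing, not the DANS–ABC model itself: the function names, resource names, shares, driver quantities and all figures are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def allocate_costs(resource_costs, resource_to_activity, activity_drivers):
    """Generic two-stage activity-based costing (illustrative only).

    resource_costs:       {resource: total cost in euros}
    resource_to_activity: {resource: {activity: share of the resource, summing to 1}}
    activity_drivers:     {activity: {cost object: cost-driver quantity}}
    """
    # Stage 1: spread each resource's cost over the activities that consume it.
    activity_costs = defaultdict(float)
    for resource, cost in resource_costs.items():
        for activity, share in resource_to_activity[resource].items():
            activity_costs[activity] += cost * share

    # Stage 2: assign each activity's cost to cost objects in proportion
    # to the cost-driver quantity each object consumes.
    object_costs = defaultdict(float)
    for activity, cost in activity_costs.items():
        drivers = activity_drivers[activity]
        total_quantity = sum(drivers.values())
        for cost_object, quantity in drivers.items():
            object_costs[cost_object] += cost * quantity / total_quantity

    return dict(activity_costs), dict(object_costs)


def euros_per_dataset(object_cost, dataset_count):
    """Unit cost of a cost object, expressed in euros per dataset."""
    return object_cost / dataset_count


# Hypothetical figures, not taken from the paper.
resource_costs = {"staff": 100000.0, "storage": 20000.0}
resource_to_activity = {
    "staff": {"ingest": 0.6, "dissemination": 0.4},
    "storage": {"ingest": 0.5, "dissemination": 0.5},
}
activity_drivers = {
    "ingest": {"archive_service": 3, "project_x": 1},
    "dissemination": {"archive_service": 1},
}

activity_costs, object_costs = allocate_costs(
    resource_costs, resource_to_activity, activity_drivers
)
unit_cost = euros_per_dataset(object_costs["archive_service"], 500)
```

The allocation conserves the total: the cost objects together carry exactly what the resources cost, which makes a per-dataset unit cost comparable across services.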
Step by step installation guide of a digital preservation infrastructure :
« The Ceris-CNR digital preservation infrastructure project was commissioned by Bess (Social Science Electronic Library of Piemonte) for the years 2011-2012 and sponsored by Compagnia di San Paolo of Turin. The role of Ceris-CNR is to handle all post-scan stages of the digitization; for this purpose it has deployed the software and server platforms of the repository, as well as the web portal for presentation, search and consultation. This report is a guide to the steps followed to build the digital archive infrastructure. »
Preservation Status Of E-Resources: A Potential Crisis In Electronic Journal Preservation :
« E-journals have replaced the majority of titles formerly produced in paper format. Academic libraries are increasingly dependent on commercially produced, born-digital content that is purchased or licensed. The purpose of this presentation is to share the findings of a 2CUL study that assesses the role of LOCKSS and PORTICO in preserving each institution’s e-journal collections. The 2CUL initiative is a collaboration between Columbia University Library (CUL) and Cornell University Library (CUL) to join forces in providing content, expertise, and services that are impossible to accomplish acting alone.
Although LOCKSS is considered a successful digital preservation initiative, neither of the CULs felt that they fully understood the potential of the system for their own settings and collections. In support of this goal, a joint team was established in November 2010 to investigate various questions to assess how LOCKSS is being deployed and the implications of local practices for both CULs’ preservation frameworks. This study was seen as a high-level investigation to characterize the general landscape and identify further research questions. One of the practical outcomes was a comparative analysis of Portico and LOCKSS preservation coverage for Columbia and Cornell’s serial holdings. A key finding was that only 15-20% of the e-journal titles in the libraries’ collections are currently preserved by these two initiatives. Further analysis suggests the remaining titles fall into roughly 10 categories, with a variety of strategies needed to ensure their preservation. »
Institutional Repositories, Long Term Preservation and the changing nature of Scholarly Publications :
« The web offers new opportunities for scholars to publish the outcome of their research. One of these new forms is called Enhanced Publications. In an Enhanced Publication, different objects and files that have a meaningful and close relation to each other are aggregated at the level of a resource map, in which not only the separate files are described but also the relations between those files. An example of an Enhanced Publication is a digital text publication together with the dataset on which the publication is based. Preserving these compound entities in the existing infrastructures raises new issues. This article discusses these issues against the background of the Dutch long term preservation infrastructure and organisation. »
Institutional Repositories and Digital Preservation: Assessing Current Practices at Research Libraries :
« In spring 2010, authors from the University of Massachusetts Amherst conducted a national survey on digital preservation of Institutional Repository (IR) materials among Association of Research Libraries (ARL) member institutions. Examining the current practices of digital preservation of IR materials, the survey of 72 research libraries reveals the challenges and opportunities of implementing digital preservation for IRs in a complex environment with rapidly evolving technology, practices, and standards. Findings from this survey will inform libraries about the current state of digital preservation for IRs. »
Preserving repository content: practical tools for repository managers :
« The stated aim of many repositories is to provide permanent open access to their content. However, relatively few repositories have implemented practical action plans towards permanence. Repository managers often lack time and confidence to tackle the important but scary problem of preservation.
Written by, and aimed at, repository managers, this paper describes how the JISC-funded KeepIt project has been bringing together existing preservation tools and services with appropriate training and advice to enable repository managers to formulate practical and achievable preservation plans.
Three elements of the KeepIt project are described:
1. The initial, exploratory phase in which repository managers and a preservation specialist established the current status of each repository and its preservation objectives;
2. The repository-specific KeepIt preservation training course which covered the organisational and financial framework of repository preservation; metadata; the new preservation tools; and issues of trust between repository, users and services;
3. The application of tools and lessons learned from the training course to four exemplar repositories and the impact that this has made.
The paper concludes by recommending practical steps that all repository managers may take to ensure their repositories are preservation-ready. »
Characterising and Preserving Digital Repositories: File Format Profiles :
« Steve Hitchcock and David Tarrant show how file format profiles, the starting point for preservation plans and actions, can also be used to reveal the fingerprints of emerging types of institutional repositories. »
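As a rough illustration of what a file format profile is, the sketch below counts the files in a repository by format. It is an assumption-laden simplification, not the method used by the authors: it guesses the format from the filename extension alone, whereas preservation tools typically identify formats from file signatures. The function name and the sample filenames are hypothetical.

```python
from collections import Counter
from pathlib import PurePath


def format_profile(filenames):
    """Return (format, count) pairs, most frequent format first.

    Assumption: the lower-cased filename extension stands in for the format.
    Files without an extension are grouped under "(none)".
    """
    counts = Counter(
        PurePath(name).suffix.lower() or "(none)" for name in filenames
    )
    return counts.most_common()


# Hypothetical repository listing.
profile = format_profile(["thesis.pdf", "paper.PDF", "notes.docx", "README"])
```

A profile that is dominated by one or two formats points to a narrowly scoped repository, while a long tail of formats suggests a broader collection with more varied preservation needs.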
The Dataverse Network®: An Open-Source Application for Sharing, Discovering and Preserving Data :
« The Dataverse Network is an open-source application for publishing, referencing, extracting and analyzing research data. The main goal of the Dataverse Network is to solve the problems of data sharing through building technologies that enable institutions to reduce the burden for researchers and data publishers, and incentivize them to share their data. By installing Dataverse Network software, an institution is able to host multiple individual virtual archives, called « dataverses » for scholars, research groups, or journals, providing a data publication framework that supports author recognition, persistent citation, data discovery and preservation. Dataverses require no hardware or software costs, nor maintenance or backups by the data owner, but still enable all web visibility and credit to devolve to the data owner. »
IISH Guidelines for preserving research data: a framework for preserving collaborative data collections for future research :
« Our guidelines highlight the iterative process of data collection, data processing, data analysis and publication of (interim) research results. The iterative process is best analyzed and illustrated by following the dynamics of data collection in online collaboratories. The production of data sets in such large scale data collection projects typically takes a lot of time, whilst in the meantime research may already be performed on data sub-sets. If this leads to a publication, a proper citation is required. Publishers and readers need to know exactly at what stage of the data collection process specific conclusions on these data were drawn. During this iterative process, research data need to be maintained, managed and disseminated in different forms and versions during the successive stages of the work carried out, in order to validate the outcomes and research results. These practices drive the requirements for data archiving and show that data archiving is not a one-off data transfer transaction or even a linear process. Therefore, from the perspective of the research process, we recommend the interconnection and interfacing between data collection and data archiving, in order to ensure the most effective and lossless preservation of the research data. »