Authors : Elias Oltmanns, Tim Hasler, Wolfgang Peters-Kottig, Heinz-Günter Kuper
Ensuring the long-term availability of research data forms an integral part of data management services. Where OAIS-compliant digital preservation has been established in recent years, the services almost always aim at the preservation of file-based objects.
In the Digital Humanities, research data is often represented in highly structured aggregations, such as Scholarly Digital Editions. Naturally, scholars would like their editions to remain functionally complete as long as possible.
Besides standard components like web servers, the presentation typically relies on project-specific code interacting with client software such as web browsers. The latter in particular is subject to rapid change over time, which invariably makes such environments awkward to maintain once funding has ended.
Pragmatic approaches have to be found in order to balance the curation effort against the maintainability of access to research data over time. We outline four potential service levels aimed at the long-term availability of research data in the humanities: (1) Continuous Maintenance, (2) Application Conservation, (3) Application Data Preservation, and (4) Bitstream Preservation.
Since the first is too costly and the last hardly satisfactory in general, we suggest that an infrastructure provider concentrate its implementation of services on levels 2 and 3. We explain their strengths and limitations using the example of two Scholarly Digital Editions.
Title : Different Preservation Levels: The Case of Scholarly Digital Editions
DOI : http://doi.org/10.5334/dsj-2019-051
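As a reading aid for the abstract above, the four service levels can be glossed by what each one keeps usable. The sketch below is our own paraphrase, not material from the paper; in particular, the one-line glosses for levels 2 and 3 are assumptions inferred from the level names.

# Illustrative gloss of the four service levels named in the abstract above.
# The descriptions are our paraphrase (an assumption, not text from the paper).
SERVICE_LEVELS = {
    1: ("Continuous Maintenance",
        "edition kept fully functional; code and platform actively updated (costly)"),
    2: ("Application Conservation",
        "application frozen but kept runnable, e.g. in a preserved environment"),
    3: ("Application Data Preservation",
        "underlying structured data preserved; live presentation layer given up"),
    4: ("Bitstream Preservation",
        "raw bits kept intact only; usability of the content not guaranteed"),
}

for level, (name, gloss) in sorted(SERVICE_LEVELS.items()):
    print(f"Level {level}: {name} -- {gloss}")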
Authors : Roberto Di Cosmo, Morane Gruenpeter, Stefano Zacchiroli
In the very broad scope addressed by digital preservation initiatives, a special place belongs to the scientific and technical artifacts that we need to properly archive to enable scientific reproducibility.
For these artifacts we need identifiers that are not only unique and persistent, but also support integrity in an intrinsic way. They must provide strong guarantees that the object denoted by a given identifier will always be the same, without relying on third parties and external administrative processes.
In this article, we report on our quest for such identifiers for digital objects (IDOs), whose properties are different from, and complementary to, those of the various digital identifiers of objects (DIOs) in widespread use today.
We argue that both kinds of identifiers are needed and present the framework for intrinsic persistent identifiers that we have adopted in Software Heritage for preserving billions of software artifacts.
URL : https://hal.archives-ouvertes.fr/hal-01865790
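The notion of an intrinsic identifier can be made concrete. For individual file contents, Software Heritage derives the identifier from the bytes themselves, computed in the manner of a git blob hash, so anyone holding the object can recompute and verify the identifier without consulting any registry or third party. A minimal Python sketch of that computation (not the project's actual code, which lives in its swh.model library):

import hashlib

def swhid_for_content(data: bytes) -> str:
    """Compute an intrinsic identifier for a file's content.

    Software Heritage content identifiers are derived from the bytes
    themselves (a git-style "blob" hash), so the identifier can be
    recomputed and verified by anyone holding the object.
    """
    header = b"blob %d\0" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"

# The identifier depends only on the bytes: same input, same SWHID, forever.
print(swhid_for_content(b"hello world\n"))
# swh:1:cnt:3b18e512dba79e4c8300dd08aeb37f8e728b8dad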
Authors : Kate Wittenberg, Sarah Glasser, Amy Kirchhoff, Sheila Morrissey, Stephanie Orphan
There has been tremendous growth in the amount of digital content created by libraries, publishers, cultural institutions and the general public. While there are great benefits to having content available in digital form, digital objects can be extremely short-lived unless proper attention is paid to preservation.
Reflecting on our experience with the digital preservation service Portico, we provide background on Portico’s history and evolving practice of sustainable preservation of the digital artifacts of scholarly communications.
We also provide an overview of the digital preservation landscape as we see it now, with some thoughts on current requirements for preservation and on the opportunities and challenges that lie ahead.
Title : Challenges and opportunities in the evolving digital preservation landscape: reflections from Portico
DOI : http://doi.org/10.1629/uksg.421
Authors : Helena Francke, Jonas Gamalielsson, Björn Lundell
The study describes the conditions for long-term preservation of the content of the institutional repositories of Swedish higher education institutions, based on an investigation of how deposited files are managed with regard to file format and of how repository representatives describe the repositories’ functions.
The findings are based on answers to a questionnaire completed by thirty-four institutional repository representatives (97% response rate).
Questionnaire answers were analysed through descriptive statistics and qualitative coding. The concept of information infrastructures was used to analytically discuss repository work.
Visibility and access to content were considered to be the most important functions of the repositories, but long-term preservation was also considered important for publications and student theses.
Whereas a majority of repositories had some form of guidelines on which file formats are accepted, very few considered whether those formats constitute open standards. This can have consequences for the long-term sustainability of, and access to, the content deposited in the repositories.
The study contributes to the discussion about the sustainability of research publications and data in repositories by pointing to the potential difficulties for long-term preservation and access when there is little focus on, or awareness of, open file formats.
URL : http://www.informationr.net/ir/22-2/paper757.html
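The finding about open file formats suggests a simple safeguard at ingest time: compare deposited files against a list of formats backed by open standards. The sketch below is hypothetical (the whitelist and the function are illustrative, not taken from the study); a production repository would identify formats by signature, for example with PRONOM/DROID, rather than by extension alone.

from pathlib import Path

# Hypothetical whitelist of extensions whose formats are open standards.
OPEN_FORMATS = {".pdf", ".xml", ".csv", ".txt", ".odt", ".png", ".tif"}

def check_deposit(filenames: list[str]) -> dict[str, bool]:
    """Flag which deposited files appear to use an openly specified format."""
    return {name: Path(name).suffix.lower() in OPEN_FORMATS for name in filenames}

print(check_deposit(["thesis.pdf", "data.xlsx", "records.csv"]))
# -> {'thesis.pdf': True, 'data.xlsx': False, 'records.csv': True}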
Authors : Monika Linne, Wolfgang Zenk-Möltgen
In the German social and economic sciences there is a growing awareness of flexible data distribution and research data reuse, especially as increasing numbers of research funders recommend publishing research data as the basis for scientific insight.
However, a data-sharing mentality has not yet been established in Germany, owing to researchers’ strong reservations about publishing their data.
This attitude is exacerbated by the fact that, at present, there is no trusted national data-sharing repository that covers the particular requirements of institutions regarding research data.
This article discusses how this gap can be closed through the project initiative SowiDataNet.
The development of a community-driven data repository is a logically consistent and important step towards an attitude shift concerning data sharing in the social and economic sciences.
DOI : http://doi.org/10.18352/lq.10195
‘Big Science’ – that is, science involving large collaborations with dedicated facilities, large data volumes and multinational investments – is often seen as different when it comes to data management and preservation planning.
Big Science handles its data differently from other disciplines and has data management problems that are qualitatively different from theirs. In part, these differences arise from the quantities of data involved, but possibly more importantly from the cultural, organisational and technical distinctiveness of these academic cultures.
Consequently, the data management systems are typically and rationally bespoke, but this means that the planning for data management and preservation (DMP) must also be bespoke.
These differences are such that ‘just read and implement the OAIS specification’ is reasonable DMP advice, but this bald prescription can and should be supported by a methodological ‘toolkit’: overviews, case studies and costing models that provide guidance on developing best practice in DMP policy and infrastructure for such projects, as well as on OAIS validation, audit and cost modelling.
In this paper, we build on previous work with the LIGO collaboration to consider the role of DMP planning within these big science scenarios, and discuss how to apply current best practice.
We discuss the result of the MaRDI-Gross project (Managing Research Data Infrastructures – Big Science), which has been developing a toolkit to provide guidelines on the application of best practice in DMP planning within big science projects.
This is targeted primarily at projects’ engineering managers, but is also intended to help funders collaborate on DMP plans that satisfy the requirements imposed on them.
URL : http://www.ijdc.net/index.php/ijdc/article/view/8.1.29
An activity-based costing model for long-term preservation and dissemination of digital research data: the case of DANS :
“Financial sustainability is an important attribute of a trusted, reliable digital repository. The authors of this paper use the case study approach to develop an activity-based costing (ABC) model. This is used for estimating the costs of preserving digital research data and identifying options for improving and sustaining relevant activities.

The model is designed in the environment of the Data Archiving and Networked Services (DANS) institute, a well-known trusted repository. The DANS–ABC model has been tested on empirical cost data from activities performed by 51 employees in frames of over 40 different national and international projects.

Costs of resources are being assigned to cost objects through activities and cost drivers. The ‘euros per dataset’ unit of costs measurement is introduced to analyse the outputs of the model.

Funders, managers and other decision-making stakeholders are being provided with understandable information connected to the strategic goals of the organisation. The latter is being achieved by linking the DANS–ABC model to another widely used managerial tool—the Balanced Scorecard (BSC). The DANS–ABC model supports costing of services provided by a data archive, while the combination of the DANS–ABC with a BSC identifies areas in the digital preservation process where efficiency improvements are possible.”
URL : http://link.springer.com/article/10.1007/s00799-012-0092-1
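To make the mechanism of the DANS–ABC model concrete: in activity-based costing, each activity’s cost is divided by its cost driver volume to obtain a rate, and a cost object accumulates the rates of the driver units it consumes. The activities and figures below are invented for illustration; only the mechanism and the ‘euros per dataset’ output follow the abstract.

# Minimal activity-based costing sketch. All names and figures are
# hypothetical; only the mechanism follows the DANS-ABC description:
# resource costs flow to cost objects via activities and cost drivers.

activities = {
    # activity: (total annual cost in euros, cost driver volume)
    "ingest & validation": (120_000, 4_000),   # driver: datasets ingested
    "curation & metadata": (200_000, 4_000),   # driver: datasets curated
    "storage & fixity":    (80_000, 10_000),   # driver: datasets stored
}

# Cost driver rate = activity cost / driver volume (euros per driver unit).
rates = {name: cost / volume for name, (cost, volume) in activities.items()}

# A cost object ("preserve one dataset") consumes one unit of each driver here.
consumption = {"ingest & validation": 1, "curation & metadata": 1, "storage & fixity": 1}

euros_per_dataset = sum(rates[a] * units for a, units in consumption.items())
print(f"{euros_per_dataset:.2f} euros per dataset")  # 88.00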