Increasing the Reuse of Data through FAIR-enabling the Certification of Trustworthy Digital Repositories

Authors : Benjamin Jacob Mathers, Hervé L’Hours

The long-term preservation of digital objects, and the means by which they can be reused, are addressed by both the FAIR Data Principles (Findable, Accessible, Interoperable, Reusable) and a number of standards bodies providing Trustworthy Digital Repository (TDR) certification, such as the CoreTrustSeal.

Though many of the requirements listed in the Core Trustworthy Data Repositories Requirements 2020–2022 Extended Guidance address the FAIR Data Principles indirectly, there is currently no formal ‘FAIR Certification’ offered by the CoreTrustSeal or other TDR standards bodies. To address this gap, the FAIRsFAIR project developed a number of tools and resources that facilitate the assessment of FAIR-enabling practices at the repository level as well as the FAIRness of datasets within them.

These include the CoreTrustSeal+FAIRenabling Capability Maturity model (CTS+FAIR CapMat), a FAIR-Enabling Trustworthy Digital Repositories-Capability Maturity Self-Assessment template, and F-UJI, a web-based tool designed to assess the FAIRness of research data objects.

The success of such tools and resources ultimately depends upon community uptake. This requires a community-wide commitment to develop best practices to increase the reuse of data and to reach consensus on what these practices are.

One possible way of achieving community consensus would be through the creation of a network of FAIR-enabling TDRs, as proposed by FAIRsFAIR.

URL : Increasing the Reuse of Data through FAIR-enabling the Certification of Trustworthy Digital Repositories

DOI : https://doi.org/10.2218/ijdc.v17i1.852

Caching and Reproducibility: Making Data Science Experiments Faster and FAIRer

Authors : Moritz Schubotz, Ankit Satpute, André Greiner-Petter, Akiko Aizawa, Bela Gipp

Small to medium-scale data science experiments often rely on research software developed ad-hoc by individual scientists or small teams. Often there is no time to make the research software fast, reusable, and openly accessible.

The consequence is twofold. First, subsequent researchers must spend significant work hours building upon the proposed hypotheses or experimental framework. In the worst case, others cannot reproduce the experiment and reuse the findings for subsequent research. Second, suppose the ad-hoc research software fails during often long-running, computationally expensive experiments.

In that case, the overall effort to iteratively improve the software and rerun the experiments creates significant time pressure on the researchers. We suggest making caching an integral part of the research software development process, even before the first line of code is written.

This article outlines caching recommendations for developing research software in data science projects. Our recommendations provide a perspective on circumventing common problems such as proprietary dependencies and speed. At the same time, caching contributes to the reproducibility of experiments in the open science workflow.
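The article's specific recommendations are not reproduced here, but the general idea of building caching into research software from the start can be sketched in Python. The following is a minimal illustration, not the authors' implementation: an in-memory cache for cheap repeated computations and a hypothetical disk-backed cache decorator so that long-running steps survive a crash and can be resumed instead of recomputed.

```python
import functools
import hashlib
import json
import pickle
import tempfile
from pathlib import Path

# In-memory caching for cheap, frequently repeated computations.
@functools.lru_cache(maxsize=None)
def normalise(value: float, mean: float, std: float) -> float:
    return (value - mean) / std

# Disk caching for long-running steps: results persist on disk, so a
# failed or restarted experiment resumes rather than recomputes.
CACHE_DIR = Path(tempfile.gettempdir()) / "experiment_cache"
CACHE_DIR.mkdir(exist_ok=True)

def disk_cached(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Key the result on the function name and its arguments.
        key = hashlib.sha256(
            json.dumps([func.__name__, args, kwargs], sort_keys=True).encode()
        ).hexdigest()
        path = CACHE_DIR / f"{key}.pkl"
        if path.exists():
            return pickle.loads(path.read_bytes())
        result = func(*args, **kwargs)
        path.write_bytes(pickle.dumps(result))
        return result
    return wrapper

@disk_cached
def expensive_step(n: int) -> int:
    # Stand-in for a long-running, deterministic computation.
    return sum(i * i for i in range(n))

print(expensive_step(1000))  # computed once, then served from the disk cache
```

Note that disk caching only pays off for deterministic steps; stochastic computations need their random seeds folded into the cache key.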

Concerning the four guiding principles, i.e., Findability, Accessibility, Interoperability, and Reusability (FAIR), we foresee that including the proposed recommendations in research software development will make the data related to that software FAIRer for both machines and humans.

We demonstrate the usefulness of some of the proposed recommendations on our recently completed research software project in mathematical information retrieval.

URL : Caching and Reproducibility: Making Data Science Experiments Faster and FAIRer

DOI : https://doi.org/10.3389/frma.2022.861944

FAIR Forever? Accountabilities and Responsibilities in the Preservation of Research Data

Authors : Amy Currie, William Kilbride

Digital preservation is a fast-moving and growing community of practice of ubiquitous relevance, but in which capability is unevenly distributed. Within the open science and research data communities, digital preservation has a close alignment to the FAIR principles and is delivered through a complex specialist infrastructure comprising technology, staff and policy.

However, capacity erodes quickly, establishing a need for ongoing examination and review to ensure that skills, technology, and policy remain fit for changing purpose. To address this challenge, the Digital Preservation Coalition (DPC) conducted the FAIR Forever study, commissioned by the European Open Science Cloud (EOSC) Sustainability Working Group and funded by the EOSC Secretariat Project in 2020. The study assessed the current strengths, weaknesses, opportunities, and threats to the preservation of research data across EOSC, and the feasibility of establishing shared approaches, workflows, and services that would benefit EOSC stakeholders.

This paper draws from the FAIR Forever study to document and explore its key findings on the identified strengths, weaknesses, opportunities, and threats to the preservation of FAIR data in EOSC, and to the preservation of research data more broadly.

It begins with the background of the study and an overview of the methodology employed, which involved a desk-based assessment of the emerging EOSC vision, interviews with representatives of EOSC stakeholders, and focus groups with digital preservation specialists and data managers in research organizations.

It summarizes key findings on the need for clarity on digital preservation in the EOSC vision and for elucidation of roles, responsibilities, and accountabilities to mitigate risks of data loss and threats to reputation and sustainability. It then outlines the recommendations provided in the final report presented to the EOSC Sustainability Working Group.

To better ensure that research data can be FAIRer for longer, the paper presents the study's recommendations, discusses how they can be extended and applied to various research data stakeholders in and outside of EOSC, and suggests ways to bring together the research data curation, management, and preservation communities to better ensure FAIRness now and in the long term.

URL : FAIR Forever? Accountabilities and Responsibilities in the Preservation of Research Data

DOI : https://doi.org/10.2218/ijdc.v16i1.768

Do I-PASS for FAIR? Measuring the FAIR-ness of Research Organizations

Authors : Jacquelijn Ringersma, Margriet Miedema

Given the increased use of the FAIR acronym as an adjective in contexts other than data or datasets, the Dutch National Coordination Point for Research Data Management initiated a Task Group to work out the concept of a FAIR research organization.

The results of this Task Group are a definition of a FAIR-enabling organization and a method to measure the FAIR-ness of a research organization (the Do-I-PASS for FAIR method). The method can also aid in developing FAIR-enabling roadmaps for individual research institutions and at a national level.

This practice paper describes the development of the method and provides a couple of use cases for the application of the method in daily research data management practices in research organizations.

URL : Do I-PASS for FAIR? Measuring the FAIR-ness of Research Organizations

DOI : http://doi.org/10.5334/dsj-2021-030

Repository Approaches to Improving the Quality of Shared Data and Code

Authors : Ana Trisovic, Katherine Mika, Ceilyn Boyd, Sebastian Feger, Mercè Crosas

Sharing data and code for reuse has become increasingly important in scientific work over the past decade. However, in practice, shared data and code may be unusable, or published results obtained from them may be irreproducible.

Data repository features and services contribute significantly to the quality, longevity, and reusability of datasets.

This paper presents a combination of original and secondary data analysis studies focusing on computational reproducibility, data curation, and gamified design elements that can be employed to indicate and improve the quality of shared data and code.

The findings of these studies are sorted into three approaches that can be valuable to data repositories, archives, and other research dissemination platforms.

URL : Repository Approaches to Improving the Quality of Shared Data and Code

DOI : https://doi.org/10.3390/data6020015

From Conceptualization to Implementation: FAIR Assessment of Research Data Objects

Authors: Anusuriya Devaraju, Mustapha Mokrane, Linas Cepinskas, Robert Huber, Patricia Herterich, Jerry de Vries, Vesa Akerman, Hervé L’Hours, Joy Davidson, Michael Diepenbroek

Funders and policy makers have strongly recommended the uptake of the FAIR principles in scientific data management. Several initiatives are working on the implementation of the principles and standardized applications to systematically evaluate data FAIRness.

This paper presents practical solutions, namely metrics and tools, developed by the FAIRsFAIR project to pilot the FAIR assessment of research data objects in trustworthy data repositories. The metrics are mainly built on the indicators developed by the RDA FAIR Data Maturity Model Working Group.

The tools’ design and evaluation followed an iterative process. We present two applications of the metrics: an awareness-raising self-assessment tool and an automated FAIR data assessment tool.
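The abstract does not detail the individual metrics, but the general shape of an automated FAIR assessment can be sketched. The checks and field names below are invented for illustration and are deliberately simplified stand-ins, not F-UJI's real metrics: each check loosely probes one FAIR facet of a metadata record and contributes to an aggregate score.

```python
import re

def assess_fairness(record: dict) -> dict:
    """Score a metadata record against four toy FAIR-inspired checks."""
    results = {
        # Findable: a DOI-style persistent identifier is present.
        "findable": bool(re.match(r"^10\.\d{4,9}/\S+$", record.get("doi", ""))),
        # Accessible: a retrieval URL is stated.
        "accessible": record.get("access_url", "").startswith("http"),
        # Interoperable: metadata follows a named standard.
        "interoperable": bool(record.get("metadata_standard")),
        # Reusable: an explicit licence is attached.
        "reusable": bool(record.get("license")),
    }
    results["score"] = sum(results.values()) / 4
    return results

# A hypothetical, fully described data object scores 1.0.
record = {
    "doi": "10.5334/dsj-2021-004",
    "access_url": "https://example.org/dataset/42",
    "metadata_standard": "DataCite",
    "license": "CC-BY-4.0",
}
print(assess_fairness(record)["score"])  # 1.0
```

Real assessment tools run many more checks per principle and weight them by maturity level, but the pattern of machine-testable checks over harvested metadata is the same.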

Initial results of testing the tools with researchers and data repositories are discussed, and future improvements are suggested, including the next steps to enable FAIR data assessment in the broader research data ecosystem.

URL : From Conceptualization to Implementation: FAIR Assessment of Research Data Objects

DOI : http://doi.org/10.5334/dsj-2021-004

Open Data Challenges in Climate Science

Authors : Francesca Eggleton, Kate Winfield

The purpose of this paper is to explore challenges in open climate data experienced by data scientists at the Centre for Environmental Data Analysis (CEDA). This paper explores two of the five V’s of Big Data, Volume and Variety.

These challenges are explored using the Sentinel satellite data and Coupled Model Intercomparison Project Phase 6 (CMIP6) data held in the CEDA Archive. To address the Big Data Volume challenge, this paper describes the approach developed by CEDA to manage large volumes of data through the allocation of storage as filesets.

These filesets allow CEDA to plan and track dataset storage volumes, a flexible approach which could be adopted by any data centre. CEDA utilise the implementation of the Climate and Forecast (CF) conventions and standard names within archived data wherever possible to overcome the challenge of Variety.

Collaboration from the international science community, through contributions to the moderation of CF standard names, ensures that these data adhere to the FAIR (Findable, Accessible, Interoperable and Reusable) data principles.

Utilising data standards such as the CF standard names is recommended because it promotes data exchange and allows data from different sources to be compared. Addressing these Open Data challenges is crucial to ensure valuable climate data are made available to the scientific community to facilitate research that addresses one of society’s most pressing issues – climate change.
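The comparison benefit of CF standard names can be illustrated with a small sketch. The two datasets below are invented, but `air_temperature` and `precipitation_flux` are genuine CF standard names: because both sources attach the same `standard_name` attribute, the same physical quantity can be located in each despite differing local variable names.

```python
# Hypothetical variable metadata from two sources. Local names differ,
# but the CF standard_name attributes agree.
model_output = {
    "tas": {"standard_name": "air_temperature", "units": "K"},
    "pr": {"standard_name": "precipitation_flux", "units": "kg m-2 s-1"},
}
satellite_product = {
    "surface_temp": {"standard_name": "air_temperature", "units": "K"},
}

def find_by_standard_name(dataset: dict, standard_name: str) -> list:
    """Locate variables by CF standard name rather than local name."""
    return [
        var for var, attrs in dataset.items()
        if attrs.get("standard_name") == standard_name
    ]

# The same quantity is discoverable in both sources.
print(find_by_standard_name(model_output, "air_temperature"))       # ['tas']
print(find_by_standard_name(satellite_product, "air_temperature"))  # ['surface_temp']
```

In practice these attributes live in netCDF files and are read with tools such as xarray, but the matching logic is exactly this lookup on `standard_name`.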

URL : Open Data Challenges in Climate Science

DOI : http://doi.org/10.5334/dsj-2020-052