Authors: Francesca Eggleton, Kate Winfiel
This paper explores challenges in open climate data experienced by data scientists at the Centre for Environmental Data Analysis (CEDA). It focuses on two of the five V’s of Big Data: Volume and Variety.
These challenges are examined using the Sentinel satellite data and Coupled Model Intercomparison Project Phase 6 (CMIP6) data held in the CEDA Archive. To address the Volume challenge, this paper describes the approach developed by CEDA to manage large volumes of data by allocating storage as filesets.
These filesets allow CEDA to plan and track dataset storage volumes, a flexible approach that could be adopted by any data centre. To address the challenge of Variety, CEDA applies the Climate and Forecast (CF) conventions and standard names to archived data wherever possible.
The international science community collaborates by contributing to the moderation of CF standard names, which helps ensure that these data adhere to the FAIR (Findable, Accessible, Interoperable and Reusable) data principles.
Using data standards such as the CF standard names is recommended because it promotes data exchange and allows data from different sources to be compared. Addressing these Open Data challenges is crucial to ensuring that valuable climate data are made available to the scientific community, facilitating research into one of society’s most pressing issues: climate change.
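To illustrate how shared standard names allow data from different sources to be compared, the following is a minimal sketch in Python, not CEDA's implementation. The source labels, variable names, and units in the records are hypothetical placeholders; only the `standard_name` values are taken from the CF standard name table. The sketch groups variables by standard name and reports those that appear in more than one source.

```python
from collections import defaultdict

# Hypothetical variable metadata from two sources. The source labels,
# variable names and units are illustrative assumptions; the standard_name
# values are entries from the CF standard name table.
variables = [
    {"source": "model_output", "name": "tas",
     "standard_name": "air_temperature", "units": "K"},
    {"source": "model_output", "name": "tos",
     "standard_name": "sea_surface_temperature", "units": "degC"},
    {"source": "satellite_product", "name": "sst",
     "standard_name": "sea_surface_temperature", "units": "K"},
    {"source": "satellite_product", "name": "quality_flag",
     "standard_name": None, "units": None},
]


def group_by_standard_name(records):
    """Group variable records by CF standard name, skipping unnamed ones."""
    groups = defaultdict(list)
    for rec in records:
        if rec["standard_name"] is not None:
            groups[rec["standard_name"]].append(rec)
    return dict(groups)


def comparable_across_sources(records):
    """Return standard names that occur in more than one source.

    Each value is a sorted list of (source, variable name) pairs.
    """
    result = {}
    for std_name, recs in group_by_standard_name(records).items():
        sources = {r["source"] for r in recs}
        if len(sources) > 1:
            result[std_name] = sorted((r["source"], r["name"]) for r in recs)
    return result


matches = comparable_across_sources(variables)
for std_name, pairs in matches.items():
    print(std_name, pairs)
```

In this sketch the two differently named variables are matched only because they share a standard name. Their units still differ in the example records, so a real comparison would also need a unit conversion step.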