
Warning

To sort out!

Context

First-time questions

  • Data format?
  • Chunking strategy?
  • Reference file customization?

Background

Challenges with long-term observations

  • Size & volume

    • example: half-hourly SARAH3 daily netCDF files are ~150-180 MB each
    • or 10 years ≈ 0.5 TB (see the quick estimate after this list)
  • Format

  • Metadata extraction

  • Data access

    • concurrent & parallel access
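As a quick sanity check on those numbers, a back-of-the-envelope estimate; the 165 MB average is an assumption within the 150-180 MB range above:

```python
# Rough size of a 10-year archive of daily files
mb_per_day = 165              # assumed average daily file size (MB)
n_days = 365 * 10             # ten years of daily files
total_tb = mb_per_day * n_days / 1e6
print(f"~{total_tb:.2f} TB")  # prints ~0.60 TB
```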

Data models

  • Binary buffers: a plain buffer carries no internal index, so the whole file has to be read before any slice can be taken

```python
# Pseudocode: the entire ~0.5 TB archive is read into memory...
full_array = read_entire_file("10_year_data.hdf5")
# ...even though only a single day's slice is needed
one_day_data = full_array[0:24]
```
  • HDF5 (chunked partial reads, sketched after this list)

    • yet not cloud optimised
    • h5coro: a cloud-optimised, read-only library
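Because HDF5 stores arrays in indexed chunks, a reader can pull out just the slice it needs instead of the whole file. A minimal sketch with h5py; the file name, the dataset name "SIS", and the 48 half-hourly records per day are assumptions:

```python
import h5py

# Open lazily: only file metadata is touched here, not the data
with h5py.File("10_year_data.hdf5", "r") as f:
    # Read only the chunks covering the first day
    # (48 records, assuming half-hourly data; "SIS" is a placeholder name)
    one_day_data = f["SIS"][0:48]
```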

HDF5 files can also be read directly from cloud storage, whether over HTTP or by passing fsspec file-like objects to h5py.
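A hedged sketch of the fsspec route; the URL and dataset name are placeholders:

```python
import fsspec
import h5py

url = "https://example.com/sarah3/10_year_data.hdf5"  # placeholder URL

# fsspec hands h5py a file-like object; h5py then issues byte-range
# reads through it rather than downloading the whole file.
with fsspec.open(url, mode="rb") as f:
    with h5py.File(f, "r") as ds:
        one_day_data = ds["SIS"][0:48]  # placeholder dataset name
```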