Warning
To sort out!
Context¶
First-time questions¶
- Data format?
- Chunking strategy?
- Reference file customization?
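To make the chunking question concrete: a common rule of thumb is to choose chunk shapes whose size lands in the tens-of-MB range. Below is a minimal, hypothetical helper (not part of any library mentioned here) that shrinks the leading axis first, using only the standard library; the grid size and target are illustrative assumptions:

```python
import math

def chunk_shape(shape, itemsize, target_bytes=32 * 2**20):
    """Pick a chunk shape whose uncompressed size fits within
    target_bytes, halving the leading (e.g. time) axis first.
    Hypothetical helper for illustration only."""
    chunks = list(shape)
    for axis in range(len(chunks)):
        while math.prod(chunks) * itemsize > target_bytes and chunks[axis] > 1:
            chunks[axis] = (chunks[axis] + 1) // 2
    return tuple(chunks)

# e.g. 10 years of half-hourly float32 data on an assumed 2600 x 2600 grid
print(chunk_shape((175320, 2600, 2600), 4))  # → (1, 2600, 2600)
```

Shrinking the time axis first favours whole-map (spatial) reads; a time-series access pattern would call for the opposite choice, which is exactly why the chunking strategy has to be decided up front.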
Background¶
Challenges with long-term observations¶
- Size / Volume
  - example: daily netCDF files of half-hourly SARAH3 data are \(\sim 150\)–\(180\) MB each
  - \(10\) years \(\approx 0.5\) TB
- Format
- Metadata extraction
- Data access
  - concurrent & parallel access
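Concurrent access over object storage usually means issuing many independent byte-range requests at once. The sketch below simulates this with a local temporary file and a thread pool, standard library only; the chunk size and offsets are made-up stand-ins for what a chunk index (e.g. a kerchunk-style reference file) would record:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def read_range(path, offset, length):
    """Fetch one byte range -- stands in for an HTTP Range request
    against object storage."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

# Fake "remote" file: 1 MiB of repeating bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 4096)
    path = tmp.name

# Chunk offsets/lengths, as a reference file would record them (assumed layout).
ranges = [(i * 65536, 65536) for i in range(16)]

# Fetch all chunks concurrently.
with ThreadPoolExecutor(max_workers=8) as pool:
    chunks = list(pool.map(lambda r: read_range(path, *r), ranges))

os.unlink(path)
print(len(chunks), sum(len(c) for c in chunks))  # → 16 1048576
```

Against real object storage the same pattern applies, with `read_range` replaced by ranged HTTP GETs; threads suffice because the work is I/O-bound.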
Data models¶
- Binary Buffers
- HDF5
  - yet not cloud optimised
  - H5coro: a cloud-optimised read-only library
HDF5 files can nevertheless be read directly from cloud storage, e.g. over HTTP via the ROS3 virtual file driver, or by passing an fsspec file-like object to h5py.
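h5py accepts any seekable Python file-like object, which is how fsspec-opened remote files plug in. A minimal sketch, using an in-memory buffer in place of a real remote store (the dataset name `sis` and its shape are illustrative assumptions):

```python
import io

import h5py
import numpy as np

# Build a small HDF5 file in an in-memory buffer. In practice the
# buffer would be a file-like object opened via fsspec, e.g. from an
# s3:// or https:// URL (hypothetical location, not from this document).
buf = io.BytesIO()
with h5py.File(buf, "w") as f:
    f.create_dataset("sis", data=np.arange(12, dtype="f4").reshape(3, 4))

# h5py.File takes any seekable file-like object, so the same call
# works unchanged for an fsspec-backed remote file.
buf.seek(0)
with h5py.File(buf, "r") as f:
    print(f["sis"][0, :])  # first row of the dataset
```

Note that this convenience does not make HDF5 cloud optimised: each chunk lookup still translates into a separate small read against the remote object.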