Kerchunk¶
Warning
Unsorted notes
Kerchunk supports cloud-friendly access to data, with a particular focus on netCDF4/HDF5 files.[1]
How? Kerchunk
- extracts metadata in a single scan
- arranges multiple chunks from multiple files
- with dask and zarr, reads chunks in parallel and/or concurrently, and presents them as a single indexable aggregate dataset
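The three steps above can be pictured with a toy sketch that uses only the Python standard library. It is not kerchunk's implementation and does not call kerchunk, dask, or zarr: the 4-byte header, the file names, and the helpers `write_toy_file`, `scan`, `read_chunk`, and `read_all` are all hypothetical, chosen only to show one scan that records where chunks live, followed by concurrent byte-range reads assembled into one indexable structure.

```python
import os
import struct
import tempfile
from concurrent.futures import ThreadPoolExecutor

HEADER = b"HDR!"       # hypothetical 4-byte header in front of each payload
VALUES_PER_FILE = 24   # e.g. 24 hourly values per daily file


def write_toy_file(path, day):
    """Write the header followed by 24 little-endian float64 values."""
    values = [day * 100.0 + hour for hour in range(VALUES_PER_FILE)]
    with open(path, "wb") as f:
        f.write(HEADER)
        f.write(struct.pack(f"<{VALUES_PER_FILE}d", *values))


def scan(paths):
    """Single pass over the files: record where each chunk lives, copy no data."""
    index = []
    for path in paths:
        offset = len(HEADER)
        length = os.path.getsize(path) - offset
        index.append((path, offset, length))
    return index


def read_chunk(location):
    """Read one chunk by byte range and decode it into floats."""
    path, offset, length = location
    with open(path, "rb") as f:
        f.seek(offset)
        raw = f.read(length)
    return list(struct.unpack(f"<{length // 8}d", raw))


def read_all(index, max_workers=4):
    """Fetch chunks concurrently; Executor.map returns results in index order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(read_chunk, index))


with tempfile.TemporaryDirectory() as tmp:
    paths = [os.path.join(tmp, f"file_{day}.bin") for day in range(3)]
    for day, path in enumerate(paths):
        write_toy_file(path, day)

    index = scan(paths)            # metadata only: (path, offset, length)
    aggregate = read_all(index)    # indexable as aggregate[day][hour]
    sample = aggregate[2][5]       # value for day 2, hour 5
```

Threads suit this sketch because reading byte ranges is I/O-bound; in a real setup the scheduling of chunk reads would be left to dask and the storage layer rather than written by hand.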
+ advantages¶
- supports parallel and concurrent reads
- memory efficiency: data is loaded lazily, chunk by chunk, rather than whole files at once
- parallel processing
- data locality
- drawbacks¶
- ?
How does it work?¶
- Combines fsspec, dask, and zarr
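- Reference file: a JSON index that maps each chunk key to the file holding it and the byte range to read. The example below is schematic; in the actual kerchunk reference format, each entry is a [url, offset, length] list: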
{
"version": 1,
"shapes": {"var1": [365, 24]},
"refs": {
"var1/0": ["file_1.nc", "0:24"],
"var1/1": ["file_2.nc", "0:24"],
// ...
}
}
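To make the idea of a reference entry concrete, here is a minimal standard-library sketch of how a chunk key could be resolved to bytes. It assumes entries of the form [path, offset, length]; the helpers `make_refs` and `fetch`, the file names, and the 8-byte padding are hypothetical and stand in for real data files. It illustrates the lookup only and is not how kerchunk or fsspec implement it.

```python
import json
import os
import tempfile


def make_refs(chunk_files, offset, length):
    """Build a reference document: chunk key -> [path, offset, length]."""
    refs = {
        f"var1/{i}": [path, offset, length]
        for i, path in enumerate(chunk_files)
    }
    return {"version": 1, "refs": refs}


def fetch(reference_doc, key):
    """Resolve a chunk key to the bytes its reference points at."""
    path, offset, length = reference_doc["refs"][key]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


with tempfile.TemporaryDirectory() as tmp:
    # Two hypothetical data files: 8 bytes of padding, then a 10-byte payload.
    files = []
    for i, payload in enumerate([b"chunk-zero", b"chunk-one!"]):
        path = os.path.join(tmp, f"file_{i + 1}.bin")
        with open(path, "wb") as f:
            f.write(b"\x00" * 8 + payload)
        files.append(path)

    doc = make_refs(files, offset=8, length=10)
    # Round-trip through JSON, as a reference file stored on disk would be.
    doc = json.loads(json.dumps(doc))

    first = fetch(doc, "var1/0")
    second = fetch(doc, "var1/1")
```

The original files are never rewritten or copied: the reference document is the only new artifact, and each lookup touches just the bytes of the requested chunk.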
1. Development supported by NASA funding: https://doi.org/10.6084/m9.figshare.22266433.v1
2. See Concurrency.