
Kerchunking to JSON

Example data

Let us work with the following example files from the SARAH3 climate data records:

cd data/multiple_files_unique_shape/
ls -1
SISin202001010000004231000101MA.nc
SISin202001010000004231000101MA_structure.csv
SISin202001020000004231000101MA.nc

Check for consistency

To combine Kerchunk references into one, all datasets need identical chunk shapes. Let us confirm that this holds for our sample data:

rekx shapes . --validate-consistency
✓ Variables are consistently shaped across all files!

Alternatively, add -v to also report the shapes of each variable:

rekx shapes . --validate-consistency -v
✓ Variables are consistently shaped across all files!
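Conceptually, such a check compares the chunk shape of every variable across all files. Below is a minimal Python sketch, assuming the chunk shapes have already been collected into a dictionary (for instance with netCDF4's `Variable.chunking()`). The helpers `chunk_shapes_consistent` and `inconsistent_variables` and the sample shapes are hypothetical and not rekx's actual implementation.

```python
from typing import Dict, Set, Tuple

# Hypothetical input: file name -> {variable name -> chunk shape}
ChunkShapes = Dict[str, Dict[str, Tuple[int, ...]]]


def chunk_shapes_consistent(shapes: ChunkShapes) -> bool:
    """Return True if every file exposes the same variables with identical chunk shapes."""
    per_file = list(shapes.values())
    if not per_file:
        return True
    reference = per_file[0]
    # Every other file must match the first one exactly (same variables, same shapes)
    return all(other == reference for other in per_file[1:])


def inconsistent_variables(shapes: ChunkShapes) -> Set[str]:
    """Return the names of variables whose chunk shape differs between files."""
    seen: Dict[str, Set[Tuple[int, ...]]] = {}
    for variables in shapes.values():
        for name, chunks in variables.items():
            seen.setdefault(name, set()).add(tuple(chunks))
    return {name for name, variants in seen.items() if len(variants) > 1}


# Made-up chunk shapes for illustration only; not read from the SARAH3 files
example = {
    "SISin202001010000004231000101MA.nc": {"SIS": (1, 256, 256), "time": (1,)},
    "SISin202001020000004231000101MA.nc": {"SIS": (1, 256, 256), "time": (1,)},
}
print(chunk_shapes_consistent(example))
```

Reporting the offending variable names, rather than a bare yes/no, makes it easier to see which file needs rechunking before referencing.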

Reference input files

We can now create a JSON Kerchunk reference for each input NetCDF file. Before writing any files, however, let's run the command with --dry-run to see what it would do:

rekx reference . sarah3_sis_kerchunk_references_json -v --dry-run
Dry run of operations that would be performed:
> Reading files in . matching the pattern *.nc
> Number of files matched: 2
> Creating single reference files to sarah3_sis_kerchunk_references_json

--dry-run is quite useful: it confirms that the inputs and the output location are right before committing to a potentially massive processing run.
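The idea behind a dry run can be sketched in a few lines of Python: collect the files matching the pattern and describe the planned action without touching any data. This is a hypothetical illustration (`plan_reference_run` is not part of rekx); its messages merely mimic the output above.

```python
import tempfile
from pathlib import Path
from typing import List


def plan_reference_run(source: Path, output: Path, pattern: str = "*.nc") -> List[str]:
    """Describe what a reference run would do, without reading or writing data."""
    matches = sorted(source.glob(pattern))  # only directory entries are inspected
    return [
        f"> Reading files in {source} matching the pattern {pattern}",
        f"> Number of files matched: {len(matches)}",
        f"> Creating single reference files to {output}",
    ]


# Demonstration on a throwaway directory holding empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    for name in (
        "SISin202001010000004231000101MA.nc",
        "SISin202001010000004231000101MA_structure.csv",
        "SISin202001020000004231000101MA.nc",
    ):
        (demo / name).touch()
    target = demo / "sarah3_sis_kerchunk_references_json"
    demo_plan = plan_reference_run(demo, target)
    output_created = target.exists()  # a dry run must not create the output

for line in demo_plan:
    print(line)
```

Note that the CSV file is ignored by the `*.nc` pattern, which is why two files are matched.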

The dry run looks okay, so let's run it for real:

rekx reference . sarah3_sis_kerchunk_references_json -v

Note that the input files are Kerchunked in parallel.
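Per-file referencing parallelises naturally, since each input produces its own output. Below is a minimal Python sketch using `concurrent.futures`, where `make_reference` is a hypothetical stand-in for the real per-file Kerchunk call (with the kerchunk library that would typically be `kerchunk.hdf.SingleHdf5ToZarr(...).translate()`). Threads are used only to keep the example small; this is not necessarily how rekx parallelises its work.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict, List


def make_reference(path: str) -> Dict[str, str]:
    """Hypothetical stand-in for per-file Kerchunking: returns a tiny fake result."""
    stem = Path(path).stem
    return {"source": path, "output": f"{stem}.json"}


def reference_all(paths: List[str], workers: int = 4) -> List[Dict[str, str]]:
    """Run make_reference on every path concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(make_reference, paths))


inputs = [
    "SISin202001010000004231000101MA.nc",
    "SISin202001020000004231000101MA.nc",
]
results = reference_all(inputs)
for item in results:
    print(item["source"], "->", item["output"])
```

For CPU-heavy scanning of many large files, `ProcessPoolExecutor` can replace `ThreadPoolExecutor` here, since `make_reference` is a top-level (picklable) function.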

The rekx reference command writes one JSON reference plus one .hash file per input file:

tree sarah3_sis_kerchunk_references_json/
sarah3_sis_kerchunk_references_json/
├── SISin202001010000004231000101MA.json
├── SISin202001010000004231000101MA.json.hash
├── SISin202001020000004231000101MA.json
└── SISin202001020000004231000101MA.json.hash

0 directories, 4 files
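Each .json file is a Kerchunk reference set. In the fsspec reference layout (version 1), it maps Zarr keys either to inline metadata strings or to `[url, offset, length]` byte ranges inside the original NetCDF file. The Python sketch below inspects such a structure; the sample is a hand-made, abridged miniature (offsets and metadata are invented), not the actual content of the SARAH3 references. The .hash files are left aside, as their format is rekx-specific.

```python
import json
from typing import Dict, List

# Hand-made, abridged reference set in the version-1 layout (values are invented)
sample_text = json.dumps({
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        "SIS/.zarray": '{"chunks": [1, 2, 2], "shape": [2, 2, 2], "dtype": "<i2"}',
        "SIS/0.0.0": ["SISin202001010000004231000101MA.nc", 20000, 8],
        "SIS/1.0.0": ["SISin202001010000004231000101MA.nc", 20008, 8],
    },
})


def byte_range_refs(reference: Dict) -> Dict[str, List]:
    """Return the keys that point into the original file as [url, offset, length]."""
    return {
        key: value
        for key, value in reference["refs"].items()
        if isinstance(value, list) and len(value) == 3
    }


reference = json.loads(sample_text)
chunks = byte_range_refs(reference)
for key, (url, offset, length) in sorted(chunks.items()):
    print(f"{key}: {length} bytes at offset {offset} of {url}")
```

Only the small metadata strings live in the JSON; the chunk data itself stays in the NetCDF files and is fetched by byte range when read.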

Aggregate references

Next, we want to combine the single references into one file. Let's dry-run the combine command:

rekx combine sarah3_sis_kerchunk_references_json sarah3_sis_kerchunk_reference_json --dry-run
Dry run of operations that would be performed:
> Reading files in sarah3_sis_kerchunk_references_json matching the pattern *.json
> Number of files matched: 2
> Writing combined reference file to sarah3_sis_kerchunk_reference_json

Warning

In the above example, note the subtle difference between the input and output names: sarah3_sis_kerchunk_references_json (the directory of single references) != sarah3_sis_kerchunk_reference_json (the combined reference file)

The dry run also looks fine, so let's create the combined reference file:

rekx combine sarah3_sis_kerchunk_references_json sarah3_sis_kerchunk_reference_json -v
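Combining amounts to concatenating the per-file references along the time dimension. The kerchunk library offers `kerchunk.combine.MultiZarrToZarr` for this (e.g. with `concat_dims=["time"]`); whether rekx calls it with exactly these settings is an assumption. The toy Python sketch below, built around a hypothetical `concat_time_refs` helper, only illustrates the core idea: the time index of every chunk key from a file is shifted by the number of time chunks in the files before it. It also shows why identical chunk shapes matter, since the chunk grids can only be stitched together when they line up. Metadata such as the updated .zarray shape and the time coordinate are deliberately not handled.

```python
import re
from typing import Dict, List, Union

RefValue = Union[str, List]
# Matches chunk keys such as "SIS/3.0.0": variable, time index, remaining indices
CHUNK_KEY = re.compile(r"^(?P<var>[^/]+)/(?P<t>\d+)(?P<rest>(\.\d+)*)$")


def concat_time_refs(per_file: List[Dict[str, RefValue]], variable: str) -> Dict[str, RefValue]:
    """Toy concatenation of one variable's chunk references along the first (time) axis.

    Assumes every file holds a whole number of equally sized time chunks and
    identical chunking in the other dimensions. Metadata keys are not merged.
    """
    combined: Dict[str, RefValue] = {}
    offset = 0
    for refs in per_file:
        n_time_chunks = 0
        for key, value in refs.items():
            match = CHUNK_KEY.match(key)
            if not match or match["var"] != variable:
                continue  # skip metadata such as SIS/.zarray
            t = int(match["t"])
            combined[f"{variable}/{t + offset}{match['rest']}"] = value
            n_time_chunks = max(n_time_chunks, t + 1)
        offset += n_time_chunks
    return combined


# Hypothetical per-day references with two time chunks each
day1 = {"SIS/.zarray": "{...}", "SIS/0.0.0": ["day1.nc", 100, 8], "SIS/1.0.0": ["day1.nc", 108, 8]}
day2 = {"SIS/.zarray": "{...}", "SIS/0.0.0": ["day2.nc", 100, 8], "SIS/1.0.0": ["day2.nc", 108, 8]}
combined = concat_time_refs([day1, day2], "SIS")
for key in sorted(combined):
    print(key, combined[key])
```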

The file sarah3_sis_kerchunk_reference_json has been created and looks sane: file reports plain ASCII text on a single, very long line.

file sarah3_sis_kerchunk_reference_json
sarah3_sis_kerchunk_reference_json: ASCII text, with very long lines (65536), with no line terminators
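The file command only tells us that the output is text. A slightly stronger sanity check is to parse it as JSON and look for the top-level fields of a version-1 reference set (`json.dumps` without indentation writes the whole document on one line, which matches the "no line terminators" report). A minimal Python sketch; `looks_like_reference` is a hypothetical helper, and passing it does not prove that every byte range is correct. The demo runs on temporary files; pointing it at sarah3_sis_kerchunk_reference_json is the intended use.

```python
import json
import tempfile
from pathlib import Path


def looks_like_reference(path: Path) -> bool:
    """Return True if the file parses as JSON with version 1 and a 'refs' mapping."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except ValueError:  # covers both JSONDecodeError and UnicodeDecodeError
        return False
    return (
        isinstance(data, dict)
        and data.get("version") == 1
        and isinstance(data.get("refs"), dict)
    )


# Demonstration: one well-formed and one truncated reference file
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good_reference_json"
    good.write_text(json.dumps({"version": 1, "refs": {".zgroup": '{"zarr_format": 2}'}}))
    bad = Path(tmp) / "truncated_reference_json"
    bad.write_text('{"version": 1, "refs": {')
    results = {p.name: looks_like_reference(p) for p in (good, bad)}

print(results)
```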

Test it!

Let's try to retrieve the time series at a geographic location, longitude 8 and latitude 45:

rekx select-json sarah3_sis_kerchunk_reference_json SIS 8 45 --neighbor-lookup nearest -v
✓ Coordinates : 8.0, 45.0.
<xarray.DataArray 'SIS' (time: 96)>
[96 values with dtype=int16]
Coordinates:
    lat      float32 45.03
    lon      float32 8.025
  * time     (time) datetime64[ns] 2020-01-01 ... 2020-01-02T23:30:00
Attributes:
    cell_methods:   time: point
    long_name:      Surface Downwelling Shortwave Radiation
    missing_value:  -999
    standard_name:  surface_downwelling_shortwave_flux_in_air
    units:          W m-2
    _FillValue:     -999

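The reported series at the location (lon, lat) = (8, 45), resolved to the nearest grid cell at lon 8.025, lat 45.03, holds 96 half-hourly values spanning both input files (2020-01-01 to 2020-01-02T23:30). This confirms that Kerchunking worked as expected.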
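With --neighbor-lookup nearest, the requested point was snapped to the closest grid cell. In xarray this kind of lookup is what `.sel(lon=..., lat=..., method="nearest")` does; whether rekx uses exactly that call internally is an assumption. The Python sketch below shows the nearest-neighbour idea on a sorted 1-D coordinate using the standard library's `bisect`; `nearest_index` is a hypothetical helper and the grids are made up, not the SARAH3 grid.

```python
from bisect import bisect_left
from typing import Sequence


def nearest_index(coords: Sequence[float], target: float) -> int:
    """Index of the value in an ascending sequence closest to target.

    Ties are resolved towards the lower index.
    """
    if not coords:
        raise ValueError("empty coordinate array")
    i = bisect_left(coords, target)
    if i == 0:
        return 0
    if i == len(coords):
        return len(coords) - 1
    before, after = coords[i - 1], coords[i]
    return i - 1 if target - before <= after - target else i


# Made-up regular grids with 0.1 degree spacing (not the SARAH3 grid)
lons = [round(7.53 + 0.1 * k, 2) for k in range(11)]   # 7.53 ... 8.53
lats = [round(44.52 + 0.1 * k, 2) for k in range(11)]  # 44.52 ... 45.52
ilon = nearest_index(lons, 8.0)
ilat = nearest_index(lats, 45.0)
print(lons[ilon], lats[ilat])
```

Binary search keeps each lookup at O(log n) per axis, which matters little for a single point but adds up when sampling many locations.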