Kerchunking to JSON
Example data
Let us work with the following example files from the SARAH3 climate data records:
SISin202001010000004231000101MA.nc
SISin202001010000004231000101MA_structure.csv
SISin202001020000004231000101MA.nc
Check for consistency
To create a Kerchunk reference, all input datasets must be identically shaped in terms of chunk sizes. Let us confirm that this is the case with our sample data:
Or we can add -v to report the shapes for each variable:
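For readers who prefer to run this check from Python, the following is a minimal sketch of the same idea and not the tool's own implementation. It assumes the third-party netCDF4 package for reading chunk shapes; the function names and the chunk shapes in the demonstration are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def read_chunk_shapes(path):
    """Return {variable name: chunk shape} for one NetCDF file.

    Assumes the third-party netCDF4 package is installed. It is imported
    lazily so that the comparison helper below works without it.
    """
    from netCDF4 import Dataset

    with Dataset(path) as dataset:
        shapes = {}
        for name, variable in dataset.variables.items():
            chunking = variable.chunking()  # list of sizes, or "contiguous"
            is_sequence = isinstance(chunking, (list, tuple))
            shapes[name] = tuple(chunking) if is_sequence else chunking
    return shapes


def inconsistent_variables(shapes_per_file):
    """Return {variable: {chunk shape: [files]}} for every variable whose
    chunk shape differs between files. An empty dict means consistent.
    A variable missing from a file is reported with the shape None."""
    all_variables = set()
    for shapes in shapes_per_file.values():
        all_variables.update(shapes)

    by_variable = defaultdict(lambda: defaultdict(list))
    for filename, shapes in shapes_per_file.items():
        for variable in all_variables:
            by_variable[variable][shapes.get(variable)].append(filename)

    return {
        variable: dict(groups)
        for variable, groups in by_variable.items()
        if len(groups) > 1
    }


# Hypothetical chunk shapes, for illustration only (not read from SARAH3).
example = {
    "first.nc": {"SIS": (1, 100, 100), "time": (48,)},
    "second.nc": {"SIS": (1, 100, 100), "time": (48,)},
}
print(inconsistent_variables(example) or "All chunk shapes are consistent")
```

With real files, the input mapping could be built as `{path: read_chunk_shapes(path) for path in paths}`.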
Reference input files
We can proceed to create a JSON Kerchunk reference for each of the input NetCDF files. However, before producing any new file, let's --dry-run the command in question to see what will happen:
Dry run of operations that would be performed:
> Reading files in . matching the pattern *.nc
> Number of files matched: 2
> Creating single reference files to sarah3_sis_kerchunk_references_json
--dry-run is quite useful: we need some indication that things are right before committing to a real, potentially massive processing run!
This looks okay, so let's give it a real go.
Note that Kerchunking processes run in parallel!
The output of the above command is:
sarah3_sis_kerchunk_references_json/
├── SISin202001010000004231000101MA.json
├── SISin202001010000004231000101MA.json.hash
├── SISin202001020000004231000101MA.json
└── SISin202001020000004231000101MA.json.hash
1 directory, 4 files
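To illustrate what producing one of these JSON files involves, here is a minimal sketch that uses the kerchunk Python library directly. It assumes the inputs are NetCDF4/HDF5 files and that kerchunk is installed; the helper names are hypothetical, and the sketch does not reproduce the .hash files listed above.

```python
import json
from pathlib import Path


def reference_path(nc_path, output_dir):
    """Map an input NetCDF path to its JSON reference path in output_dir."""
    return Path(output_dir) / (Path(nc_path).stem + ".json")


def create_single_reference(nc_path, output_dir):
    """Write a Kerchunk JSON reference for one NetCDF4/HDF5 file.

    Assumes the third-party kerchunk package (imported lazily).
    """
    from kerchunk.hdf import SingleHdf5ToZarr

    target = reference_path(nc_path, output_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(nc_path, "rb") as source:
        # translate() scans the HDF5 metadata and returns the references
        # (byte offsets and lengths of each chunk) as a dictionary.
        references = SingleHdf5ToZarr(source, str(nc_path)).translate()
    target.write_text(json.dumps(references))
    return target


# The output name mirrors the listing above: same stem, .json suffix.
print(reference_path(
    "SISin202001010000004231000101MA.nc",
    "sarah3_sis_kerchunk_references_json",
))
```

Each file is processed independently of the others, which is what makes it straightforward to run this step in parallel.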
Aggregate references
Next, we want to combine the single references into one file. Let's dry-run the combine command:
Dry run of operations that would be performed:
> Reading files in sarah3_sis_kerchunk_references_json matching the pattern *.json
> Number of files matched: 2
> Writing combined reference file to sarah3_sis_kerchunk_reference_json
Warning
In the above example, note the subtle name difference between the directory holding the single references and the combined reference file:
sarah3_sis_kerchunk_references_json
!= sarah3_sis_kerchunk_reference_json
This also looks fine, so let's create the combined reference file.
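A comparable combine step can be sketched directly with kerchunk's Python API. This is an assumption-laden illustration and not necessarily how the combine command works internally: it assumes kerchunk is installed, that the files are concatenated along time, and that lat and lon are identical across files. The helper names are hypothetical. Note how the *.json pattern skips the .json.hash files, which is why only 2 of the 4 files in the directory are matched.

```python
import json
from pathlib import Path


def find_references(directory, pattern="*.json"):
    """Return the sorted files in directory whose names match pattern."""
    return sorted(Path(directory).glob(pattern))


def combine_references(directory, output_file):
    """Combine single-file references into one reference file.

    Assumes the third-party kerchunk package (imported lazily).
    """
    from kerchunk.combine import MultiZarrToZarr

    paths = [str(path) for path in find_references(directory)]
    combined = MultiZarrToZarr(
        paths,
        concat_dims=["time"],           # assumption: files differ only in time
        identical_dims=["lat", "lon"],  # assumption: shared spatial grid
    ).translate()
    Path(output_file).write_text(json.dumps(combined))
    return Path(output_file)
```

A call for the files in this walkthrough might look like `combine_references("sarah3_sis_kerchunk_references_json", "sarah3_sis_kerchunk_reference_json")`.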
The file sarah3_sis_kerchunk_reference_json has been created and seems to be a valid one:
sarah3_sis_kerchunk_reference_json: ASCII text, with very long lines (65536), with no line terminators
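Beyond checking the file type, we can also look inside the file. The sketch below uses only the Python standard library and assumes the file follows version 1 of the Kerchunk reference layout, that is, a top-level "refs" mapping whose keys are either Zarr metadata entries (.zgroup, .zarray, .zattrs) or chunk keys such as SIS/0.0.0. The function name and the tiny demonstration document are hypothetical.

```python
import json
import os
import tempfile
from collections import Counter


def summarize_references(path):
    """Count chunk references per variable in a Kerchunk JSON file."""
    with open(path) as handle:
        document = json.load(handle)  # fails loudly if the JSON is invalid
    if "refs" not in document:
        raise ValueError("no 'refs' mapping found in reference file")

    chunks = Counter()
    for key in document["refs"]:
        name = key.rsplit("/", 1)[-1]
        if name.startswith(".z"):  # Zarr metadata entry, not a data chunk
            continue
        variable = key.split("/", 1)[0]
        chunks[variable] += 1
    return dict(chunks)


# Hypothetical miniature reference document, for illustration only.
demo = {
    "version": 1,
    "refs": {
        ".zgroup": "{}",
        "SIS/.zarray": "{}",
        "SIS/0.0.0": ["first.nc", 1000, 500],   # [url, offset, length]
        "SIS/1.0.0": ["second.nc", 1000, 500],
        "time/0": "base64:AAAA",                # small chunk stored inline
    },
}
with tempfile.TemporaryDirectory() as directory:
    demo_path = os.path.join(directory, "demo_reference.json")
    with open(demo_path, "w") as handle:
        json.dump(demo, handle)
    summary = summarize_references(demo_path)
print(summary)
```

For a combined reference, the chunk count of the data variable should grow with the number of input files, which gives a quick sanity check before opening the data.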
Test it!
Let's try to retrieve data over a geographic location:
✓ Coordinates : 8.0, 45.0.
<xarray.DataArray 'SIS' (time: 96)> Size: 192B
[96 values with dtype=int16]
Coordinates:
    lat      float32 4B 45.03
    lon      float32 4B 8.025
  * time     (time) datetime64[ns] 768B 2020-01-01 ... 2020-01-02T23:30:00
Attributes:
    cell_methods:   time: point
    long_name:      Surface Downwelling Shortwave Radiation
    missing_value:  -999
    standard_name:  surface_downwelling_shortwave_flux_in_air
    units:          W m-2
    _FillValue:     -999
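The final report of the data series over the location lon, lat (8, 45) verifies that Kerchunking worked as expected: the series holds 96 half-hourly time steps covering both input days, read from the grid cell nearest to the requested coordinates (lon 8.025, lat 45.03).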
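A similar selection can be sketched in Python with xarray reading the combined reference through fsspec's reference filesystem. This assumes xarray, zarr and fsspec are installed, and the exact keyword arguments may vary between versions of these libraries. The helper names are hypothetical.

```python
def reference_open_kwargs(reference_file):
    """Keyword arguments for xarray.open_dataset to read a Kerchunk
    reference file through the fsspec "reference://" filesystem."""
    return {
        "engine": "zarr",
        "backend_kwargs": {
            "consolidated": False,  # reference files carry no .zmetadata
            "storage_options": {"fo": str(reference_file)},
        },
    }


def select_nearest(reference_file, variable, longitude, latitude):
    """Return the time series of variable at the grid cell nearest to
    the requested location.

    Assumes the third-party xarray, zarr and fsspec packages
    (xarray is imported lazily).
    """
    import xarray as xr

    dataset = xr.open_dataset(
        "reference://", **reference_open_kwargs(reference_file)
    )
    # method="nearest" snaps the request to the closest grid cell centre.
    return dataset[variable].sel(
        lon=longitude, lat=latitude, method="nearest"
    )
```

For the example above, the call would be `select_nearest("sarah3_sis_kerchunk_reference_json", "SIS", 8.0, 45.0)`.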