Inspect data
rekx can diagnose the structure of data stored in Xarray-supported file formats.
A single file
Inspect a single NetCDF file:
SISin202001010000004231000101MA.nc
Variable Shape Chunks Cache Elements Preemption Type Scale Offset Compression Level Shuffling Read Time
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
lon_bnds 2600 x 2 2600 x 2 16777216 1000 0.75 float32 - - zlib 4 False -
SIS 48 x 2600 x 2600 1 x 1 x 2600 16777216 1000 0.75 int16 - - zlib 4 False 0.008
lon 2600 2600 16777216 1000 0.75 float32 - - zlib 4 False -
lat 2600 2600 16777216 1000 0.75 float32 - - zlib 4 False -
time 48 512 16777216 1000 0.75 float64 - - zlib 4 False -
lat_bnds 2600 x 2 2600 x 2 16777216 1000 0.75 float32 - - zlib 4 False -
record_status 48 48 16777216 1000 0.75 int8 - - zlib 4 False -
File size: 181550165 bytes, Dimensions: time: 48, lon: 2600, bnds: 2, lat: 2600
* Cache: Size in bytes, Number of elements, Preemption ranging in [0, 1]
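The Shape, Chunks, Type and Cache columns are enough for a back-of-the-envelope look at how SIS is stored. The sketch below only reuses numbers printed in the table above and assumes int16 values take 2 bytes each, which is the standard width. It is plain arithmetic for illustration, not something rekx reports.

```python
from math import ceil, prod

# Values copied from the SIS row of the table above
shape = (48, 2600, 2600)   # 48 time steps, then the two 2600-long spatial axes
chunks = (1, 1, 2600)      # one chunk holds a single row of 2600 values
itemsize = 2               # int16 -> 2 bytes per element (assumed standard width)
cache_bytes = 16777216     # Cache column: chunk cache size in bytes

# Uncompressed size of one chunk and of the whole variable
chunk_bytes = prod(chunks) * itemsize
raw_bytes = prod(shape) * itemsize

# Number of chunks along each axis, then in total
chunks_per_axis = [ceil(s / c) for s, c in zip(shape, chunks)]
n_chunks = prod(chunks_per_axis)

# How many uncompressed chunks fit in the cache, judging by size alone
chunks_in_cache = cache_bytes // chunk_bytes

print(chunk_bytes, raw_bytes, n_chunks, chunks_in_cache)  # 5200 648960000 124800 3226
```

With 1 x 1 x 2600 chunks, reading one row of 2600 values at a single time step touches one chunk, while reading the 48-step time series of a single pixel touches 48 different chunks. Uncompressed, SIS alone would take 648960000 bytes, roughly 3.6 times the 181550165-byte file.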
Perhaps restrict the inspection to data variables only:
SISin202001010000004231000101MA.nc
Variable Shape Chunks Cache Elements Preemption Type Scale Offset Compression Level Shuffling Read Time
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
SIS 48 x 2600 x 2600 1 x 1 x 2600 16777216 1000 0.75 int16 - - zlib 4 False 0.008
File size: 181550165 bytes, Dimensions: time: 48, lon: 2600, bnds: 2, lat: 2600
* Cache: Size in bytes, Number of elements, Preemption ranging in [0, 1]
Hint
rekx can scan selectively for the following variable sets, chosen via --variable-set: [all|coordinates|coordinates-without-data|data|metadata|time]. List them via rekx inspect --help.
Or even show humanised size figures:
SISin202001010000004231000101MA.nc
Variable Shape Chunks Cache Elements Preemption Type Scale Offset Compression Level Shuffling Read Time
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
SIS 48 x 2600 x 2600 1 x 1 x 2600 16777216 1000 0.75 int16 - - zlib 4 False 0.008
File size: 173.1 MiB bytes, Dimensions: time: 48, lon: 2600, bnds: 2, lat: 2600
* Cache: Size in bytes, Number of elements, Preemption ranging in [0, 1]
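The humanised figure is consistent with binary (IEC, 1024-based) prefixes: 181550165 / 1024² ≈ 173.1. A minimal sketch of such a conversion, written for illustration and not taken from rekx; the function name humanize_bytes is hypothetical.

```python
def humanize_bytes(n: int) -> str:
    """Format a byte count using binary (1024-based) prefixes, one decimal place."""
    size = float(n)
    unit = "B"
    for next_unit in ("KiB", "MiB", "GiB", "TiB"):
        if size < 1024:
            break
        size /= 1024  # step up to the next binary prefix
        unit = next_unit
    # Plain byte counts are shown as integers, larger units with one decimal
    return f"{n} B" if unit == "B" else f"{size:.1f} {unit}"


print(humanize_bytes(181550165))  # 173.1 MiB
print(humanize_bytes(182167423))  # 173.7 MiB
print(humanize_bytes(16777216))   # 16.0 MiB
```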
A directory with multiple files
Let's consider a directory with two NetCDF files:
SISin202001010000004231000101MA.nc
SISin202001010000004231000101MA_structure.csv
SISin202001020000004231000101MA.nc
and inspect all of them in the current directory, this time scanning only for data variables:
Name Size Dimensions Variable Shape Chunks Cache Elements Preemption Type Scale Offset Compression Level Shuffling Read Time
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
SISin202001010000004231000101MA.nc 181550165 2 x 48 x 2600 x 2600 SIS 48 x 2600 x 2600 1 x 1 x 2600 16777216 1000 0.75 int16 - - zlib 4 False 0.009
SISin202001020000004231000101MA.nc 182167423 2 x 48 x 2600 x 2600 SIS 48 x 2600 x 2600 1 x 1 x 2600 16777216 1000 0.75 int16 - - zlib 4 False 0.009
Dimensions: time x lat x bnds x lon | Cache size in bytes | Number of elements | Preemption strategy ranging in [0, 1] | Average time of 10 reads in seconds
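According to the legend, Read Time is the average of 10 reads, in seconds. Which slice of the variable rekx actually reads is not shown here, so the sketch below only illustrates the averaging; average_read_time is a hypothetical helper and the read callable is a stand-in.

```python
import timeit
from statistics import mean
from typing import Any, Callable


def average_read_time(read: Callable[[], Any], repetitions: int = 10) -> float:
    """Call `read` once per repetition and return the mean elapsed time in seconds."""
    timings = timeit.repeat(read, repeat=repetitions, number=1)
    return mean(timings)


# Stand-in for an actual read. With Xarray this might be something like
# lambda: ds["SIS"].isel(time=0).values (hypothetical usage, not rekx's code)
print(f"{average_read_time(lambda: sum(range(100_000))):.3f}")
```

Repeated reads of the same data can get faster after the first one, since the operating system and the chunk cache may already hold the bytes, which is one reason to average over several reads.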
Info
In Linux, . (dot) denotes the current working directory.
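The directory listing above also contains a CSV file, yet only the two .nc files show up in the report. A minimal sketch of picking NetCDF files in a directory by their extension, assuming that is a reasonable stand-in for whatever filtering rekx does; netcdf_files is a hypothetical helper.

```python
from pathlib import Path


def netcdf_files(directory: str = ".") -> list[str]:
    """Return the names of regular *.nc files in `directory`, sorted alphabetically."""
    return sorted(p.name for p in Path(directory).glob("*.nc") if p.is_file())


print(Path(".").resolve())  # "." resolves to the current working directory
print(netcdf_files("."))
```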
By default, multiple files are reported in a single long table. If we don't want this, for whatever reason, we can instead ask for an independent table per input file:
SISin202001010000004231000101MA.nc
Variable Shape Chunks Cache Elements Preemption Type Scale Offset Compression Level Shuffling Read Time
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
SIS 48 x 2600 x 2600 1 x 1 x 2600 16777216 1000 0.75 int16 - - zlib 4 False 0.008
File size: 181550165 bytes, Dimensions: time: 48, lon: 2600, bnds: 2, lat: 2600
* Cache: Size in bytes, Number of elements, Preemption ranging in [0, 1]
SISin202001020000004231000101MA.nc
Variable Shape Chunks Cache Elements Preemption Type Scale Offset Compression Level Shuffling Read Time
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
SIS 48 x 2600 x 2600 1 x 1 x 2600 16777216 1000 0.75 int16 - - zlib 4 False 0.008
File size: 182167423 bytes, Dimensions: time: 48, lon: 2600, bnds: 2, lat: 2600
* Cache: Size in bytes, Number of elements, Preemption ranging in [0, 1]
CSV Output
We all need machine-readable output. Here's how to get it from the rekx inspect command.
Let's verify that it worked:
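One way to check the output is to read it back with Python's csv module and confirm the expected columns are present. The column names used below are assumptions based on the console tables, and check_csv is a hypothetical helper. The file name is the SISin202001010000004231000101MA_structure.csv seen in the directory listing above, presumably written by rekx.

```python
import csv
from pathlib import Path


def check_csv(path: str, required_columns: set[str]) -> list[dict[str, str]]:
    """Read a CSV file and raise ValueError if any required column is missing."""
    with open(Path(path), newline="") as handle:
        reader = csv.DictReader(handle)
        missing = required_columns - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        return list(reader)


# Hypothetical usage; adjust the column names to the actual CSV header
# rows = check_csv("SISin202001010000004231000101MA_structure.csv", {"Variable", "Type"})
# print(len(rows))
```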
Note
Here's how it renders in this documentation page using mkdocs-table-reader-plugin:
Type | Scale | Offset | Compression | Level | Shuffling | Read time |
---|---|---|---|---|---|---|
float32 | - | - | zlib | 4 | False | - |
float32 | - | - | zlib | 4 | False | - |
int8 | - | - | zlib | 4 | False | - |
float32 | - | - | zlib | 4 | False | - |
float32 | - | - | zlib | 4 | False | - |
int16 | - | - | zlib | 4 | False | 0.008 |
float64 | - | - | zlib | 4 | False | - |
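Rendering a CSV file as a page table essentially means emitting a Markdown pipe table: a header row, a separator row, then one row per record. A minimal sketch of that idea with made-up sample data; it is not the plugin's implementation, and csv_to_markdown is a hypothetical name.

```python
import csv
import io


def csv_to_markdown(text: str) -> str:
    """Turn CSV text into a Markdown pipe table (header, separator, rows)."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "---|" * len(header),
    ]
    # Pad short rows so every line has as many cells as the header
    lines += ["| " + " | ".join(row + [""] * (len(header) - len(row))) + " |" for row in body]
    return "\n".join(lines)


print(csv_to_markdown("Variable,Type,Compression\nSIS,int16,zlib\n"))
```

Padding keeps each value under its own heading; when rows and header disagree on the number of cells, values end up under the wrong columns.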