
Inspect data

rekx can diagnose the structure of data stored in Xarray-supported file formats.

A single file

Inspect a single NetCDF file

rekx inspect data/single_file/SISin202001010000004231000101MA.nc
                                                              SISin202001010000004231000101MA.nc                                                               

  Variable        Shape              Chunks         Cache      Elements   Preemption   Type      Scale   Offset   Compression   Level   Shuffling   Read Time  
 ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 
  lon             2600               2600           16777216   4133       0.75         float32   -       -        zlib          4       False       -          
  lat_bnds        2600 x 2           2600 x 2       16777216   4133       0.75         float32   -       -        zlib          4       False       -          
  time            48                 512            16777216   4133       0.75         float64   -       -        zlib          4       False       -          
  lon_bnds        2600 x 2           2600 x 2       16777216   4133       0.75         float32   -       -        zlib          4       False       -          
  lat             2600               2600           16777216   4133       0.75         float32   -       -        zlib          4       False       -          
  record_status   48                 48             16777216   4133       0.75         int8      -       -        zlib          4       False       -          
  SIS             48 x 2600 x 2600   1 x 1 x 2600   16777216   4133       0.75         int16     -       -        zlib          4       False       0.011      

                                        File size: 181550165 bytes, Dimensions: time: 48, lon: 2600, bnds: 2, lat: 2600                                        
                                           * Cache: Size in bytes, Number of elements, Preemption ranging in [0, 1]                                            
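To put the Shape and Chunks columns into perspective, the arithmetic behind them can be sketched in a few lines of Python. This is a back-of-the-envelope illustration and not part of rekx: the function name `chunk_statistics` is hypothetical, and the figures are taken from the SIS row above, assuming 2 bytes per value for int16.

```python
from math import ceil, prod


def chunk_statistics(shape, chunks, bytes_per_value):
    """Return (number of chunks, uncompressed bytes per chunk, uncompressed total bytes)."""
    # Chunks needed along each dimension, rounding up for partial chunks
    chunks_per_dimension = [ceil(s / c) for s, c in zip(shape, chunks)]
    number_of_chunks = prod(chunks_per_dimension)
    chunk_bytes = prod(chunks) * bytes_per_value
    total_bytes = prod(shape) * bytes_per_value
    return number_of_chunks, chunk_bytes, total_bytes


# Values from the SIS row: shape 48 x 2600 x 2600, chunks 1 x 1 x 2600, int16
number_of_chunks, chunk_bytes, total_bytes = chunk_statistics(
    shape=(48, 2600, 2600), chunks=(1, 1, 2600), bytes_per_value=2
)
print(number_of_chunks)  # 124800
print(chunk_bytes)       # 5200
print(total_bytes)       # 648960000
```

With this layout, reading the full 2600 x 2600 grid for one time step touches 2600 chunks, whereas a 48-step time series at a single location touches 48. The uncompressed 648960000 bytes compare to a file size of 181550165 bytes on disk, which also includes the small coordinate variables.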

Optionally, restrict the inspection to data variables only

rekx inspect data/single_file/SISin202001010000004231000101MA.nc --variable-set data
                                                           SISin202001010000004231000101MA.nc                                                           

  Variable   Shape              Chunks         Cache      Elements   Preemption   Type    Scale   Offset   Compression   Level   Shuffling   Read Time  
 ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 
  SIS        48 x 2600 x 2600   1 x 1 x 2600   16777216   4133       0.75         int16   -       -        zlib          4       False       0.011      

                                    File size: 181550165 bytes, Dimensions: time: 48, lon: 2600, bnds: 2, lat: 2600                                     
                                        * Cache: Size in bytes, Number of elements, Preemption ranging in [0, 1]                                        

Hint

rekx can scan selectively for one of the following values of --variable-set: [all|coordinates|coordinates-without-data|data|metadata|time]. List them via rekx inspect --help.

or even show the file size as a humanised figure

rekx inspect data/single_file/SISin202001010000004231000101MA.nc --variable-set data --humanize
                                                           SISin202001010000004231000101MA.nc                                                           

  Variable   Shape              Chunks         Cache      Elements   Preemption   Type    Scale   Offset   Compression   Level   Shuffling   Read Time  
 ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 
  SIS        48 x 2600 x 2600   1 x 1 x 2600   16777216   4133       0.75         int16   -       -        zlib          4       False       0.011      

                                    File size: 173.1 MiB bytes, Dimensions: time: 48, lon: 2600, bnds: 2, lat: 2600                                     
                                        * Cache: Size in bytes, Number of elements, Preemption ranging in [0, 1]                                        
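The humanised figure is the byte count expressed in binary units (1 MiB = 1024 x 1024 bytes). How rekx formats it internally is not shown on this page; the conversion itself can be sketched as below, where `humanize_bytes` is a hypothetical helper written for illustration only.

```python
def humanize_bytes(size):
    """Format a byte count using binary (1024-based) units with one decimal."""
    units = ["bytes", "KiB", "MiB", "GiB", "TiB"]
    value = float(size)
    for unit in units:
        # Stop at the first unit in which the value drops below 1024
        if value < 1024 or unit == units[-1]:
            if unit == "bytes":
                return f"{int(value)} bytes"
            return f"{value:.1f} {unit}"
        value /= 1024


print(humanize_bytes(181550165))  # 173.1 MiB (the file size reported above)
print(humanize_bytes(16777216))   # 16.0 MiB (the cache size in the Cache column)
```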

A directory with multiple files

Let's consider a directory with 2 NetCDF files

ls -1 data/multiple_files_unique_shape/
SISin202001010000004231000101MA.nc
SISin202001010000004231000101MA_structure.csv
SISin202001020000004231000101MA.nc

and inspect them all, in this case scanning only for data variables in the current directory

cd data/multiple_files_unique_shape/
rekx inspect . --variable-set data
  Name                                 Size        Dimensions             Variable   Shape              Chunks         Cache      Elements   Preemption   Type    Scale   Offset   Compression   Level   Shuffling   Read Time  
 ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 
  SISin202001020000004231000101MA.nc   182167423   2 x 48 x 2600 x 2600   SIS        48 x 2600 x 2600   1 x 1 x 2600   16777216   4133       0.75         int16   -       -        zlib          4       False       0.013      
  SISin202001010000004231000101MA.nc   181550165   2 x 48 x 2600 x 2600   SIS        48 x 2600 x 2600   1 x 1 x 2600   16777216   4133       0.75         int16   -       -        zlib          4       False       0.013      

                                  Dimensions: time x lon x bnds x lat | Cache size in bytes | Number of elements | Preemption strategy ranging in [0, 1] | Average time of 10 reads in seconds                                  

Info

On Linux, . refers to the current working directory.
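According to the table footer, the Read Time column is the average time of 10 reads in seconds. How rekx measures this internally is not shown here; the general idea can be sketched with the standard library. The names `average_read_time` and `dummy_read` are hypothetical, and the dummy workload merely stands in for reading a variable from a NetCDF file.

```python
from time import perf_counter


def average_read_time(read_once, repetitions=10):
    """Average wall-clock seconds taken by `read_once` over `repetitions` calls."""
    durations = []
    for _ in range(repetitions):
        start = perf_counter()
        read_once()
        durations.append(perf_counter() - start)
    return sum(durations) / len(durations)


def dummy_read():
    # Stand-in workload; a real measurement would read data from a file
    return sum(range(100_000))


print(f"{average_read_time(dummy_read):.3f}")
```

Averaging over several repetitions smooths out one-off effects such as a cold file-system cache, which is why a single timing is rarely reported on its own.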

By default, multiple files are reported in one long table. If that is not what we want, we can ask for an independent table per input file:

rekx inspect data/multiple_files_unique_shape/ --variable-set data --no-long-table
                                                           SISin202001020000004231000101MA.nc                                                           

  Variable   Shape              Chunks         Cache      Elements   Preemption   Type    Scale   Offset   Compression   Level   Shuffling   Read Time  
 ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 
  SIS        48 x 2600 x 2600   1 x 1 x 2600   16777216   4133       0.75         int16   -       -        zlib          4       False       0.014      

                                    File size: 182167423 bytes, Dimensions: time: 48, lon: 2600, bnds: 2, lat: 2600                                     
                                        * Cache: Size in bytes, Number of elements, Preemption ranging in [0, 1]                                        
                                                           SISin202001010000004231000101MA.nc                                                           

  Variable   Shape              Chunks         Cache      Elements   Preemption   Type    Scale   Offset   Compression   Level   Shuffling   Read Time  
 ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 
  SIS        48 x 2600 x 2600   1 x 1 x 2600   16777216   4133       0.75         int16   -       -        zlib          4       False       0.014      

                                    File size: 181550165 bytes, Dimensions: time: 48, lon: 2600, bnds: 2, lat: 2600                                     
                                        * Cache: Size in bytes, Number of elements, Preemption ranging in [0, 1]                                        

CSV Output

We all need machine-readable output. Here is how to get it from rekx's inspect command:

rekx inspect SISin202001010000004231000101MA.nc --csv SISin202001010000004231000101MA_structure.csv
Output written to SISin202001010000004231000101MA_structure.csv

Let's verify it worked well

file SISin202001010000004231000101MA_structure.csv
SISin202001010000004231000101MA_structure.csv: ASCII text, with CRLF line terminators
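Since the file uses CRLF line terminators, it is worth opening it in a way that leaves line-ending handling to the CSV parser. The following is a minimal sketch using Python's standard `csv` module; the `sample` string is an illustrative stand-in with only three columns, and the real file written by rekx may well contain a different set of columns.

```python
import csv
import io

# Illustrative stand-in for the CSV content; the real column set may differ
sample = "Variable,Type,Compression\r\nSIS,int16,zlib\r\ntime,float64,zlib\r\n"


def read_rows(text_stream):
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(text_stream))


# newline="" lets the csv module handle the CRLF line terminators itself
rows = read_rows(io.StringIO(sample, newline=""))
for row in rows:
    print(row["Variable"], row["Type"], row["Compression"])
```

For a file on disk, the same function can be fed with `open(path, newline="")`, where `path` points to the CSV written by rekx.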

Note

Here is how it renders in this documentation page using mkdocs-table-reader-plugin:

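  File Name   File Size   Variable   Shape   Type      Scale   Offset   Compression   Level   Shuffling   Read time
 ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
                                             float32   -       -        zlib          4       False       -
                                             float64   -       -        zlib          4       False       -
                                             int16     -       -        zlib          4       False       0.011
                                             int8      -       -        zlib          4       False       -
                                             float32   -       -        zlib          4       False       -
                                             float32   -       -        zlib          4       False       -
                                             float32   -       -        zlib          4       False       -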