Model subsetting

Demonstration of accessing models with model_catalogs and then subsequent model manipulations with extract_model.

[1]:
import xarray as xr
import model_catalogs as mc
import pandas as pd
import extract_model as em
/Users/kthyng/miniconda3/envs/libgoods-docs/lib/python3.9/site-packages/extract_model/extract_model.py:21: UserWarning: xESMF not found. Interpolation will not work.
  warnings.warn("xESMF not found. Interpolation will not work.")  # pragma: no cover

Model output

[2]:
start = pd.Timestamp.today()
end = start + pd.Timedelta('1 day')

1. Curvilinear, multiple horizontal grids

[3]:
model = 'CBOFS'

lon_range, lat_range = [-76, -75], [37, 38]
bbox_cbofs = [lon_range[0], lat_range[0], lon_range[1], lat_range[1]]

main_cat = mc.setup()
source = mc.select_date_range(main_cat[model], timing='nowcast', start_date=start, end_date=end)
ds_cbofs = source.to_dask()

2. Rectilinear grid

[4]:
model = 'CBOFS-REGULARGRID'

lon_range, lat_range = [-76, -75], [37, 38]
bbox_cbofs_rg = [lon_range[0], lat_range[0], lon_range[1], lat_range[1]]

source = mc.select_date_range(main_cat[model], timing='forecast', start_date=start, end_date=end)
ds_cbofs_rg = source.to_dask()

3. Curvilinear, single horizontal grid

[5]:
model = 'LOOFS'

bbox_loofs = [-78.8, 43.5, -77.5, 43.8]

source = mc.select_date_range(main_cat[model], timing='nowcast', start_date=start, end_date=end)
ds_loofs = source.to_dask()

4. Unstructured grid

[6]:
model = 'NGOFS2'

bbox_ngofs2 = [-98+360, 27, -97+360, 28]

source = mc.select_date_range(main_cat[model], timing='forecast', start_date=start, end_date=end)
ds_ngofs2 = source.to_dask()

Subset Spatially

1. Curvilinear, multiple horizontal grids

Subsetting in more complicated for this example case because it is a ROMS model that is both curvilinear and has multiple horizontal grids (such that different variables in the Dataset may have different associated horizontal grids).

Subset DataArray to box

Narrow only a single variable to the bounding box.

[7]:
ds_cbofs['zeta'].em.sub_bbox(bbox=bbox_cbofs).cf.isel(T=0).cf.plot(x='longitude', y='latitude')
[7]:
<matplotlib.collections.QuadMesh at 0x1676b9760>
_images/subsetting_15_1.png

Subset Dataset to box

Narrow all variables in the Dataset to the bounding box.

[8]:
ds_cbofs_ss = ds_cbofs.em.sub_bbox(bbox=bbox_cbofs)
ds_cbofs_ss
[8]:
<xarray.Dataset>
Dimensions:         (tracer: 3, boundary: 4, s_rho: 20, s_w: 21, eta_rho: 124,
                     xi_rho: 151, eta_u: 124, xi_u: 150, eta_v: 123, xi_v: 149,
                     eta_psi: 123, xi_psi: 149, ocean_time: 24)
Coordinates: (12/19)
  * s_rho           (s_rho) float64 -0.975 -0.925 -0.875 ... -0.075 -0.025
  * s_w             (s_w) float64 -1.0 -0.95 -0.9 -0.85 ... -0.15 -0.1 -0.05 0.0
    lon_rho         (eta_rho, xi_rho) float64 dask.array<chunksize=(124, 151), meta=np.ndarray>
    lat_rho         (eta_rho, xi_rho) float64 dask.array<chunksize=(124, 151), meta=np.ndarray>
  * xi_rho          (xi_rho) int64 181 182 183 184 185 ... 327 328 329 330 331
  * eta_rho         (eta_rho) int64 3 4 5 6 7 8 9 ... 121 122 123 124 125 126
    ...              ...
  * eta_v           (eta_v) int64 3 4 5 6 7 8 9 ... 119 120 121 122 123 124 125
    lon_psi         (eta_psi, xi_psi) float64 dask.array<chunksize=(123, 149), meta=np.ndarray>
    lat_psi         (eta_psi, xi_psi) float64 dask.array<chunksize=(123, 149), meta=np.ndarray>
  * xi_psi          (xi_psi) int64 182 183 184 185 186 ... 326 327 328 329 330
  * eta_psi         (eta_psi) int64 3 4 5 6 7 8 9 ... 120 121 122 123 124 125
  * ocean_time      (ocean_time) datetime64[ns] 2022-08-25T11:00:00 ... 2022-...
Dimensions without coordinates: tracer, boundary
Data variables: (12/97)
    ntimes          int32 ...
    ndtfast         int32 ...
    dt              float64 ...
    dtfast          float64 ...
    dstart          datetime64[ns] ...
    nHIS            int32 ...
    ...              ...
    temp            (ocean_time, s_rho, eta_rho, xi_rho) float32 dask.array<chunksize=(1, 20, 124, 151), meta=np.ndarray>
    salt            (ocean_time, s_rho, eta_rho, xi_rho) float32 dask.array<chunksize=(1, 20, 124, 151), meta=np.ndarray>
    oxygen          (ocean_time, s_rho, eta_rho, xi_rho) float32 dask.array<chunksize=(1, 20, 124, 151), meta=np.ndarray>
    Pair            (ocean_time, eta_rho, xi_rho) float32 dask.array<chunksize=(1, 124, 151), meta=np.ndarray>
    Uwind           (ocean_time, eta_rho, xi_rho) float32 dask.array<chunksize=(1, 124, 151), meta=np.ndarray>
    Vwind           (ocean_time, eta_rho, xi_rho) float32 dask.array<chunksize=(1, 124, 151), meta=np.ndarray>
Attributes: (12/37)
    file:                            nos.cbofs.fields.nowcast.20220825.t12z_0...
    format:                          netCDF-4/HDF5 file
    Conventions:                     CF-1.4, SGRID-0.3
    type:                            ROMS/TOMS history file
    title:                           cbofs nowcast RUN in operational mode
    var_info:                        varinfo.dat
    ...                              ...
    history:                         ROMS/TOMS, Version 3.9, Thursday - Augus...
    ana_file:                        ROMS/Functionals/ana_btflux.h, ROMS/Func...
    CPP_options:                     mode, ADD_FSOBC, ADD_M2OBC, ANA_BPFLUX, ...
    bio_file:                        ROMS/Nonlinear/Biology/hypoxia_srm.h
    DODS_EXTRA.Unlimited_Dimension:  ocean_time
    EXTRA_DIMENSION.N:               20
[9]:
ds_cbofs_ss['zeta'].cf.isel(T=0).cf.plot(x='longitude', y='latitude')
[9]:
<matplotlib.collections.QuadMesh at 0x1666313a0>
_images/subsetting_18_1.png

Subset Dataset to proper set of grids

Return a Dataset with a full suite of horizontal grids of the proper relative sizes but narrowed to the bounding box. Because this model axes don’t align with longitude and latitude, the shape of the result is different.

This function won’t run with a DataArray.

[10]:
ds_cbofs_ss = ds_cbofs.em.sub_grid(bbox=bbox_cbofs)
ds_cbofs_ss['zeta'].cf.isel(T=0).cf.plot(x='longitude', y='latitude')
[10]:
<matplotlib.collections.QuadMesh at 0x1665fc550>
_images/subsetting_20_1.png
[11]:
ds_cbofs_ss
[11]:
<xarray.Dataset>
Dimensions:         (tracer: 3, boundary: 4, s_rho: 20, s_w: 21, eta_rho: 124,
                     xi_rho: 151, eta_u: 124, xi_u: 150, eta_v: 123, xi_v: 151,
                     eta_psi: 123, xi_psi: 150, ocean_time: 24)
Coordinates: (12/19)
  * s_rho           (s_rho) float64 -0.975 -0.925 -0.875 ... -0.075 -0.025
  * s_w             (s_w) float64 -1.0 -0.95 -0.9 -0.85 ... -0.15 -0.1 -0.05 0.0
    lon_rho         (eta_rho, xi_rho) float64 dask.array<chunksize=(124, 151), meta=np.ndarray>
    lat_rho         (eta_rho, xi_rho) float64 dask.array<chunksize=(124, 151), meta=np.ndarray>
    lon_u           (eta_u, xi_u) float64 dask.array<chunksize=(124, 150), meta=np.ndarray>
    lat_u           (eta_u, xi_u) float64 dask.array<chunksize=(124, 150), meta=np.ndarray>
    ...              ...
  * xi_v            (xi_v) int64 181 182 183 184 185 186 ... 327 328 329 330 331
  * xi_psi          (xi_psi) int64 181 182 183 184 185 ... 326 327 328 329 330
  * eta_rho         (eta_rho) int64 3 4 5 6 7 8 9 ... 121 122 123 124 125 126
  * eta_u           (eta_u) int64 3 4 5 6 7 8 9 ... 120 121 122 123 124 125 126
  * eta_v           (eta_v) int64 3 4 5 6 7 8 9 ... 119 120 121 122 123 124 125
  * eta_psi         (eta_psi) int64 3 4 5 6 7 8 9 ... 120 121 122 123 124 125
Dimensions without coordinates: tracer, boundary
Data variables: (12/97)
    ntimes          int32 ...
    ndtfast         int32 ...
    dt              float64 ...
    dtfast          float64 ...
    dstart          datetime64[ns] ...
    nHIS            int32 ...
    ...              ...
    temp            (ocean_time, s_rho, eta_rho, xi_rho) float32 dask.array<chunksize=(1, 20, 124, 151), meta=np.ndarray>
    salt            (ocean_time, s_rho, eta_rho, xi_rho) float32 dask.array<chunksize=(1, 20, 124, 151), meta=np.ndarray>
    oxygen          (ocean_time, s_rho, eta_rho, xi_rho) float32 dask.array<chunksize=(1, 20, 124, 151), meta=np.ndarray>
    Pair            (ocean_time, eta_rho, xi_rho) float32 dask.array<chunksize=(1, 124, 151), meta=np.ndarray>
    Uwind           (ocean_time, eta_rho, xi_rho) float32 dask.array<chunksize=(1, 124, 151), meta=np.ndarray>
    Vwind           (ocean_time, eta_rho, xi_rho) float32 dask.array<chunksize=(1, 124, 151), meta=np.ndarray>
Attributes: (12/37)
    file:                            nos.cbofs.fields.nowcast.20220825.t12z_0...
    format:                          netCDF-4/HDF5 file
    Conventions:                     CF-1.4, SGRID-0.3
    type:                            ROMS/TOMS history file
    title:                           cbofs nowcast RUN in operational mode
    var_info:                        varinfo.dat
    ...                              ...
    history:                         ROMS/TOMS, Version 3.9, Thursday - Augus...
    ana_file:                        ROMS/Functionals/ana_btflux.h, ROMS/Func...
    CPP_options:                     mode, ADD_FSOBC, ADD_M2OBC, ANA_BPFLUX, ...
    bio_file:                        ROMS/Nonlinear/Biology/hypoxia_srm.h
    DODS_EXTRA.Unlimited_Dimension:  ocean_time
    EXTRA_DIMENSION.N:               20

2. Rectilinear grid

Subset DataArray or Dataset to box (same as grid)

All of these have the same essential function when the grid is rectilinear. The only difference is whether operating on the full Dataset or a DataArray.

[12]:
ds_cbofs_rg_ss = ds_cbofs_rg.em.sub_bbox(bbox=bbox_cbofs_rg)
[13]:
ds_cbofs_rg_ss['zeta'].cf.isel(T=0).cf.plot(x='longitude', y='latitude')
[13]:
<matplotlib.collections.QuadMesh at 0x1662cfeb0>
_images/subsetting_25_1.png

3. Curvilinear, single horizontal grid

Subset DataArray or Dataset to box (same as grid)

All of these also have the same essential function when there is only one horizontal grid. The only difference is whether operating on the full Dataset or a DataArray.

[14]:
ds_loofs_ss = ds_loofs.em.sub_grid(bbox=bbox_loofs)
ds_loofs_ss
[14]:
<xarray.Dataset>
Dimensions:    (time: 24, ny: 7, nx: 21, sigma: 20)
Coordinates:
  * time       (time) datetime64[ns] 2022-08-25T11:00:14.062499968 ... 2022-0...
    lon        (ny, nx) float32 dask.array<chunksize=(7, 21), meta=np.ndarray>
    lat        (ny, nx) float32 dask.array<chunksize=(7, 21), meta=np.ndarray>
  * sigma      (sigma) float32 0.0 0.0227 0.0454 0.0681 ... 0.8853 0.9534 1.0
  * nx         (nx) int64 17 18 19 20 21 22 23 24 25 ... 30 31 32 33 34 35 36 37
  * ny         (ny) int64 8 9 10 11 12 13 14
Data variables:
    validtime  (time, ny, nx) object dask.array<chunksize=(1, 7, 21), meta=np.ndarray>
    mask       (ny, nx) float32 dask.array<chunksize=(7, 21), meta=np.ndarray>
    depth      (ny, nx) float32 dask.array<chunksize=(7, 21), meta=np.ndarray>
    zeta       (time, ny, nx) float32 dask.array<chunksize=(1, 7, 21), meta=np.ndarray>
    air_u      (time, ny, nx) float32 dask.array<chunksize=(1, 7, 21), meta=np.ndarray>
    air_v      (time, ny, nx) float32 dask.array<chunksize=(1, 7, 21), meta=np.ndarray>
    u          (time, sigma, ny, nx) float32 dask.array<chunksize=(1, 20, 7, 21), meta=np.ndarray>
    v          (time, sigma, ny, nx) float32 dask.array<chunksize=(1, 20, 7, 21), meta=np.ndarray>
    temp       (time, sigma, ny, nx) float32 dask.array<chunksize=(1, 20, 7, 21), meta=np.ndarray>
Attributes: (12/17)
    datum1:                          reference to LWD=zeta
    datum2:                          reference to IGLD85=zeta - IGLD85
    file_type:                       Full_Grid
    Conventions:                     COARDS
    grid_type:                       rectilinear
    z_type:                          2-D
    ...                              ...
    history:                         Operation-CO-OPS
    references:                      greg.mott@noaa.gov
    creation_date:                   2022-08-25 12:50:16  00:00
    DODS.strlen:                     19
    DODS.dimName:                    validtime_len
    DODS_EXTRA.Unlimited_Dimension:  time
[15]:
ds_loofs_ss['zeta'].cf.isel(T=10).cf.plot(x='longitude', y='latitude')
[15]:
<matplotlib.collections.QuadMesh at 0x1660c55e0>
_images/subsetting_29_1.png

4. Unstructured grid

Subset DataArray or Dataset to box (same as grid)

All of these also have the same essential function when there is only one horizontal grid. The only difference is whether operating on the full Dataset or a DataArray.

[16]:
ds_ngofs2_ss = ds_ngofs2.em.sub_grid(bbox=bbox_ngofs2)
ds_ngofs2_ss
[16]:
<xarray.Dataset>
Dimensions:             (nele: 30734, node: 16604, time: 8, four: 4, three: 3,
                         siglay: 40, siglev: 41, maxnode: 10, maxelem: 8)
Coordinates:
    lonc                (nele) float32 dask.array<chunksize=(30734,), meta=np.ndarray>
    latc                (nele) float32 dask.array<chunksize=(30734,), meta=np.ndarray>
    lon                 (node) float32 dask.array<chunksize=(16604,), meta=np.ndarray>
    lat                 (node) float32 dask.array<chunksize=(16604,), meta=np.ndarray>
  * time                (time) datetime64[ns] 2022-08-25T12:00:00 ... 2022-08...
    sigma_levels        (siglev, node) float32 dask.array<chunksize=(41, 16604), meta=np.ndarray>
    sigma_layers        (siglay, node) float32 dask.array<chunksize=(40, 16604), meta=np.ndarray>
Dimensions without coordinates: nele, node, four, three, siglay, siglev,
                                maxnode, maxelem
Data variables: (12/38)
    nprocs              int32 756
    partition           (nele) int32 dask.array<chunksize=(30734,), meta=np.ndarray>
    x                   (node) float32 dask.array<chunksize=(16604,), meta=np.ndarray>
    y                   (node) float32 dask.array<chunksize=(16604,), meta=np.ndarray>
    xc                  (nele) float32 dask.array<chunksize=(30734,), meta=np.ndarray>
    yc                  (nele) float32 dask.array<chunksize=(30734,), meta=np.ndarray>
    ...                  ...
    nv                  (three, nele) int64 7 7 8 3 ... 16603 16603 16598 16599
    nbe                 (three, nele) int64 0 0 2 3 15 ... 30730 30733 30734 0
    nbsn                (maxnode, node) int32 0 0 0 0 0 0 1 2 ... 0 0 0 0 0 0 0
    ntsn                (node) int32 3 4 5 4 3 3 7 7 6 7 ... 6 6 7 7 6 6 5 5 5 6
    nbve                (maxelem, node) int32 0 0 0 0 0 0 1 3 ... 0 0 0 0 0 0 0
    ntve                (node) int32 2 3 4 3 2 2 7 7 6 7 ... 3 3 4 7 6 3 2 2 2 3
Attributes: (12/17)
    title:                           NGOFS2
    institution:                     School for Marine Science and Technology
    source:                          FVCOM_4.3
    history:                         model started at: 25/08/2022   09:05
    references:                      http://fvcom.smast.umassd.edu, http://co...
    Conventions:                     CF-1.0
    ...                              ...
    Surface_Heat_Forcing:            FVCOM variable surface heat forcing file...
    Surface_Wind_Forcing:            FVCOM variable surface Wind forcing:\nFI...
    Surface_PrecipEvap_Forcing:      SURFACE PRECIPITATION FORCING IS OFF
    DODS.strlen:                     26
    DODS.dimName:                    DateStrLen
    DODS_EXTRA.Unlimited_Dimension:  time
[17]:
ds_ngofs2_ss.cf.isel(T=3).plot.scatter(x='lon', y='lat', hue='zeta', s=0.5)
[17]:
<matplotlib.collections.PathCollection at 0x165ff99d0>
_images/subsetting_33_1.png

Subset Temporally

Use cf-xarray to refer to the time dimension as ds.cf['T'].

By time stamps

Use ds.cf.sel(T=slice(start_time, end_time, stride)).

[18]:
ds_cbofs_rg_ss.cf['T'][:5]
[18]:
<xarray.DataArray 'ocean_time' (ocean_time: 5)>
array(['2022-08-25T11:00:00.000000000', '2022-08-25T12:00:00.000000000',
       '2022-08-25T13:00:00.000000000', '2022-08-25T14:00:00.000000000',
       '2022-08-25T15:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
  * ocean_time  (ocean_time) datetime64[ns] 2022-08-25T11:00:00 ... 2022-08-2...
Attributes:
    long_name:  time since initialization
    field:      time, scalar, series
    axis:       T
[19]:
ds_cbofs_rg_ss.cf.sel(T=slice(start, start+pd.Timedelta('12 hours'), 3))
[19]:
<xarray.Dataset>
Dimensions:      (ny: 199, nx: 199, ocean_time: 4, Depth: 15)
Coordinates:
    Latitude     (ny, nx) float64 dask.array<chunksize=(199, 199), meta=np.ndarray>
    Longitude    (ny, nx) float64 dask.array<chunksize=(199, 199), meta=np.ndarray>
  * nx           (nx) int64 269 270 271 272 273 274 ... 462 463 464 465 466 467
  * ny           (ny) int64 167 168 169 170 171 172 ... 360 361 362 363 364 365
  * ocean_time   (ocean_time) datetime64[ns] 2022-08-25T11:00:00 ... 2022-08-...
  * Depth        (Depth) float64 0.0 2.0 4.0 6.0 8.0 ... 35.0 40.0 45.0 50.0
Data variables:
    h            (ny, nx) float64 dask.array<chunksize=(199, 199), meta=np.ndarray>
    mask         (ny, nx) float64 dask.array<chunksize=(199, 199), meta=np.ndarray>
    zeta         (ocean_time, ny, nx) float32 dask.array<chunksize=(1, 199, 199), meta=np.ndarray>
    zetatomllw   (ocean_time, ny, nx) float32 dask.array<chunksize=(1, 199, 199), meta=np.ndarray>
    u_eastward   (ocean_time, Depth, ny, nx) float32 dask.array<chunksize=(1, 15, 199, 100), meta=np.ndarray>
    v_northward  (ocean_time, Depth, ny, nx) float32 dask.array<chunksize=(1, 15, 199, 100), meta=np.ndarray>
    temp         (ocean_time, Depth, ny, nx) float32 dask.array<chunksize=(1, 15, 199, 100), meta=np.ndarray>
    salt         (ocean_time, Depth, ny, nx) float32 dask.array<chunksize=(1, 15, 199, 100), meta=np.ndarray>
Attributes: (12/36)
    file:                            nos.cbofs.fields.nowcast.20220825.t12z_0...
    format:                          netCDF-4/HDF5 file
    Conventions:                     CF-1.4, SGRID-0.3
    type:                            ROMS/TOMS history file
    title:                           cbofs nowcast RUN in operational mode
    var_info:                        varinfo.dat
    ...                              ...
    tiling:                          008x016
    history:                         ROMS/TOMS, Version 3.9, Thursday - Augus...
    ana_file:                        ROMS/Functionals/ana_btflux.h, ROMS/Func...
    CPP_options:                     mode, ADD_FSOBC, ADD_M2OBC, ANA_BPFLUX, ...
    bio_file:                        ROMS/Nonlinear/Biology/hypoxia_srm.h
    DODS_EXTRA.Unlimited_Dimension:  ocean_time

By time index

Use ds.cf.isel(T=slice(start_index, end_index, stride)).

[20]:
ds_cbofs_rg_ss.cf.isel(T=slice(5, 15, 2))
[20]:
<xarray.Dataset>
Dimensions:      (ny: 199, nx: 199, ocean_time: 5, Depth: 15)
Coordinates:
    Latitude     (ny, nx) float64 dask.array<chunksize=(199, 199), meta=np.ndarray>
    Longitude    (ny, nx) float64 dask.array<chunksize=(199, 199), meta=np.ndarray>
  * nx           (nx) int64 269 270 271 272 273 274 ... 462 463 464 465 466 467
  * ny           (ny) int64 167 168 169 170 171 172 ... 360 361 362 363 364 365
  * ocean_time   (ocean_time) datetime64[ns] 2022-08-25T16:00:00 ... 2022-08-26
  * Depth        (Depth) float64 0.0 2.0 4.0 6.0 8.0 ... 35.0 40.0 45.0 50.0
Data variables:
    h            (ny, nx) float64 dask.array<chunksize=(199, 199), meta=np.ndarray>
    mask         (ny, nx) float64 dask.array<chunksize=(199, 199), meta=np.ndarray>
    zeta         (ocean_time, ny, nx) float32 dask.array<chunksize=(1, 199, 199), meta=np.ndarray>
    zetatomllw   (ocean_time, ny, nx) float32 dask.array<chunksize=(1, 199, 199), meta=np.ndarray>
    u_eastward   (ocean_time, Depth, ny, nx) float32 dask.array<chunksize=(1, 15, 199, 100), meta=np.ndarray>
    v_northward  (ocean_time, Depth, ny, nx) float32 dask.array<chunksize=(1, 15, 199, 100), meta=np.ndarray>
    temp         (ocean_time, Depth, ny, nx) float32 dask.array<chunksize=(1, 15, 199, 100), meta=np.ndarray>
    salt         (ocean_time, Depth, ny, nx) float32 dask.array<chunksize=(1, 15, 199, 100), meta=np.ndarray>
Attributes: (12/36)
    file:                            nos.cbofs.fields.nowcast.20220825.t12z_0...
    format:                          netCDF-4/HDF5 file
    Conventions:                     CF-1.4, SGRID-0.3
    type:                            ROMS/TOMS history file
    title:                           cbofs nowcast RUN in operational mode
    var_info:                        varinfo.dat
    ...                              ...
    tiling:                          008x016
    history:                         ROMS/TOMS, Version 3.9, Thursday - Augus...
    ana_file:                        ROMS/Functionals/ana_btflux.h, ROMS/Func...
    CPP_options:                     mode, ADD_FSOBC, ADD_M2OBC, ANA_BPFLUX, ...
    bio_file:                        ROMS/Nonlinear/Biology/hypoxia_srm.h
    DODS_EXTRA.Unlimited_Dimension:  ocean_time

Subset to Variables

Specify desired variables by standard names (in particular, those defined in the catalog files). The filter() method also retains the variables necessary to decode vertical coordinates in the future.

[21]:
Vars = ['eastward_sea_water_velocity', 'northward_sea_water_velocity']
ds_cbofs_rg_ss.em.filter(Vars)
[21]:
<xarray.Dataset>
Dimensions:      (ocean_time: 24, Depth: 15, ny: 199, nx: 199)
Coordinates:
    Latitude     (ny, nx) float64 dask.array<chunksize=(199, 199), meta=np.ndarray>
    Longitude    (ny, nx) float64 dask.array<chunksize=(199, 199), meta=np.ndarray>
  * nx           (nx) int64 269 270 271 272 273 274 ... 462 463 464 465 466 467
  * ny           (ny) int64 167 168 169 170 171 172 ... 360 361 362 363 364 365
  * ocean_time   (ocean_time) datetime64[ns] 2022-08-25T11:00:00 ... 2022-08-...
  * Depth        (Depth) float64 0.0 2.0 4.0 6.0 8.0 ... 35.0 40.0 45.0 50.0
Data variables:
    u_eastward   (ocean_time, Depth, ny, nx) float32 dask.array<chunksize=(1, 15, 199, 100), meta=np.ndarray>
    v_northward  (ocean_time, Depth, ny, nx) float32 dask.array<chunksize=(1, 15, 199, 100), meta=np.ndarray>
Attributes: (12/36)
    file:                            nos.cbofs.fields.nowcast.20220825.t12z_0...
    format:                          netCDF-4/HDF5 file
    Conventions:                     CF-1.4, SGRID-0.3
    type:                            ROMS/TOMS history file
    title:                           cbofs nowcast RUN in operational mode
    var_info:                        varinfo.dat
    ...                              ...
    tiling:                          008x016
    history:                         ROMS/TOMS, Version 3.9, Thursday - Augus...
    ana_file:                        ROMS/Functionals/ana_btflux.h, ROMS/Func...
    CPP_options:                     mode, ADD_FSOBC, ADD_M2OBC, ANA_BPFLUX, ...
    bio_file:                        ROMS/Nonlinear/Biology/hypoxia_srm.h
    DODS_EXTRA.Unlimited_Dimension:  ocean_time

Return Dataset Size

xarray will estimate the size of the dataset with .nbytes — divide by 1e9 to get GB, or 1e6 to get MB instead.

[22]:
ds_cbofs_rg_ss.em.filter(Vars).nbytes/1e6
[22]:
114.687992

Save to file

Save to file with ds.to_netcdf([filename.nc]).

Put commands together

Select a few variables from the ROMS curvilinear grid model, then subset the Dataset in space to a smaller box, narrow the time range, check size, then save to file. Then, read in saved file to demonstrate that the vertical coordinates have been read in and calculated only at that point.

1. Curvilinear, multiple horizontal grids

[23]:
%%time

standard_names = ['eastward_sea_water_velocity', 'northward_sea_water_velocity',
        'sea_water_temperature', 'sea_water_practical_salinity', 'sea_floor_depth']

# Select bounding box for sub-domain
bbox = [-77, 38.5, -76, 40]

# Select time range to keep
today = pd.Timestamp.today()
later_today = today + pd.Timedelta('12 hours')

# Select out variables from Dataset
ds_small = ds_cbofs.em.filter(standard_names).em.sub_grid(bbox=bbox).cf.sel(T=slice(today, later_today))

# Size of subset
print(f'Subdomain is {ds_small.nbytes/1e6} MB')

# drop attributes for now, except need ROMS
ds_small.attrs = {}
ds_small.attrs['model'] = 'ROMS'

# Save to file
ds_small.to_netcdf('sample_roms.nc')
Subdomain is 174.493296 MB
CPU times: user 4.4 s, sys: 3.25 s, total: 7.65 s
Wall time: 1min 10s

Read in saved file with extract_model to get vertical coord decoding back in!

[24]:
ds_small2 = xr.open_mfdataset(['sample_roms.nc'], preprocess=em.preprocess)
print(ds_small2.nbytes/1e6)
ds_small2
346.98456
[24]:
<xarray.Dataset>
Dimensions:     (s_rho: 20, s_w: 21, eta_rho: 132, xi_rho: 332, ocean_time: 12,
                 eta_u: 132, xi_u: 331, eta_v: 131, xi_v: 332, eta_psi: 131,
                 xi_psi: 331)
Coordinates: (12/21)
  * s_rho       (s_rho) float64 -0.975 -0.925 -0.875 ... -0.125 -0.075 -0.025
  * s_w         (s_w) float64 -1.0 -0.95 -0.9 -0.85 ... -0.15 -0.1 -0.05 0.0
    lon_rho     (eta_rho, xi_rho) float64 dask.array<chunksize=(132, 332), meta=np.ndarray>
    lat_rho     (eta_rho, xi_rho) float64 dask.array<chunksize=(132, 332), meta=np.ndarray>
  * ocean_time  (ocean_time) datetime64[ns] 2022-08-25T11:00:00 ... 2022-08-2...
  * xi_rho      (xi_rho) int64 0 1 2 3 4 5 6 7 ... 325 326 327 328 329 330 331
    ...          ...
  * xi_psi      (xi_psi) int64 0 1 2 3 4 5 6 7 ... 324 325 326 327 328 329 330
  * eta_u       (eta_u) int64 0 1 2 3 4 5 6 7 ... 125 126 127 128 129 130 131
  * eta_v       (eta_v) int64 0 1 2 3 4 5 6 7 ... 124 125 126 127 128 129 130
  * eta_psi     (eta_psi) int64 0 1 2 3 4 5 6 7 ... 124 125 126 127 128 129 130
    z_rho       (ocean_time, s_rho, eta_rho, xi_rho) float64 dask.array<chunksize=(12, 20, 132, 332), meta=np.ndarray>
    z_w         (ocean_time, s_w, eta_rho, xi_rho) float64 dask.array<chunksize=(12, 21, 132, 332), meta=np.ndarray>
Data variables: (12/13)
    h           (eta_rho, xi_rho) float64 dask.array<chunksize=(132, 332), meta=np.ndarray>
    zeta        (ocean_time, eta_rho, xi_rho) float32 dask.array<chunksize=(12, 132, 332), meta=np.ndarray>
    hc          float64 ...
    Cs_r        (s_rho) float64 dask.array<chunksize=(20,), meta=np.ndarray>
    Cs_w        (s_w) float64 dask.array<chunksize=(21,), meta=np.ndarray>
    mask_rho    (eta_rho, xi_rho) float64 dask.array<chunksize=(132, 332), meta=np.ndarray>
    ...          ...
    mask_v      (eta_v, xi_v) float64 dask.array<chunksize=(131, 332), meta=np.ndarray>
    mask_psi    (eta_psi, xi_psi) float64 dask.array<chunksize=(131, 331), meta=np.ndarray>
    u           (ocean_time, s_rho, eta_u, xi_u) float32 dask.array<chunksize=(12, 20, 132, 331), meta=np.ndarray>
    v           (ocean_time, s_rho, eta_v, xi_v) float32 dask.array<chunksize=(12, 20, 131, 332), meta=np.ndarray>
    temp        (ocean_time, s_rho, eta_rho, xi_rho) float32 dask.array<chunksize=(12, 20, 132, 332), meta=np.ndarray>
    salt        (ocean_time, s_rho, eta_rho, xi_rho) float32 dask.array<chunksize=(12, 20, 132, 332), meta=np.ndarray>
Attributes:
    model:    ROMS
[25]:
ds_small2[['z_rho','z_w']].nbytes/1e6
[25]:
173.196584

2. Rectilinear grid

[26]:
%%time

standard_names = ['eastward_sea_water_velocity', 'northward_sea_water_velocity',
        'sea_water_temperature', 'sea_water_practical_salinity', 'sea_floor_depth']

# Select bounding box for sub-domain
bbox = [-77, 38.5, -76, 40]

# Select time range to keep
today = pd.Timestamp.today()
later_today = today + pd.Timedelta('12 hours')

# Select out variables from Dataset
ds_small_rg = ds_cbofs_rg.em.filter(standard_names).em.sub_grid(bbox=bbox).cf.sel(T=slice(today, later_today))

# Size of subset
print(f'Subdomain is {ds_small_rg.nbytes/1e6} MB')

# Save to file
ds_small_rg.to_netcdf('sample_roms_rg.nc')
Subdomain is 130.608112 MB
CPU times: user 9.7 s, sys: 8.77 s, total: 18.5 s
Wall time: 2min 48s

Read in saved file with extract_model to get vertical coord decoding back in!

[27]:
ds_small_rg2 = xr.open_mfdataset(['sample_roms_rg.nc'], preprocess=em.preprocess)
print(ds_small_rg2.nbytes/1e6)
ds_small_rg2
130.608112
[27]:
<xarray.Dataset>
Dimensions:      (ny: 226, nx: 199, ocean_time: 12, Depth: 15)
Coordinates:
  * Depth        (Depth) float64 0.0 2.0 4.0 6.0 8.0 ... 35.0 40.0 45.0 50.0
    Latitude     (ny, nx) float64 dask.array<chunksize=(226, 199), meta=np.ndarray>
    Longitude    (ny, nx) float64 dask.array<chunksize=(226, 199), meta=np.ndarray>
  * ocean_time   (ocean_time) datetime64[ns] 2022-08-25T11:00:00 ... 2022-08-...
  * nx           (nx) int64 69 70 71 72 73 74 75 ... 261 262 263 264 265 266 267
  * ny           (ny) int64 467 468 469 470 471 472 ... 687 688 689 690 691 692
Data variables:
    h            (ny, nx) float64 dask.array<chunksize=(226, 199), meta=np.ndarray>
    u_eastward   (ocean_time, Depth, ny, nx) float32 dask.array<chunksize=(12, 15, 226, 199), meta=np.ndarray>
    v_northward  (ocean_time, Depth, ny, nx) float32 dask.array<chunksize=(12, 15, 226, 199), meta=np.ndarray>
    temp         (ocean_time, Depth, ny, nx) float32 dask.array<chunksize=(12, 15, 226, 199), meta=np.ndarray>
    salt         (ocean_time, Depth, ny, nx) float32 dask.array<chunksize=(12, 15, 226, 199), meta=np.ndarray>
Attributes: (12/36)
    file:                            nos.cbofs.fields.nowcast.20220825.t12z_0...
    format:                          netCDF-4/HDF5 file
    Conventions:                     CF-1.4, SGRID-0.3
    type:                            ROMS/TOMS history file
    title:                           cbofs nowcast RUN in operational mode
    var_info:                        varinfo.dat
    ...                              ...
    tiling:                          008x016
    history:                         ROMS/TOMS, Version 3.9, Thursday - Augus...
    ana_file:                        ROMS/Functionals/ana_btflux.h, ROMS/Func...
    CPP_options:                     mode, ADD_FSOBC, ADD_M2OBC, ANA_BPFLUX, ...
    bio_file:                        ROMS/Nonlinear/Biology/hypoxia_srm.h
    DODS_EXTRA.Unlimited_Dimension:  ocean_time

3. Curvilinear, single horizontal grid

[28]:
%%time

standard_names = ['eastward_sea_water_velocity', 'northward_sea_water_velocity',
        'sea_water_temperature', 'sea_water_practical_salinity', 'sea_floor_depth']

# Select bounding box for sub-domain
bbox = [-78.8, 43.5, -77.5, 43.8]

# Select time range to keep
today = pd.Timestamp.today()
later_today = today + pd.Timedelta('12 hours')

# Select out variables from Dataset
ds_small_loofs = ds_loofs.em.filter(standard_names).em.sub_grid(bbox=bbox).cf.sel(T=slice(today, later_today))

# Size of subset
print(f'Subdomain is {ds_small_loofs.nbytes/1e6} MB')

# Save to file
ds_small_loofs.to_netcdf('sample_loofs.nc')
Subdomain is 0.43258 MB
CPU times: user 437 ms, sys: 99.2 ms, total: 536 ms
Wall time: 3.78 s

Read in saved file with extract_model to get vertical coord decoding back in!

[29]:
ds_small_loofs2 = xr.open_mfdataset(['sample_loofs.nc'], preprocess=em.preprocess)
print(ds_small_loofs2.nbytes/1e6)
ds_small_loofs2
0.5737
[29]:
<xarray.Dataset>
Dimensions:  (time: 12, ny: 7, nx: 21, sigma: 20)
Coordinates:
  * sigma    (sigma) float32 0.0 0.0227 0.0454 0.0681 ... 0.8853 0.9534 1.0
  * time     (time) datetime64[ns] 2022-08-25T11:00:14.062499968 ... 2022-08-...
    lon      (ny, nx) float32 dask.array<chunksize=(7, 21), meta=np.ndarray>
    lat      (ny, nx) float32 dask.array<chunksize=(7, 21), meta=np.ndarray>
  * nx       (nx) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
  * ny       (ny) int64 0 1 2 3 4 5 6
    z        (time, sigma, ny, nx) float32 dask.array<chunksize=(12, 20, 7, 21), meta=np.ndarray>
Data variables:
    zeta     (time, ny, nx) float32 dask.array<chunksize=(12, 7, 21), meta=np.ndarray>
    depth    (ny, nx) float32 dask.array<chunksize=(7, 21), meta=np.ndarray>
    u        (time, sigma, ny, nx) float32 dask.array<chunksize=(12, 20, 7, 21), meta=np.ndarray>
    v        (time, sigma, ny, nx) float32 dask.array<chunksize=(12, 20, 7, 21), meta=np.ndarray>
    temp     (time, sigma, ny, nx) float32 dask.array<chunksize=(12, 20, 7, 21), meta=np.ndarray>
Attributes: (12/17)
    datum1:                          reference to LWD=zeta
    datum2:                          reference to IGLD85=zeta - IGLD85
    file_type:                       Full_Grid
    Conventions:                     COARDS
    grid_type:                       rectilinear
    z_type:                          2-D
    ...                              ...
    history:                         Operation-CO-OPS
    references:                      greg.mott@noaa.gov
    creation_date:                   2022-08-25 12:50:16  00:00
    DODS.strlen:                     19
    DODS.dimName:                    validtime_len
    DODS_EXTRA.Unlimited_Dimension:  time
[30]:
ds_small_loofs2['z'].nbytes/1e6
[30]:
0.14112

4. Unstructured grid

[31]:
%%time

standard_names = ['eastward_sea_water_velocity', 'northward_sea_water_velocity',
        'sea_water_temperature', 'sea_water_practical_salinity', 'sea_floor_depth']

# Select bounding box for sub-domain
bbox = [-98+360, 27, -97+360, 28]

# Select time range to keep
today = pd.Timestamp.today()
later_today = today + pd.Timedelta('12 hours')

# Select out variables from Dataset
ds_small_ngofs2 = ds_ngofs2.em.filter(standard_names).em.sub_grid(bbox=bbox).cf.sel(T=slice(today, later_today))

# Size of subset
print(f'Subdomain is {ds_small_ngofs2.nbytes/1e6} MB')

# Save to file
ds_small_ngofs2.to_netcdf('sample_ngofs2.nc')
Subdomain is 69.865408 MB
CPU times: user 10.5 s, sys: 9.49 s, total: 20 s
Wall time: 3min 7s

Read in saved file with extract_model to get vertical coord decoding back in!

[32]:
ds_small_ngofs22 = xr.open_mfdataset(['sample_ngofs2.nc'], preprocess=em.preprocess)
print(ds_small_ngofs22.nbytes/1e6)
ds_small_ngofs22
69.865408
[32]:
<xarray.Dataset>
Dimensions:       (node: 16604, nele: 30734, siglay: 40, siglev: 41, time: 4,
                   three: 3, maxnode: 10, maxelem: 8)
Coordinates:
    lat           (node) float32 dask.array<chunksize=(16604,), meta=np.ndarray>
    lon           (node) float32 dask.array<chunksize=(16604,), meta=np.ndarray>
    latc          (nele) float32 dask.array<chunksize=(30734,), meta=np.ndarray>
    lonc          (nele) float32 dask.array<chunksize=(30734,), meta=np.ndarray>
    sigma_layers  (siglay, node) float32 dask.array<chunksize=(40, 16604), meta=np.ndarray>
  * time          (time) datetime64[ns] 2022-08-25T12:00:00 ... 2022-08-25T21...
Dimensions without coordinates: node, nele, siglay, siglev, three, maxnode,
                                maxelem
Data variables: (12/17)
    sigma_levels  (siglev, node) float32 dask.array<chunksize=(41, 16604), meta=np.ndarray>
    h             (node) float32 dask.array<chunksize=(16604,), meta=np.ndarray>
    zeta          (time, node) float32 dask.array<chunksize=(4, 16604), meta=np.ndarray>
    x             (node) float32 dask.array<chunksize=(16604,), meta=np.ndarray>
    y             (node) float32 dask.array<chunksize=(16604,), meta=np.ndarray>
    xc            (nele) float32 dask.array<chunksize=(30734,), meta=np.ndarray>
    ...            ...
    nv            (three, nele) int64 dask.array<chunksize=(3, 30734), meta=np.ndarray>
    nbe           (three, nele) int64 dask.array<chunksize=(3, 30734), meta=np.ndarray>
    nbsn          (maxnode, node) int32 dask.array<chunksize=(10, 16604), meta=np.ndarray>
    ntsn          (node) int32 dask.array<chunksize=(16604,), meta=np.ndarray>
    nbve          (maxelem, node) int32 dask.array<chunksize=(8, 16604), meta=np.ndarray>
    ntve          (node) int32 dask.array<chunksize=(16604,), meta=np.ndarray>
Attributes: (12/17)
    title:                           NGOFS2
    institution:                     School for Marine Science and Technology
    source:                          FVCOM_4.3
    history:                         model started at: 25/08/2022   09:05
    references:                      http://fvcom.smast.umassd.edu, http://co...
    Conventions:                     CF-1.0
    ...                              ...
    Surface_Heat_Forcing:            FVCOM variable surface heat forcing file...
    Surface_Wind_Forcing:            FVCOM variable surface Wind forcing:\nFI...
    Surface_PrecipEvap_Forcing:      SURFACE PRECIPITATION FORCING IS OFF
    DODS.strlen:                     26
    DODS.dimName:                    DateStrLen
    DODS_EXTRA.Unlimited_Dimension:  time