Multidimensional Xarray indexing without loops in Python

Question

I am looking at output from an ocean model, and from the output I would like to create a grid (x,y equivalent to lat,lon) of temperatures at the bottom of the water column, i.e., the deepest grid cells. Within the xarray dataset I have the maximum depths (see below, "Depth").

I can do this with a very slow loop, but was wondering if there was a way to avoid a loop, or at least part of the loop.

Here's what the data a code look like so far:

# load data as xarray
data_dir = 'run04'
ds1 = open_mdsdataset(data_dir,iters=np.arange(0,10001,5000),prefix=['U','V','W','S','T','Eta'])
ds1 = ds1.rename({'T':'Tt'}) # T doesn't work because it thinks its transpose

What ds1 looks:

<xarray.Dataset>
Dimensions:  (XC: 40, XG: 40, YC: 30, YG: 30, Z: 100, Zl: 100, Zp1: 101, Zu: 100, time: 3)
Coordinates:
  * XC       (XC) >f4 2500.0 7500.0 12500.0 17500.0 22500.0 27500.0 32500.0 ...
  * YC       (YC) >f4 2500.0 7500.0 12500.0 17500.0 22500.0 27500.0 32500.0 ...
  * XG       (XG) >f4 0.0 5000.0 10000.0 15000.0 20000.0 25000.0 30000.0 ...
  * YG       (YG) >f4 0.0 5000.0 10000.0 15000.0 20000.0 25000.0 30000.0 ...
  * Z        (Z) >f4 -7.0 -21.0 -35.0 -49.0 -63.0 -77.0 -91.0 -105.0 -119.0 ...
  * Zp1      (Zp1) >f4 0.0 -14.0 -28.0 -42.0 -56.0 -70.0 -84.0 -98.0 -112.0 ...
  * Zu       (Zu) >f4 -14.0 -28.0 -42.0 -56.0 -70.0 -84.0 -98.0 -112.0 ...
  * Zl       (Zl) >f4 0.0 -14.0 -28.0 -42.0 -56.0 -70.0 -84.0 -98.0 -112.0 ...
    rA       (YC, XC) >f4 dask.array<shape=(30, 40), chunksize=(30, 40)>
    dxG      (YG, XC) >f4 dask.array<shape=(30, 40), chunksize=(30, 40)>
    dyG      (YC, XG) >f4 dask.array<shape=(30, 40), chunksize=(30, 40)>
    Depth    (YC, XC) >f4 dask.array<shape=(30, 40), chunksize=(30, 40)>
    rAz      (YG, XG) >f4 dask.array<shape=(30, 40), chunksize=(30, 40)>
    dxC      (YC, XG) >f4 dask.array<shape=(30, 40), chunksize=(30, 40)>
    dyC      (YG, XC) >f4 dask.array<shape=(30, 40), chunksize=(30, 40)>
    rAw      (YC, XG) >f4 dask.array<shape=(30, 40), chunksize=(30, 40)>
    rAs      (YG, XC) >f4 dask.array<shape=(30, 40), chunksize=(30, 40)>
    drC      (Zp1) >f4 dask.array<shape=(101,), chunksize=(101,)>
    drF      (Z) >f4 dask.array<shape=(100,), chunksize=(100,)>
    PHrefC   (Z) >f4 dask.array<shape=(100,), chunksize=(100,)>
    PHrefF   (Zp1) >f4 dask.array<shape=(101,), chunksize=(101,)>
    hFacC    (Z, YC, XC) >f4 dask.array<shape=(100, 30, 40), chunksize=(100, 30, 40)>
    hFacW    (Z, YC, XG) >f4 dask.array<shape=(100, 30, 40), chunksize=(100, 30, 40)>
    hFacS    (Z, YG, XC) >f4 dask.array<shape=(100, 30, 40), chunksize=(100, 30, 40)>
    iter     (time) int64 dask.array<shape=(3,), chunksize=(1,)>
  * time     (time) int64 0 5000 10000
Data variables:
    Eta      (time, YC, XC) float32 dask.array<shape=(3, 30, 40), chunksize=(1, 30, 40)>
    V        (time, Z, YG, XC) float32 dask.array<shape=(3, 100, 30, 40), chunksize=(1, 100, 30, 40)>
    W        (time, Zl, YC, XC) float32 dask.array<shape=(3, 100, 30, 40), chunksize=(1, 100, 30, 40)>
    S        (time, Z, YC, XC) float32 dask.array<shape=(3, 100, 30, 40), chunksize=(1, 100, 30, 40)>
    U        (time, Z, YC, XG) float32 dask.array<shape=(3, 100, 30, 40), chunksize=(1, 100, 30, 40)>
    Tt       (time, Z, YC, XC) float32 dask.array<shape=(3, 100, 30, 40), chunksize=(1, 100, 30, 40)>

And the loop to get temperature values at the deepest cells:

# find the deepest wet cell at each gridpoint
# loop through timesteps 
t_at_bottom1 = np.zeros((ds1.time.size,ds1.YC.size,ds1.XC.size))
for ti in np.arange(0,ds1.time.size,1):
    # loop through x,y indices
    for yi in np.arange(0,ds1.YC.size,1):        
        for xi in np.arange(0,ds1.XC.size,1):
            # look for the grid cell closest to the bottom
            t_at_bottom1[ti,yi,xi] = ds1.Tt.sel(time=ds1.time[ti],Z=-ds1.Depth.values[yi,xi],YC=ds1.YC[yi],XC=ds1.XC[xi],method='nearest')

Thanks for your help.

Depth is not time dependent, so you should at least be able to get rid of this loop. Also if this is the actual size of the dataset using dask may be overkill. Did you try ds.load()? — mathause
– mathause, Commented Apr 3, 2018 at 16:42
That's presumably true @mathause . If I just loop through x and y using the first timestep, I could get the indices in the z-coord for the bottom cell of each column of cells. As you say, this won't change for the subsequent timesteps. However, I don't know how to use this output of z-coords as indices in the Temperature (Tt) array of subsequent timesteps. This is a coarse model run, so subsequent runs will have much larger datasets. Thanks for the tip though. — Ryan
– Ryan, Commented Apr 5, 2018 at 12:06

mathause · Accepted Answer · 2018-04-10 13:41:40Z

3

Have a look at vectorized indexing.

Simple example in 2D:

import xarray as xr
import numpy as np
data = np.arange(12).reshape(3, 4)
da = xr.DataArray(data, dims=['depth', 'x'],
                  coords=dict(depth=[0, 1, 2], x=[0, 1, 2, 3]))

da looks like

<xarray.DataArray (depth: 3, x: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
Coordinates:
  * depth    (depth) int64 0 1 2
  * x        (x) int64 0 1 2 3

Now x-depended depths can be chosen by index with another dataarray:

sel = xr.DataArray([0, 1, 0, 2], dims=['x'])
da[sel]

This returns

<xarray.DataArray (x: 4)>
array([ 0,  5,  2, 11])
Coordinates:
    depth    (x) int64 0 1 0 2
  * x        (x) int64 0 1 2 3

This requires xarray version 0.10.0 or later.

answered Apr 10, 2018 at 13:41

mathause

1,7011 gold badge16 silver badges26 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Multidimensional Xarray indexing without loops in Python

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related