I am looking at output from an ocean model, and from the output I would like to create a grid (x,y equivalent to lat,lon) of temperatures at the bottom of the water column, i.e., the deepest grid cells. Within the xarray dataset I have the maximum depths (see below, "Depth").
I can do this with a very slow loop, but was wondering if there was a way to avoid a loop, or at least part of the loop.
Here's what the data a code look like so far:
# load data as xarray
data_dir = 'run04'
ds1 = open_mdsdataset(data_dir,iters=np.arange(0,10001,5000),prefix=['U','V','W','S','T','Eta'])
ds1 = ds1.rename({'T':'Tt'}) # T doesn't work because it thinks its transpose
What ds1 looks:
<xarray.Dataset>
Dimensions: (XC: 40, XG: 40, YC: 30, YG: 30, Z: 100, Zl: 100, Zp1: 101, Zu: 100, time: 3)
Coordinates:
* XC (XC) >f4 2500.0 7500.0 12500.0 17500.0 22500.0 27500.0 32500.0 ...
* YC (YC) >f4 2500.0 7500.0 12500.0 17500.0 22500.0 27500.0 32500.0 ...
* XG (XG) >f4 0.0 5000.0 10000.0 15000.0 20000.0 25000.0 30000.0 ...
* YG (YG) >f4 0.0 5000.0 10000.0 15000.0 20000.0 25000.0 30000.0 ...
* Z (Z) >f4 -7.0 -21.0 -35.0 -49.0 -63.0 -77.0 -91.0 -105.0 -119.0 ...
* Zp1 (Zp1) >f4 0.0 -14.0 -28.0 -42.0 -56.0 -70.0 -84.0 -98.0 -112.0 ...
* Zu (Zu) >f4 -14.0 -28.0 -42.0 -56.0 -70.0 -84.0 -98.0 -112.0 ...
* Zl (Zl) >f4 0.0 -14.0 -28.0 -42.0 -56.0 -70.0 -84.0 -98.0 -112.0 ...
rA (YC, XC) >f4 dask.array<shape=(30, 40), chunksize=(30, 40)>
dxG (YG, XC) >f4 dask.array<shape=(30, 40), chunksize=(30, 40)>
dyG (YC, XG) >f4 dask.array<shape=(30, 40), chunksize=(30, 40)>
Depth (YC, XC) >f4 dask.array<shape=(30, 40), chunksize=(30, 40)>
rAz (YG, XG) >f4 dask.array<shape=(30, 40), chunksize=(30, 40)>
dxC (YC, XG) >f4 dask.array<shape=(30, 40), chunksize=(30, 40)>
dyC (YG, XC) >f4 dask.array<shape=(30, 40), chunksize=(30, 40)>
rAw (YC, XG) >f4 dask.array<shape=(30, 40), chunksize=(30, 40)>
rAs (YG, XC) >f4 dask.array<shape=(30, 40), chunksize=(30, 40)>
drC (Zp1) >f4 dask.array<shape=(101,), chunksize=(101,)>
drF (Z) >f4 dask.array<shape=(100,), chunksize=(100,)>
PHrefC (Z) >f4 dask.array<shape=(100,), chunksize=(100,)>
PHrefF (Zp1) >f4 dask.array<shape=(101,), chunksize=(101,)>
hFacC (Z, YC, XC) >f4 dask.array<shape=(100, 30, 40), chunksize=(100, 30, 40)>
hFacW (Z, YC, XG) >f4 dask.array<shape=(100, 30, 40), chunksize=(100, 30, 40)>
hFacS (Z, YG, XC) >f4 dask.array<shape=(100, 30, 40), chunksize=(100, 30, 40)>
iter (time) int64 dask.array<shape=(3,), chunksize=(1,)>
* time (time) int64 0 5000 10000
Data variables:
Eta (time, YC, XC) float32 dask.array<shape=(3, 30, 40), chunksize=(1, 30, 40)>
V (time, Z, YG, XC) float32 dask.array<shape=(3, 100, 30, 40), chunksize=(1, 100, 30, 40)>
W (time, Zl, YC, XC) float32 dask.array<shape=(3, 100, 30, 40), chunksize=(1, 100, 30, 40)>
S (time, Z, YC, XC) float32 dask.array<shape=(3, 100, 30, 40), chunksize=(1, 100, 30, 40)>
U (time, Z, YC, XG) float32 dask.array<shape=(3, 100, 30, 40), chunksize=(1, 100, 30, 40)>
Tt (time, Z, YC, XC) float32 dask.array<shape=(3, 100, 30, 40), chunksize=(1, 100, 30, 40)>
And the loop to get temperature values at the deepest cells:
# find the deepest wet cell at each gridpoint
# loop through timesteps
t_at_bottom1 = np.zeros((ds1.time.size,ds1.YC.size,ds1.XC.size))
for ti in np.arange(0,ds1.time.size,1):
# loop through x,y indices
for yi in np.arange(0,ds1.YC.size,1):
for xi in np.arange(0,ds1.XC.size,1):
# look for the grid cell closest to the bottom
t_at_bottom1[ti,yi,xi] = ds1.Tt.sel(time=ds1.time[ti],Z=-ds1.Depth.values[yi,xi],YC=ds1.YC[yi],XC=ds1.XC[xi],method='nearest')
Thanks for your help.
Depthis not time dependent, so you should at least be able to get rid of this loop. Also if this is the actual size of the dataset using dask may be overkill. Did you tryds.load()?