Is it possible to have an xarray object with multiple columns that all share the same coordinates? In the following example I create an xarray object and then want to extract time series data at different locations. However, to do this I have to create NumPy arrays to store the data and its coordinates.

import numpy as np
import pandas as pd
import xarray as xr

#Sample from the data in the netCDF file
ds = xr.Dataset()
ds['temp'] = xr.DataArray(data=np.random.rand(2,3,4), dims=['time','lat','lon'],
             coords=dict(time=pd.date_range('1900-1-1',periods=2,freq='D'),
                         lat=[25.,26.,27.],lon=[-85.,-84.,-83.,-82.]))
display(ds)

#lat and lon locations to extract temp values
locations=np.array([[25.6, -84.7], [26, -83], [26.5, -84.1]])

#Extract time series at different locations
temp=np.empty([ds['temp'].shape[0], len(locations)])
lat_lon=np.empty([len(locations),2])

for n in range(locations.shape[0]):
    #Nearest grid cell to the requested location
    point=ds['temp'].sel(lat=locations[n,0],
                 lon=locations[n,1], method='nearest')
    lat_lon[n,0]=point.coords['lat'].values
    lat_lon[n,1]=point.coords['lon'].values
    temp[:,n]=point.values

print(temp)
print(lat_lon)

#Find maximum temp for all locations:
temp=temp.max(1)

The output of this code is:

array([[[0.67465371, 0.0710136 , 0.03263631, 0.41050204],
        [0.26447469, 0.46503577, 0.5739435 , 0.33725726],
        [0.20353832, 0.01441925, 0.26728572, 0.70531547]],

       [[0.75418953, 0.20321738, 0.41129902, 0.96464691],
        [0.53046103, 0.88559914, 0.20876142, 0.98030988],
        [0.48009467, 0.7906767 , 0.09548439, 0.61088112]]])
Coordinates:
time (time) datetime64[ns] 1900-01-01 1900-01-02
lat (lat) float64 25.0 26.0 27.0
lon (lon) float64 -85.0 -84.0 -83.0 -82.0
temp (time, lat, lon) float64 0.09061 0.6634 ... 0.5696 0.4438
Attributes: (0)


[[0.26447469 0.5739435  0.01441925]
 [0.53046103 0.20876142 0.7906767 ]]
[[ 26. -85.]
 [ 26. -83.]
 [ 27. -84.]]

More simply, is there a way to find the maximum temp across all locations for every timestamp without creating the intermediate temp array?

1 Answer

When you create the sample data, you specify 3 values of latitude and 4 values of longitude. That means 12 values in total, on a 2D grid (3D if we add time).

When you want to query values for 3 specific points, you have to query each point individually. As far as I know, there are two ways to do that:

  • Write a loop and store the result in an intermediate array (your solution)
  • Stack dimensions and query longitude and latitude simultaneously.

First, you have to express your locations as a list/array of tuples:

locations=np.array([[25.6, -84.7], [26, -83], [26.5, -84.1]])
coords=[(coord[0], coord[1]) for coord in locations]
print(coords)

[(25.6, -84.7), (26.0, -83.0), (26.5, -84.1)]

Then you interpolate your data at the specified locations, stack latitude and longitude into a new dimension coord, and select your points.

(ds
 .interp(lon=locations[:,1], lat=locations[:,0], method='linear') # interpolate on the grid
 .stack(coord=['lat','lon']) # from 3x3 grid to list of 9 points
 .sel(coord=coords) # select your three points
 .temp.max(dim='coord') # get largest temp value from the coord dimension
)

array([0.81316195, 0.56967184]) # your largest values at both timestamps 

The downside is that xarray doesn't support interpolation for an unlabeled multi-index, which is why you first need to interpolate the grid (NOT simply find the nearest neighbor) onto your set of latitudes and longitudes.
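
If nearest-neighbor values are what you want (as in the question's loop), xarray's vectorized (pointwise) indexing may be a possible alternative that avoids both the loop and the stacking step. The sketch below is an illustration under assumptions rather than part of the approach above: it rebuilds the sample data with deterministic values (np.arange instead of random numbers), and the dimension name points is an arbitrary choice.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Deterministic stand-in for the question's data (assumption: values 0..23, not random)
ds = xr.Dataset(
    {"temp": (("time", "lat", "lon"), np.arange(24.0).reshape(2, 3, 4))},
    coords={
        "time": pd.date_range("1900-01-01", periods=2, freq="D"),
        "lat": [25.0, 26.0, 27.0],
        "lon": [-85.0, -84.0, -83.0, -82.0],
    },
)

locations = np.array([[25.6, -84.7], [26.0, -83.0], [26.5, -84.1]])

# Indexers that share a new dimension ("points", a name chosen here)
# are paired element-wise instead of being combined into a grid.
lat_idx = xr.DataArray(locations[:, 0], dims="points")
lon_idx = xr.DataArray(locations[:, 1], dims="points")

# One nearest-neighbor lookup for all locations at once
nearest = ds.sel(lat=lat_idx, lon=lon_idx, method="nearest")

# Maximum temp across the selected locations for every timestamp
max_temp = nearest["temp"].max(dim="points")

print(nearest["lat"].values, nearest["lon"].values)
print(max_temp.values)
```

Because both indexers share the points dimension, each latitude is matched with its own longitude, and the selected lat and lon values stay attached to the result as coordinates. Note that 26.5 is equidistant from 26 and 27, so which grid row is picked for the third location depends on how the nearest lookup breaks ties.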