I'm working in a Jupyter Notebook using Python 3.12. I have a 2D xarray (in reality, it's 3D, but we can treat it as 2D). I want to pull out the values based on indices I acquire elsewhere, and then store those values in a list (or numpy array; haven't decided yet, but the particular container doesn't matter, just that I have the values stored in a container that I can easily access).

I have code that works, but it is painfully slow. See below.

myValues = []
for idx_pair in myIndices:
    myValues.append(da[idx_pair[0], idx_pair[1]].item())

In the above, myIndices is a two-column array, with each row being the x and y index of da. The number of index pairs in myIndices can reach as high as 100,000, and I need to loop through several different sets of myIndices. da is a 3D xarray ( da.shape ~ (20, 2000, 2000), corresponding to time, x direction, y direction ), though it can be treated as a 2D array that's about 2000x2000 (just x and y). Getting the values for a 2D da takes about 35 seconds. So this code works (it's the only method I've found that does), but it is far too slow to be useful.

**How can I more efficiently and rapidly access values from an xarray using index locations?**

I have tried da.load(). Supposedly, that takes care of the lazy loading issue inherent with a lot of xarrays, but it does nothing to reduce the run time in my case.

I've tried different ways of accessing the values from the xarray (e.g., .isel(x=myXs, y=myYs) or da[ myXs, myYs ]), but I get weird and insanely large matrices (e.g., ~100,000x100,000 with most of the values being zero). I feel like the solution to my problem is in how I'm accessing the values, but I can't figure out any other method to do it.

  • Your loop with .item() is killing performance: you're making 100,000 individual element extractions, each with Python-level overhead. You want vectorized extraction. Try idx_x = myIndices[:, 0] and idx_y = myIndices[:, 1], then myValues = da.values[idx_x, idx_y]. See 'Integer array indexing' under 'Advanced indexing' in the NumPy documentation; this is also commonly called 'fancy indexing'. Commented Oct 18 at 2:14
  • @Wayne, that causes the problem I describe in the last paragraph: it creates an insanely large matrix (>50 GB) and still takes a stupidly long time to run. Commented Oct 20 at 20:50
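For reference, the commenter's suggestion relies on NumPy integer-array ("fancy") indexing applied to the underlying .values array, which pairs the index arrays elementwise rather than combining every x with every y. A toy sketch (the small array here stands in for the real ~2000x2000 da.values):

```python
import numpy as np

# Toy stand-in for da.values (the real array is ~2000x2000).
values = np.arange(16).reshape(4, 4)
myIndices = np.array([[0, 0], [1, 3], [3, 2]])

# Integer-array ("fancy") indexing on a plain NumPy array pairs the
# index arrays elementwise: one result value per (x, y) pair.
idx_x = myIndices[:, 0]
idx_y = myIndices[:, 1]
myValues = values[idx_x, idx_y]
print(myValues)  # [ 0  7 14]
```

The huge matrix the asker reports is what you get when the same index arrays are passed to the DataArray itself, since xarray's default indexing with plain arrays is orthogonal (outer), not pointwise.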

1 Answer

Here's the solution.

import xarray as xr

# Wrap the index columns in DataArrays that share a dimension name
idx_x = xr.DataArray(myIndices[:, 0], dims="points")
idx_y = xr.DataArray(myIndices[:, 1], dims="points")

# Vectorized (pointwise) selection: one value per (x, y) index pair
myValues = da.isel(x=idx_x, y=idx_y)
myValues = myValues.values  # convert from a DataArray to a plain NumPy array

I went from ~35 second run times to ~0.0035 second run times. Four orders of magnitude of improvement! I don't understand why putting the indices into an xarray.DataArray with dimension name "points" (rather than just as a list as I tried before) causes the .isel to work correctly, but it does.
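The reason is xarray's indexing rules: passing plain lists or arrays to .isel performs orthogonal (outer) indexing, combining every x index with every y index (hence the enormous N x N matrices); passing DataArray indexers that share a dimension name triggers vectorized (pointwise) indexing, which pairs them up. A minimal runnable sketch (the small array and its dims are assumptions standing in for the real da):

```python
import numpy as np
import xarray as xr

# Small stand-in for the real ~2000x2000 DataArray.
da = xr.DataArray(np.arange(16).reshape(4, 4), dims=("x", "y"))
myIndices = np.array([[0, 0], [1, 3], [3, 2]])

# Plain arrays trigger xarray's default *orthogonal* indexing:
# every x index is combined with every y index, giving an N x N result.
outer = da.isel(x=myIndices[:, 0], y=myIndices[:, 1])
print(outer.shape)  # (3, 3)

# DataArray indexers sharing a dimension ("points") switch isel to
# *vectorized* (pointwise) indexing: one value per index pair.
idx_x = xr.DataArray(myIndices[:, 0], dims="points")
idx_y = xr.DataArray(myIndices[:, 1], dims="points")
pointwise = da.isel(x=idx_x, y=idx_y)
print(pointwise.shape)   # (3,)
print(pointwise.values)  # [ 0  7 14]
```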
