1

I'm trying to understand vectorized indexing in xarray by following this example from the docs:

import xarray as xr
import numpy as np
da = xr.DataArray(np.arange(12).reshape((3, 4)), dims=['x', 'y'],
                  coords={'x': [0, 1, 2], 'y': ['a', 'b', 'c', 'd']})

ind_x = xr.DataArray([0, 1], dims=['x'])
ind_y = xr.DataArray([0, 1], dims=['y'])

The output of the array da is as follows:

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

So far so good. Now in the example there are shown two ways of indexing. Orthogonal (not interested in this case) and vectorized (what I want). For the vectorized indexing the following is shown:

In [37]: da[ind_x, ind_x]  # vectorized indexing
Out[37]: 
<xarray.DataArray (x: 2)>
array([0, 5])
Coordinates:
    y        (x) <U1 'a' 'b'
  * x        (x) int64 0 1

The result seems to be what I want, but this feels very strange to me. ind_x (which in theory refers to dims=['x']) is being passed twice but somehow is capable of indexing what appears to be both in the x and y dims. As far as I understand the x dim would be the rows and y dim would be the columns, is that correct? How come the same ind_x is capable of accessing both the rows and the cols?

This seems to be the concept I need for my problem, but can't understand how it works or how to extend it to more dimensions. I was expecting this result to be given by da[ind_x, ind_y] however that seems to yield the orthogonal indexing surprisingly enough.

1 Answer 1

2

Having the example with ind_x being used twice is probably a little confusing: actually, the dimension of the indexer doesn't have to matter at all for the indexing behavior! Observe:

ind_a = xr.DataArray([0, 1], dims=["a"]
da[ind_a, ind_a]

Gives:

<xarray.DataArray (a: 2)>
array([0, 5])
Coordinates:
    x        (a) int32 0 1
    y        (a) <U1 'a' 'b'
Dimensions without coordinates: a

The same goes for the orthogonal example:

ind_a = xr.DataArray([0, 1], dims=["a"])
ind_b = xr.DataArray([0, 1], dims=["b"])
da[ind_a, ind_b]

Result:

<xarray.DataArray (a: 2, b: 2)>
array([[0, 2],
       [4, 6]])
Coordinates:
    x        (a) int32 0 1
    y        (b) <U1 'a' 'c'
Dimensions without coordinates: a, b

The difference is purely in terms of "labeling", as in this case you end up with dimensions without coordinates.

Fancy indexing

Generally stated, I personally do not find "fancy indexing" the most intuitive concept. I did find this example in NEP 21 pretty clarifying: https://numpy.org/neps/nep-0021-advanced-indexing.html

Specifically, this:

Consider indexing a 2D array by two 1D integer arrays, e.g., x[[0, 1], [0, 1]]:

  • Outer indexing is equivalent to combining multiple integer indices with itertools.product(). The result in this case is another 2D array with all combinations of indexed elements, e.g., np.array([[x[0, 0], x[0, 1]], [x[1, 0], x[1, 1]]])

  • Vectorized indexing is equivalent to combining multiple integer indices with zip(). The result in this case is a 1D array containing the diagonal elements, e.g., np.array([x[0, 0], x[1, 1]]).

Back to xarray

da[ind_x, ind_y]

Can also be written as:

da.isel(x=ind_x, y=ind_y)

The dimensions are implicit in the order. However, xarray still attempts to broadcast (based on dimension labels), so da[ind_y] mismatches and results in an error. da[ind_a] and da[ind_b] both work.

More dimensions

The dims you provide for the indexer are what determines the shape of the output, not the dimensions of the array you're indexing.

If you want to select single values along the dimensions (so we're zip()-ing through the indexes simultaneously), just make sure that your indexers share the dimension, here for a 3D array:

da = xr.DataArray(
    data=np.arange(3 * 4 * 5).reshape(3, 4, 5),
    coords={
        "x": [1, 2, 3],
        "y": ["a", "b", "c", "d"],
        "z": [1.0, 2.0, 3.0, 4.0, 5.0],
        },
    dims=["x", "y", "z"],
)

ind_along_x = xr.DataArray([0, 1], dims=["new_index"])
ind_along_y = xr.DataArray([0, 2], dims=["new_index"])
ind_along_z = xr.DataArray([0, 3], dims=["new_index"])

da[ind_along_x, ind_along_y, ind_along_z]

Note that the values of the indexers do not have to the same -- that would be a pretty severe limitation, after all.

Result:

<xarray.DataArray (new_index: 2)>
array([ 0, 33])
Coordinates:
    x        (new_index) int32 1 2
    y        (new_index) <U1 'a' 'c'
    z        (new_index) float64 1.0 4.0
Dimensions without coordinates: new_index
Sign up to request clarification or add additional context in comments.

2 Comments

great answer, thank you. For some reason, I couldn't get bracket indexing (da[...]) to work. The isel variant, however, worked just fine. Also, I want to add that this also works the same way with sel (or interp for interpolation) for those needing to index by dimension values, rather than dimension indexes.
Maybe my version of xarray is just kind of old (0.9.6), but I don't get the same output as you from this. It's similar, but the "new_index" doesn't appear in the output dataset. Seems like it isn't doing the vectorised indexing?: <xarray.DataArray (x: 2, y: 2, z: 2)> array([[[ 0, 3], [10, 13]], [[20, 23], [30, 33]]]) Coordinates: * x (x) int64 1 2 * y (y) <U1 'a' 'c' * z (z) float64 1.0 4.0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.