Slicing numpy.ndarray by column index

Question

Is there a way to slice the array below without having to define the row indices, i.e. without having to write range(len(X))?

X = np.arange(10*2).reshape((10,2))
L = np.random.randint(0,2,10)

Xs = X[range(len(X)),L]

I thought it was possible to slice with X[:,L], but it looks like it's not.

My linspace takes two arguments minimum: start and end. So your code doesn't run. Is your NumPy different? What version? Also, X[:,L] does work for me, provided that I do linspace(5, 20, 10*2) or so.
– John Zwinck, Commented Oct 3, 2014 at 8:51
No, you can't. Though you should use np.arange() instead of range(). docs.scipy.org/doc/numpy/user/…
– Ashwini Chaudhary, Commented Oct 3, 2014 at 8:55
@JohnZwinck: my mistake, it was supposed to be np.arange and not np.linspace.
– memecs, Commented Oct 3, 2014 at 8:56
L has shape (10,), but you're using it to index the dimension of X that has length 2, not the length 10 one. Is that intentional?
– user707650, Commented Oct 3, 2014 at 8:59
@Evert: yes, it's intentional. I want to select one element of X[i] based on the value of L[i].
– memecs, Commented Oct 3, 2014 at 9:00
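
The intent described in the comments above, picking X[i, L[i]] for each row i, can be written out as a plain loop to make the semantics concrete. This is a minimal sketch rather than code from the original post; the fixed seed and the names `expected` and `paired` are illustrative assumptions.

```python
import numpy as np

rng = np.random.RandomState(0)   # fixed seed (an assumption) for repeatability
X = np.arange(10 * 2).reshape((10, 2))
L = rng.randint(0, 2, 10)        # one column index (0 or 1) per row

# Reference semantics: for every row i, take the element in column L[i].
expected = np.array([X[i, L[i]] for i in range(len(X))])

# Pairing an explicit row index array with L selects the same elements.
paired = X[np.arange(len(X)), L]

# X[:, L] instead takes every column listed in L for every row, so it is 2-D.
print(paired.shape)    # (10,)
print(X[:, L].shape)   # (10, 10)
```

The shape difference is why X[:, L] on its own does not give the one-element-per-row result.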

immerrr · Accepted Answer · 2014-10-22 08:09:57Z

3

You're probably looking for np.choose:

In [25]: X = np.arange(10*2).reshape((10,2)); X
Out[25]: 
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11],
       [12, 13],
       [14, 15],
       [16, 17],
       [18, 19]])

In [26]: L = np.random.randint(0,2,10); L
Out[26]: array([1, 1, 1, 1, 1, 0, 0, 0, 0, 1])

In [27]: L.choose(X.T)
Out[27]: array([ 1,  3,  5,  7,  9, 10, 12, 14, 16, 19])

In [28]: # or otherwise

In [29]: np.choose(L, X.T)
Out[29]: array([ 1,  3,  5,  7,  9, 10, 12, 14, 16, 19])
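
As a script-form sketch of the session above: np.choose treats each row of X.T as one "choice" array and picks element i from choice L[i]. The comparison with np.take_along_axis is an addition, not part of the original answer, and assumes NumPy 1.15 or later; the names `via_choose` and `via_take` are illustrative.

```python
import numpy as np

X = np.arange(10 * 2).reshape((10, 2))
L = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 1])  # the L shown in the session above

# X.T has shape (2, 10): its two rows are the two columns of X.
# For position i, np.choose returns element i of row L[i] of X.T.
via_choose = np.choose(L, X.T)

# Assumption: NumPy >= 1.15, where np.take_along_axis is available.
# L[:, None] has shape (10, 1), i.e. one column index per row of X.
via_take = np.take_along_axis(X, L[:, None], axis=1)[:, 0]

print(via_choose)  # [ 1  3  5  7  9 10 12 14 16 19]
```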

Performance note: while this solution is a direct answer to the question, it quickly becomes suboptimal as len(X) grows. As of NumPy 1.9.0, the np.arange approach is faster:

In [17]: %timeit X[range(len(X)), L]
1000 loops, best of 3: 629 µs per loop

In [18]: %timeit X[np.arange(len(X)), L]
10000 loops, best of 3: 78.8 µs per loop

In [19]: %timeit L.choose(X.T)
10000 loops, best of 3: 146 µs per loop

In [20]: X.shape, L.shape
Out[20]: ((10000, 2), (10000,))
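
Outside IPython, the comparison above could be reproduced roughly as follows. This is a sketch using the standard-library timeit module; the dictionary `candidates` and the repeat counts are assumptions, and absolute timings will differ by machine and NumPy version, so none are shown.

```python
import timeit
import numpy as np

N = 10000
X = np.arange(N * 2).reshape((N, 2))
L = np.random.randint(0, 2, N)

# The three variants timed in the session above.
candidates = {
    "range": lambda: X[range(len(X)), L],
    "np.arange": lambda: X[np.arange(len(X)), L],
    "choose": lambda: L.choose(X.T),
}

# Sanity check input: all variants should return the same values.
results = {name: fn() for name, fn in candidates.items()}

# Best-of-3 average time per call, over 100 calls per repeat.
timings = {
    name: min(timeit.repeat(fn, number=100, repeat=3)) / 100
    for name, fn in candidates.items()
}
for name, seconds in timings.items():
    print(f"{name:>10}: {seconds * 1e6:.1f} µs per call")
```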

edited Oct 22, 2014 at 8:09

answered Oct 3, 2014 at 14:01

immerrr

1,273 reputation, 7 silver badges, 13 bronze badges


3 Comments

John Zwinck Over a year ago

Yes! This is the one I was trying to work out earlier but didn't get the transpose right so it didn't work. This looks like the best solution so far.

memecs Over a year ago

Does this solution have any advantage over mine with range(...)?

immerrr Over a year ago

@memecs, it doesn't create an intermediate array, BUT its run time seems to grow faster than that of the np.arange solution as len(L) increases.

Lee · Answer · 2014-10-03 11:58:11Z

1

You can take the diagonal elements of X[:,L] using diag (or diagonal):

np.diag(X[:,L])

Another way to do it is with where:

np.where(L,X[:,1],X[:,0])
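
To check that both expressions above select the same elements as the question's explicit indexing, a small sketch follows. The seed and the variable names are illustrative assumptions, and L is assumed to contain only 0 and 1, which is what np.where relies on here.

```python
import numpy as np

rng = np.random.RandomState(1)  # fixed seed, an assumption for repeatability
X = np.arange(10 * 2).reshape((10, 2))
L = rng.randint(0, 2, 10)       # values are 0 or 1 only

# Diagonal of the (10, 10) array X[:, L]: entry i is X[i, L[i]].
via_diag = np.diag(X[:, L])

# Where L[i] is nonzero take column 1, otherwise take column 0.
via_where = np.where(L, X[:, 1], X[:, 0])

# The explicit row/column pairing from the question, for comparison.
via_index = X[np.arange(len(X)), L]
```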

edited Oct 3, 2014 at 11:58

answered Oct 3, 2014 at 9:01

Lee

31.4k reputation, 31 gold badges, 124 silver badges, 187 bronze badges

4 Comments

John Zwinck Over a year ago

This seems to work but is a bit unsatisfying because it constructs a temporary which is len(L)^2 in size--and diag() may end up returning a view depending on some details, so that N^2 memory usage may stick around for the life of the result unless you explicitly copy it.

Lee Over a year ago

@JohnZwinck I added an alternative which, I think (please correct me if I'm wrong), doesn't construct a len(L)^2 temporary or cause similar memory issues.

John Zwinck Over a year ago

Right, the version using where is more efficient for large inputs, I think. But it's also a bit unsatisfying, because it doesn't generalize to more than two columns in X. Oh well, it works for the OP's stated case. By the way, you can just use L instead of L==1.
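
To illustrate the point about more than two columns: the two-way np.where form has no slot for a third column, whereas pairing row indices with L (or using np.choose) carries over unchanged. A minimal sketch with a hypothetical three-column array `X3` and index vector `L3`, neither of which appears in the original thread:

```python
import numpy as np

# Hypothetical three-column example (not from the original thread).
X3 = np.arange(5 * 3).reshape((5, 3))
L3 = np.array([2, 0, 1, 1, 2])   # one column index (0, 1 or 2) per row

# Explicit row/column pairing works for any number of columns.
picked = X3[np.arange(len(X3)), L3]

# np.choose also generalizes: X3.T supplies one choice array per column.
picked_choose = np.choose(L3, X3.T)

print(picked)  # [ 2  3  7 10 14]
```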

Lee Over a year ago

@JohnZwinck. I agree. Also, I included your tip re L. Thanks

ssm · Answer · 2014-10-03 09:08:01Z

0

Note that X[:, L] selects the columns listed in L for every row, which gives a 10x10 array:

In [9]: X[:, L]
Out[9]:
array([[ 1,  1,  0,  0,  1,  0,  1,  0,  1,  0],
       [ 3,  3,  2,  2,  3,  2,  3,  2,  3,  2],
       [ 5,  5,  4,  4,  5,  4,  5,  4,  5,  4],
       [ 7,  7,  6,  6,  7,  6,  7,  6,  7,  6],
       [ 9,  9,  8,  8,  9,  8,  9,  8,  9,  8],
       [11, 11, 10, 10, 11, 10, 11, 10, 11, 10],
       [13, 13, 12, 12, 13, 12, 13, 12, 13, 12],
       [15, 15, 14, 14, 15, 14, 15, 14, 15, 14],
       [17, 17, 16, 16, 17, 16, 17, 16, 17, 16],
       [19, 19, 18, 18, 19, 18, 19, 18, 19, 18]])

You want its diagonal elements, so just do:

In [14]: X[:, L].diagonal()
Out[14]: array([ 1,  3,  4,  6,  9, 10, 13, 14, 17, 18])

answered Oct 3, 2014 at 9:08

ssm

5,403 reputation, 1 gold badge, 26 silver badges, 43 bronze badges

1 Comment

memecs Over a year ago

As commented already in another answer, creating an N^2 matrix is rather inefficient.
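
The size of that temporary can be inspected directly. The following sketch compares the N-by-N array produced by X[:, L] with the length-N result of explicit row/column pairing; N = 1000 is an arbitrary assumption, and byte counts depend on the platform's default integer size, so they are printed rather than stated.

```python
import numpy as np

N = 1000  # arbitrary size chosen for illustration
X = np.arange(N * 2).reshape((N, 2))
L = np.random.randint(0, 2, N)

temp = X[:, L]                # N x N temporary built before taking the diagonal
result = X[np.arange(N), L]   # length-N result, no N x N intermediate

print(temp.shape, temp.nbytes)      # grows quadratically with N
print(result.shape, result.nbytes)  # grows linearly with N
```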
