2

Say I have an array d of shape (N, T), from which I need to select one element per row using an array index of shape (N,), where the first element of index is the column to pick in the first row, and so on. How would I do that?

For example

>>> d
Out[748]: 
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10]])
>>> index
Out[752]: array([5, 6, 1], dtype=int64)

Expected Output:

array([[5],
       [6],
       [2]])

This is an array containing the element at index 5 of the first row, the element at index 6 of the second row, and the element at index 1 of the third row (indices are zero-based).

Update

Since I will have a much larger N, I was interested in how fast the different methods are for large N. With N = 30000:

>>> %timeit np.diag(e.take(index2, axis=1)).reshape(N*3, 1)
1 loops, best of 3: 3.9 s per loop
>>> %timeit e.ravel()[np.arange(e.shape[0])*e.shape[1]+index2].reshape(N*3, 1)
1000 loops, best of 3: 287 µs per loop

Finally, you suggest reshape(). Since I want to keep this as general as possible (without hard-coding N), I use [:, np.newaxis] instead. It seems to increase the duration from 287 µs to 288 µs, which I'll take :)
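As a small sketch of the point about not hard-coding N: both [:, np.newaxis] and reshape(-1, 1) turn a 1-D result into a column without naming its length, since -1 asks NumPy to infer that dimension. The array picked below is just the example result from above, not the timing data.

```python
import numpy as np

# The 1-D result of the selection in the example above.
picked = np.array([5, 6, 2])

# Two ways to get a column vector without knowing N in advance.
col_newaxis = picked[:, np.newaxis]   # adds a trailing axis of length 1
col_reshape = picked.reshape(-1, 1)   # -1 lets NumPy infer the row count

print(col_newaxis.shape, col_reshape.shape)
```

Both forms produce the same (N, 1) array, so the choice between them is mostly a matter of taste.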

3
  • Is the final output an array of three different arrays, like you've printed here? Or just one array with three elements? Commented Jul 11, 2014 at 15:17
  • Finally, I want to add the final output to the initial array. The way I printed it here, I can simply do dNew = append(d, expectedOutput, axis=-1). Other final output that also allows this is equally welcome. Commented Jul 11, 2014 at 15:21
  • If speed is important you should check my second edit then if it also improves on your computer. Commented Jul 11, 2014 at 16:29
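To make the goal from the comments concrete, here is a minimal sketch of appending the selected values to d as an extra column. It assumes the example data from the question; the names selected and d_new are only illustrative.

```python
import numpy as np

d = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
              [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
              [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
index = np.array([5, 6, 1])

# One element per row, kept as an (N, 1) column so it can be appended.
selected = d[np.arange(d.shape[0]), index][:, np.newaxis]

# np.append with an axis needs both inputs to have the same number of dimensions.
d_new = np.append(d, selected, axis=-1)

print(d_new.shape)
```

Any output with shape (N, 1) works here, which is why the column form is convenient for this use.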

2 Answers

2

This might be ugly but more efficient:

>>> d.ravel()[np.arange(d.shape[0])*d.shape[1]+index]
array([5, 6, 2])

edit

As pointed out by @deinonychusaur, the statement above can be written more cleanly as:

d[np.arange(index.size),index]
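For reference, a self-contained sketch of this indexing on the data from the question. The variable names rows, picked and alt are illustrative only. np.take_along_axis is shown as one alternative I am aware of; it requires NumPy 1.15 or later and is not part of the original answer.

```python
import numpy as np

d = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
              [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
              [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
index = np.array([5, 6, 1])

# Pair row i with column index[i]: picks d[0, 5], d[1, 6], d[2, 1].
rows = np.arange(index.size)
picked = d[rows, index]

# Alternative (NumPy >= 1.15): index array must have the same ndim as d,
# so the result comes back directly as an (N, 1) column.
alt = np.take_along_axis(d, index[:, np.newaxis], axis=1)

print(picked)       # [5 6 2]
print(alt.shape)    # (3, 1)
```

The two index arrays are broadcast together, so the result has one element per (row, column) pair rather than a full sub-matrix.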

2

There might be nicer ways, but a combo of take, diag and reshape would do:

In [137]: np.diag(d.take(index, axis=1)).reshape(3, 1)
Out[137]: 
array([[5],
       [6],
       [2]])

EDIT

Comparison with @Emanuele Paolini's alternative, with reshape added to match the desired output:

In [142]: %timeit d.reshape(d.size)[np.arange(d.shape[0])*d.shape[1]+index].reshape(3, 1)
100000 loops, best of 3: 9.51 µs per loop

In [143]: %timeit np.diag(d.take(index, axis=1)).reshape(3, 1)
100000 loops, best of 3: 3.81 µs per loop

In [146]: %timeit d.ravel()[np.arange(d.shape[0])*d.shape[1]+index].reshape(3, 1)
100000 loops, best of 3: 8.56 µs per loop

On this small 3-by-10 example, the take/diag method is about twice as fast as both variants of the flat-index alternative.

EDIT 2: An even better method

This is based on @Emanuele Paolini's version but uses fewer operations. It outperforms all the others on large arrays (10k rows by 100 columns).

In [199]: %timeit d[(np.arange(index.size), index)].reshape(index.size, 1)
1000 loops, best of 3: 364 µs per loop

In [200]: %timeit d.ravel()[np.arange(d.shape[0])*d.shape[1]+index].reshape(index.size, 1)
100 loops, best of 3: 5.22 ms per loop

So if speed is of the essence:

d[(np.arange(index.size), index)].reshape(index.size, 1)
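To re-run this comparison outside IPython, something like the following sketch could be used. The random data, the seed, and the function names fancy and flat are assumptions made for illustration; timings will differ by machine and NumPy version, so none are quoted here.

```python
import timeit
import numpy as np

# Synthetic data with the size mentioned above (10k rows by 100 columns).
rng = np.random.default_rng(0)
N, T = 10_000, 100
d = rng.integers(0, 100, size=(N, T))
index = rng.integers(0, T, size=N)

def fancy():
    # Row/column pair indexing.
    return d[np.arange(index.size), index].reshape(index.size, 1)

def flat():
    # Flat-index arithmetic on the ravelled array.
    return d.ravel()[np.arange(d.shape[0]) * d.shape[1] + index].reshape(index.size, 1)

# Both approaches must agree before their timings are worth comparing.
assert np.array_equal(fancy(), flat())

for fn in (fancy, flat):
    best = min(timeit.repeat(fn, number=100, repeat=3)) / 100
    print(f"{fn.__name__}: {best * 1e6:.1f} µs per call")
```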

4 Comments

I said "more efficient" because your solution goes through an N*N matrix, of which you discard everything but the diagonal. So I assume (but have not checked) that your solution is quadratic in the length of index, while mine should be linear.
@EmanuelePaolini True that given a large enough size yours should be more efficient.
@EmanuelePaolini for 10k rows and 100 columns yours is about 150 times faster so I suppose it's a matter of problem size
@EmanuelePaolini I made a tweak to yours that I think is clearer and is more than 10-fold faster.
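The size argument in these comments can be checked by looking at the intermediate array. This is a sketch on a small made-up array (N = 4, T = 6), not the data from the question.

```python
import numpy as np

N, T = 4, 6
d = np.arange(N * T).reshape(N, T)
index = np.array([5, 0, 3, 2])

# take() selects the columns in index for EVERY row: an (N, N) intermediate,
# where square[i, j] == d[i, index[j]].
square = d.take(index, axis=1)

# Only the diagonal (row i paired with index[i]) is actually wanted.
via_diag = np.diag(square)

# Pair indexing builds just those N elements directly.
direct = d[np.arange(N), index]

print(square.shape)   # (4, 4)
print(direct)         # [ 5  6 15 20]
```

The intermediate grows with N squared while the direct result grows with N, which is consistent with take/diag being fast on the 3-row example and slow at N = 30000.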
