Most efficient way to pull specified rows from a 2-d array?

Question

I have a 2-D numpy array with 100,000+ rows. I need to return a subset of those rows (and I need to perform that operation many thousands of times, so efficiency is important).

A mock-up example is like this:

import numpy as np
a = np.array([[1,5.5],
             [2,4.5],
             [3,9.0],
             [4,8.01]])
b = np.array([2,4])

So I want to return the rows of a whose first-column value appears in b:

c=[[2,4.5],
   [4,8.01]]

The difference, of course, is that there are many more rows in both a and b, so I'd like to avoid looping. I also tried building a dictionary and using np.nonzero, but I'm still a bit stumped.

Thanks in advance for any ideas!

EDIT: Note that, in this case, the values in b are identifiers rather than indices. Here's a revised example:

import numpy as np
a = np.array([[102,5.5],
             [204,4.5],
             [343,9.0],
             [40,8.01]])
b = np.array([102,343])

And I want to return:

c = [[102,5.5],
     [343,9.0]]

JoshAdel · Accepted Answer · 2011-04-01 19:14:49Z

6

EDIT: Deleted my original answer since it was a misunderstanding of the question. Instead try:

ii = np.where((a[:,0] - b.reshape(-1,1)) == 0)[1]
c = a[ii,:]

What I'm doing is using broadcasting to subtract each element of b from the first column of a, and then searching for zeros in the resulting array, which indicate a match. This should work, but you should be a little careful when comparing floats, especially if b is not an array of ints.
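To make the broadcasting step concrete, here is a small sketch that uses the revised example arrays from the question. The names diff, b_idx and a_idx are only illustrative and do not appear in the original answer.

```python
import numpy as np

a = np.array([[102, 5.5],
              [204, 4.5],
              [343, 9.0],
              [40, 8.01]])
b = np.array([102, 343])

# b.reshape(-1, 1) has shape (2, 1) and a[:, 0] has shape (4,),
# so broadcasting yields a (len(b), len(a)) array of differences.
diff = a[:, 0] - b.reshape(-1, 1)

# np.where on a 2-D boolean array returns (row_indices, col_indices).
# Rows index into b, columns index into a, so element [1] is what we want.
b_idx, a_idx = np.where(diff == 0)

c = a[a_idx, :]
print(diff.shape)  # (2, 4)
print(c)
```

Note that the intermediate array holds len(b) * len(a) elements, so memory use grows quickly when both arrays are large.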

EDIT 2 Thanks to Sven's suggestion, you can try this slightly modified version instead:

ii = np.where(a[:,0] == b.reshape(-1,1))[1]
c = a[ii,:]

It's a bit faster than my original implementation.

EDIT 3 The fastest solution by far (~10x faster than Sven's second solution for large arrays) is:

c = a[np.searchsorted(a[:,0],b),:]

This assumes that a[:,0] is sorted and that every value in b appears in a[:,0].
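If those two assumptions do not hold (the first column in the revised example from the question is not sorted, for instance), one possible workaround is to pass an argsort permutation through the sorter argument of np.searchsorted and then verify the matches. This is only a sketch: rows_by_id is a hypothetical helper name, and it assumes a is non-empty and the ids in a[:,0] are unique.

```python
import numpy as np


def rows_by_id(a, b):
    """Hypothetical helper: rows of `a` whose first column matches ids in `b`.

    Works when a[:, 0] is unsorted; ids in `b` that are absent are skipped.
    """
    keys = a[:, 0]
    order = np.argsort(keys)                      # permutation that sorts keys
    pos = np.searchsorted(keys, b, sorter=order)  # positions in sorted order
    pos = np.clip(pos, 0, len(keys) - 1)          # keep indices in bounds
    rows = order[pos]                             # map back to row numbers in a
    found = keys[rows] == b                       # drop ids with no exact match
    return a[rows[found]]


a = np.array([[102, 5.5],
              [204, 4.5],
              [343, 9.0],
              [40, 8.01]])   # first column is not sorted
b = np.array([102, 343])

c = rows_by_id(a, b)
print(c)
```

If the same a is queried many times, the argsort result can be computed once and reused, which leaves only the binary search in the repeated step.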

edited Apr 1, 2011 at 19:14

answered Mar 31, 2011 at 19:44

JoshAdel


7 Comments

mishaF Over a year ago

Right - that's cool, but in my case, I need to match the values. For example, b is like identifiers, not indices. I will edit the question to clarify that.

Sven Marnach Over a year ago

(a - b) == 0 is the same as a == b, even when broadcasting is involved.

mishaF Over a year ago

@JoshAdel Thanks tons! Luckily, my b array is ints, so I should be OK on the float issue.

Sven Marnach Over a year ago

@Josh: What peeves me about both our answers is that the complexity is O(len(a)*len(b)), where theoretically O((len(a)+len(b))*log(len(b))) would be enough (Sorting b and doing a binary search for every element of a[:,0]). Any ideas how to improve this? Can we use searchsorted()?

JoshAdel Over a year ago

@Sven: Good call - np.searchsorted is easy to apply to this case and is significantly faster


Sven Marnach · Accepted Answer · 2011-03-31 21:53:24Z

4

A slightly more concise way to do this is

c = a[(a[:,0] == b[:,None]).any(0)]

The usual caveats for floating point comparisons apply.
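As a small illustration of that caveat, the sketch below compares exact equality with np.isclose on a value that picks up rounding error. Using np.isclose here is a substitution of my own, not part of the answer above, and its default tolerances may not suit every data set.

```python
import numpy as np

a = np.array([[0.1 + 0.2, 1.0],   # 0.1 + 0.2 is not exactly 0.3 in binary floating point
              [0.5, 2.0]])
b = np.array([0.3])

# Exact comparison misses the first row because of rounding error.
exact = (a[:, 0] == b[:, None]).any(0)

# np.isclose broadcasts the same way but compares within a tolerance.
close = np.isclose(a[:, 0], b[:, None]).any(0)

print(exact)  # [False False]
print(close)  # [ True False]
c = a[close]
```

When the identifiers are integers stored in a float array, as in the question, exact comparison is normally safe.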

Edit: If b is not too small, the following slightly quirky solution performs better:

b.sort()
c = a[b[np.searchsorted(b, a[:, 0]) - len(b)] == a[:,0]]
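The one-liner above is easier to follow when split into steps. The sketch below does that with illustrative variable names (b_sorted, pos, candidate, mask) and uses np.sort to work on a copy, whereas b.sort() sorts b in place.

```python
import numpy as np

a = np.array([[102, 5.5],
              [204, 4.5],
              [343, 9.0],
              [40, 8.01]])
b = np.array([343, 102])

b_sorted = np.sort(b)                  # sorted copy; b itself is left untouched
keys = a[:, 0]

# For each key, the insertion point in b_sorted: a value in 0..len(b).
pos = np.searchsorted(b_sorted, keys)

# pos == len(b) would be out of bounds as a plain index. Subtracting len(b)
# shifts every index into -len(b)..0, which is always valid: negative indices
# pick the same element as before, and len(b) wraps around to element 0.
candidate = b_sorted[pos - len(b_sorted)]

# A key is in b exactly when the element at its insertion point equals it.
mask = candidate == keys
c = a[mask]
print(c)
```

The result keeps the row order of a, and each matching row appears once. On NumPy versions that provide np.isin, a[np.isin(a[:, 0], b)] should select the same rows.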

edited Mar 31, 2011 at 21:53

answered Mar 31, 2011 at 20:14

Sven Marnach


3 Comments

JoshAdel Over a year ago

And props to Sven: I think his method is ~1.6x faster than my solution.

Sven Marnach Over a year ago

@Josh: Thanks for timing this! You got my +1 anyway for providing a working answer first. :)

JoshAdel Over a year ago

As shown in Edit 3 of my post, you can use searchsorted directly. It's also worth noting that both of your solutions only extract unique entries in b, so if that is important to the OP, then this is also a consideration.
