
I have a (huge) 2D array. For example:

a = np.array([[1, 2], [2, 3], [4, 5]])

I need to extract from it the rows that satisfy certain conditions:

a[:,0]>1 and a[:,1]>2

so that I get back an array containing only the rows that satisfy both conditions:

[[2,3],[4,5]]

(I then need to use the result in a loop, which may or may not be relevant to the question.)

I have tried the following:

np.transpose([np.extract(a[:,0]>1,a[:,0]),np.extract(a[:,1]>2,a[:,1])])

The above works only when both extracted arrays have the same length. Even when it works, it sometimes returns pairs that weren't paired together to begin with (I understand why).
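A minimal sketch of why filtering each column separately loses the pairing: each np.extract call keeps a different set of rows, so the two results no longer line up. Combining both conditions into a single per-row mask first keeps every selected row intact. The extra row [5, 0] is a hypothetical addition to make the misalignment visible, and & is used here as the operator form of np.logical_and on boolean arrays.

```python
import numpy as np

# Example data; the last row [5, 0] is a hypothetical extra row
a = np.array([[1, 2], [2, 3], [4, 5], [5, 0]])

# Filtering each column on its own keeps a different set of rows per column
col0 = np.extract(a[:, 0] > 1, a[:, 0])  # rows 1, 2, 3 -> [2, 4, 5]
col1 = np.extract(a[:, 1] > 2, a[:, 1])  # rows 1, 2    -> [3, 5]
# The lengths differ (3 vs 2), so the values can no longer be re-paired reliably

# Combine both conditions into ONE boolean mask per row, then index the rows
mask = (a[:, 0] > 1) & (a[:, 1] > 2)
rows = a[mask]
print(rows.tolist())  # [[2, 3], [4, 5]]
```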

I know how to do it with lists:

list(filter(lambda b: b[0]>1 and b[1]>2,a))

However, I want to improve efficiency, so I am switching to NumPy (I've read it is generally more efficient). Is there a way to do the above in NumPy that is significantly faster than with lists? (I will be executing that piece of code thousands of times on arrays with hundreds of elements.)

Update: Following Maarten_vd_Sande's answer:

The following code was used to check the time taken:

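import numpy as np
import time

b=np.random.rand(10000000,2)  # 10 million random (x, y) pairs in [0, 1)
a=b.tolist()                  # the same data as a list of lists
strt=time.time()
# NumPy version: boolean-mask filter, then an empty Python loop over the rows
c=b[np.logical_and(b[:,0]>0.5,b[:,1]>0.5)]
for (i,j) in c:
    continue
print("Numpy= ",time.time()-strt)
strt=time.time()
# List version: filter() with a lambda, then the same empty loop
for (i,j) in list(filter(lambda m: m[0]>0.5 and m[1]>0.5,a)):
    continue
print("List= ",time.time()-strt)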

Output:

Numpy=  2.973170042037964
List=  1.91910982131958
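One way to see where the time goes is to time the filter and the per-row loop separately, as the comment thread on this question goes on to discuss. This is a sketch, not the original benchmark: it swaps in time.perf_counter (better suited to interval timing than time.time), a list comprehension instead of filter() with a lambda, and a smaller default size; benchmark is a hypothetical helper name. It also times looping over c.tolist(), since iterating a NumPy array row by row creates a small array object per row, which is typically slower in pure Python than iterating plain floats.

```python
import time

import numpy as np


def benchmark(n=1_000_000, seed=0):
    """Time the filter and the per-row loop separately (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    b = rng.random((n, 2))  # n random (x, y) pairs in [0, 1)
    a = b.tolist()          # same data as a list of lists

    t0 = time.perf_counter()
    c = b[(b[:, 0] > 0.5) & (b[:, 1] > 0.5)]  # vectorized filter only
    t1 = time.perf_counter()
    for i, j in c:  # Python-level loop over NumPy rows
        pass
    t2 = time.perf_counter()
    for i, j in c.tolist():  # convert to a list, then loop over plain Python floats
        pass
    t3 = time.perf_counter()
    kept = [m for m in a if m[0] > 0.5 and m[1] > 0.5]  # list-based filter
    t4 = time.perf_counter()

    return {
        "numpy_filter": t1 - t0,
        "numpy_loop": t2 - t1,
        "tolist_loop": t3 - t2,
        "list_filter": t4 - t3,
        "rows_numpy": len(c),
        "rows_list": len(kept),
    }


results = benchmark(200_000)
for name, value in results.items():
    print(name, value)
```

Both filters select the same rows, so rows_numpy and rows_list should match; the absolute timings depend on the machine.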
  • In the lambda function you check for >2, which results in an empty list. Change it to >0.5 and the numpy approach is twice as fast (and more than 10x as fast if you remove the empty loop). Commented Mar 30, 2018 at 8:19
  • Changed it to >0.5. But Numpy is still slower. Did you make any other changes? Commented Mar 30, 2018 at 8:30
  • It is the looping part that makes numpy slower, not the actual filter (try commenting out the loop). Maybe you can vectorize what happens in the loop. Is the 2 seconds the empty loop takes the bottleneck, or the computation inside of it? Commented Mar 30, 2018 at 8:38
  • Tried without the loop. Numpy is indeed an order of magnitude faster, as you mentioned earlier. However, the loop is necessary: for each element that enters it, I essentially need to find the distance between the point (i,j) and some other point (x,y). It seems the numpy loop is slightly faster than the list loop (1.963 vs 2.0201 seconds) if I convert c from a numpy array to a list using tolist() before the loop. Commented Mar 30, 2018 at 8:49
  • No idea what happened earlier. It showed different results without me touching the array, but now it shows the same. Will try to figure out where it (or I) messed up. Anyway, thanks for all the help! Commented Mar 30, 2018 at 10:43
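Following the suggestion above to vectorize what happens inside the loop: if the loop body only computes the Euclidean distance from each filtered point (i, j) to one fixed point (x, y), that step can be applied to the whole filtered array at once instead of row by row. A minimal sketch, assuming a single fixed (x, y) and plain Euclidean distance; distances_to and the point (2, 3) are hypothetical.

```python
import numpy as np


def distances_to(points, x, y):
    """Euclidean distance from each row (i, j) of points to (x, y) (hypothetical helper)."""
    # np.hypot(dx, dy) computes sqrt(dx**2 + dy**2) element-wise
    return np.hypot(points[:, 0] - x, points[:, 1] - y)


a = np.array([[1, 2], [2, 3], [4, 5]], dtype=float)
c = a[(a[:, 0] > 1) & (a[:, 1] > 2)]  # filtered rows: [[2, 3], [4, 5]]
d = distances_to(c, 2.0, 3.0)         # one distance per filtered row
# d[0] is 0.0 (same point); d[1] is 2 * sqrt(2)
```

np.linalg.norm(c - [x, y], axis=1) gives the same result; either way there is no Python-level loop over rows.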

1 Answer

You need NumPy's element-wise logical AND (np.logical_and) instead of Python's and keyword:

result = a[np.logical_and(a[:,0] > 1, a[:,1] > 2)]

Does this work for you?
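A short sketch of why Python's and does not work here and which element-wise forms do. and needs a single truth value, so applying it to a boolean array with more than one element raises ValueError; np.logical_and and the & operator combine the masks element by element. With &, the comparisons need parentheses because & binds more tightly than >.

```python
import numpy as np

a = np.array([[1, 2], [2, 3], [4, 5]])

# Python's `and` asks for one truth value per array, which is ambiguous
err = ""
try:
    a[(a[:, 0] > 1) and (a[:, 1] > 2)]
except ValueError as exc:
    err = str(exc)  # "The truth value of an array with more than one element is ambiguous..."

# Element-wise alternatives that produce the same row mask
m1 = np.logical_and(a[:, 0] > 1, a[:, 1] > 2)
m2 = (a[:, 0] > 1) & (a[:, 1] > 2)  # parentheses needed: & binds tighter than >
assert (m1 == m2).all()
print(a[m1].tolist())  # [[2, 3], [4, 5]]
```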

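edit: In this case we can also make good use of broadcasting. Note that np.greater(a, [1, 2]) on its own only returns a boolean array with the same shape as a; reduce it across each row with np.all(..., axis=1) and use that as the mask:

result = a[np.all(np.greater(a, [1, 2]), axis=1)]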
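A small sketch of the broadcasting version step by step: comparing an (n, 2) array against a length-2 array of per-column thresholds broadcasts the thresholds across every row, giving an (n, 2) boolean array; .all(axis=1) then reduces it to one True/False per row, which is the mask to index with. The name thresholds is hypothetical, and this assumes every condition is a strict "greater than" on its own column.

```python
import numpy as np

a = np.array([[1, 2], [2, 3], [4, 5]])
thresholds = np.array([1, 2])  # hypothetical per-column lower bounds

elementwise = a > thresholds          # shape (3, 2): one comparison per element
row_mask = elementwise.all(axis=1)    # True where every column passes
print(row_mask.tolist())       # [False, True, True]
print(a[row_mask].tolist())    # [[2, 3], [4, 5]]
```

The same pattern extends to more columns by adding entries to thresholds.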

3 Comments

The logical_and works like a charm! Thanks. However, it is not nearly as fast as the list method. Not sure if it is something I wrote or the numpy method itself isn't as efficient. Should I post the code I used for timing here or ask a new question?
That is surprising, you can post it here I guess. I'll take a look.
Couldn't get the formatting in the comments right, so I've edited the question.
