22

I have an array of integers and want to find where that array is equal to any value in a list of multiple values.

This can easily be done by treating each value individually, or by using multiple "or" statements in a loop, but I feel like there must be a better/faster way to do it. I'm actually dealing with arrays of size 4000 x 2000, but here is a simplified version of the problem:

fake = arange(9).reshape((3,3))

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

want = (fake==0) + (fake==2) + (fake==6) + (fake==8)

print want 

array([[ True, False,  True],
       [False, False, False],
       [ True, False,  True]], dtype=bool)

What I would like is a way to get want from a single command involving fake and the list of values [0, 2, 6, 8].

I'm assuming there is a package that has this included already that would be significantly faster than if I just wrote a function with a loop in Python.

3 Answers

21

The function numpy.in1d seems to do what you want. The only problem is that it only works on 1d arrays, so you should use it like this:

In [9]: np.in1d(fake, [0,2,6,8]).reshape(fake.shape)
Out[9]: 
array([[ True, False,  True],
       [False, False, False],
       [ True, False,  True]], dtype=bool)

I have no clue why this is limited to 1d arrays only. Looking at its source code, it first seems to flatten the two arrays, after which it does some clever sorting tricks. But nothing would stop it from unflattening the result at the end again, like I had to do by hand here.
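To illustrate the general idea of a sort-based membership test that flattens and then unflattens, here is a minimal sketch. It is not NumPy's actual source; the function name isin_sorted_sketch is hypothetical, and the use of np.unique plus np.searchsorted is an assumption chosen for illustration.

```python
import numpy as np

def isin_sorted_sketch(ar, values):
    """Membership test via sorting and binary search (illustrative only)."""
    ar = np.asarray(ar)
    # Sort and deduplicate the candidate values once.
    candidates = np.unique(values)
    if candidates.size == 0:
        return np.zeros(ar.shape, dtype=bool)
    flat = ar.ravel()
    # For each element, find where it would be inserted among the candidates.
    idx = np.searchsorted(candidates, flat)
    # Clip so elements larger than every candidate still index safely.
    idx = np.clip(idx, 0, candidates.size - 1)
    # An element is a member if the candidate at that position equals it.
    # Reshape at the end so the result matches the input's shape.
    return (candidates[idx] == flat).reshape(ar.shape)

fake = np.arange(9).reshape((3, 3))
want = isin_sorted_sketch(fake, [0, 2, 6, 8])
```

The final reshape is the "unflattening" step that in1d leaves to the caller.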

2 Comments

Hmm. I wrote this very simple function to do this job:

def EqualsAny(ar, vals):
    out = zeros(ar.shape, dtype=bool)
    for val in vals:
        out += (ar == val)
    return out

I thought that numpy.in1d would be faster, but it actually takes longer (for the same result):

In [11]: %timeit EqualsAny(badlabels,smallnum)
1 loops, best of 3: 519 ms per loop

In [7]: %timeit in1d(badlabels, smallnum).reshape(badlabels.shape)
1 loops, best of 3: 871 ms per loop

Shouldn't numpy.in1d be way faster since it's written in C? Am I not using %timeit properly?

No, in1d is not written in C but in Python; see the link to the source code I gave. It uses various numpy functions like sort, which should hopefully be written in C. It even has an optimized algorithm for when vals is short, which is pretty similar to your solution (but with |= instead of +=). I don't know why your version is faster; this might depend on the length of both inputs.
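For reference, a minimal sketch of the loop-based approach discussed in these comments, written with |= rather than +=. The name equals_any is a restyled, hypothetical version of the commenter's function, and this is not the code NumPy itself uses.

```python
import numpy as np

def equals_any(ar, vals):
    """Return a boolean mask that is True where ar equals any value in vals."""
    ar = np.asarray(ar)
    out = np.zeros(ar.shape, dtype=bool)
    for val in vals:
        # Each comparison is vectorized; only the loop over vals runs in Python.
        out |= (ar == val)
    return out

fake = np.arange(9).reshape((3, 3))
want = equals_any(fake, [0, 2, 6, 8])
```

The Python-level loop runs once per value in vals, not once per array element, which is why this can stay competitive when the list of values is short.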
17

NumPy 1.13+

As of NumPy v1.13, you can use np.isin, which works on multi-dimensional arrays:

>>> element = 2*np.arange(4).reshape((2, 2))
>>> element
array([[0, 2],
       [4, 6]])
>>> test_elements = [1, 2, 4, 8]
>>> mask = np.isin(element, test_elements)
>>> mask
array([[ False,  True],
       [ True,  False]])

NumPy pre-1.13

The accepted answer with np.in1d works only on 1d arrays and requires reshaping to get the desired result. It is the option to use on versions of NumPy before v1.13.
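Applied to the array from the question, a short sketch (assuming NumPy 1.13 or later) might look like this. The variable names values, others, and selected are only illustrative.

```python
import numpy as np

fake = np.arange(9).reshape((3, 3))
values = [0, 2, 6, 8]

# Boolean mask with the same shape as fake; no reshape needed.
want = np.isin(fake, values)

# invert=True gives the complementary mask (elements NOT in values).
others = np.isin(fake, values, invert=True)

# The mask can be used directly for boolean indexing.
selected = fake[want]
```

Here want matches the mask built by hand in the question, and selected holds the matching elements as a 1d array.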

5

@Bas's answer is the one you're probably looking for. But here's another way to do it, using numpy's vectorize trick:

import numpy as np
S = set([0,2,6,8])

@np.vectorize
def contained(x):
    return x in S

contained(fake)
=> array([[ True, False,  True],
          [False, False, False],
          [ True, False,  True]], dtype=bool)

The downside of this solution is that contained() is called once for each element (i.e. in Python space), which makes it much slower than a pure-NumPy solution.
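To check the speed difference on your own machine, a rough timing sketch with the standard timeit module could look like the following. The array size and contents are arbitrary assumptions (smaller than the 4000 x 2000 case in the question), np.isin requires NumPy 1.13 or later, and the timings will vary between systems.

```python
import timeit
import numpy as np

S = set([0, 2, 6, 8])

@np.vectorize
def contained(x):
    return x in S

# Arbitrary test data: integers 0..9 in a 400 x 200 array.
data = (np.arange(400 * 200) % 10).reshape((400, 200))

mask_isin = np.isin(data, list(S))
mask_vec = contained(data)

# Time a few repetitions of each approach (in seconds).
t_isin = timeit.timeit(lambda: np.isin(data, list(S)), number=3)
t_vec = timeit.timeit(lambda: contained(data), number=3)

print("np.isin:      %.4f s" % t_isin)
print("np.vectorize: %.4f s" % t_vec)
```

Both approaches produce the same mask; only the time taken differs.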
