18

I have two arrays, a1 and a2. Assume len(a2) >> len(a1), and that a1 is a subset of a2.

I would like a quick way to return the a2 indices of all elements in a1. The time-intensive way to do this is obviously:

from operator import indexOf
indices = []
for i in a1:
    indices.append(indexOf(a2,i))

This of course takes a long time when a2 is large. I could use numpy.where() instead (each entry in a1 appears just once in a2), but I'm not convinced it would be any quicker. I could also traverse the large array just once:

indices = []
for i in range(len(a2)):  # range replaces Python 2's xrange
    if a2[i] in a1:
        indices.append(i)

But I'm sure there is a faster, more 'numpy' way - I've looked through the numpy method list, but cannot find anything appropriate.

Many thanks in advance,

D

6 Answers

18

How about

numpy.nonzero(numpy.in1d(a2, a1))[0]

This should be fast. From my basic testing, it's about 7 times faster than your second code snippet for len(a2) == 100, len(a1) == 10000, and only one common element at index 45. This assumes that both a1 and a2 have no repeating elements.
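
One detail worth keeping in mind: the mask-based approach reports positions in a2's order, not a1's. If the indices are needed in the order of a1, sorting plus numpy.searchsorted is one option. The following is a minimal sketch under the same assumptions as above (no repeated elements, every element of a1 present in a2). The sample arrays are made up, and numpy.isin is used as the newer spelling of the same membership test.

```python
import numpy as np

# Made-up sample data: a1 is a subset of a2, with no repeated elements.
a2 = np.array([50, 10, 40, 20, 30, 60])
a1 = np.array([20, 50, 30])

# Positions of a1's elements in a2, reported in a2's order.
mask = np.isin(a2, a1)
in_a2_order = np.nonzero(mask)[0]

# Positions reported in a1's order: binary-search a sorted view of a2,
# then map the sorted positions back to positions in the original a2.
order = np.argsort(a2)
in_a1_order = order[np.searchsorted(a2, a1, sorter=order)]

print(in_a2_order)
print(in_a1_order)
```

With this data, a2[in_a1_order] reproduces a1 element for element, which is a convenient sanity check.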

3 Comments

I compared your solution to Dave Kirby's above, with this one being approx. 1.35x faster for len(a2) == 12347424, len(a1) == 1338, so this solution gets my vote - thanks!
For anyone reading this: it seems like setmember1d has been renamed to in1d since numpy 1.4.
@DanielWatkins Since numpy 1.4 is very old now, I have updated my answer to use in1d.
2

How about:

wanted = set(a1)
indices = [idx for (idx, value) in enumerate(a2) if value in wanted]

This should be O(len(a1)+len(a2)) instead of O(len(a1)*len(a2)), because each membership test against the set takes constant time on average.

NB: I don't know numpy, so there may be a more 'numpythonic' way to do it, but this is how I would do it in pure Python.
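
To make the complexity argument concrete, here is a small self-contained sketch with made-up data. It checks that the set-based comprehension finds the same positions as a direct list.index lookup for each element; the name `reference` is introduced only for this comparison.

```python
# Made-up sample data: a1 is a subset of a2, with no repeats.
a2 = ["d", "a", "e", "b", "c"]
a1 = ["b", "d"]

# Building the set costs O(len(a1)); each membership test is then O(1) on average.
wanted = set(a1)
indices = [idx for idx, value in enumerate(a2) if value in wanted]

# Reference result: one linear scan of a2 per element of a1, sorted into a2's order.
reference = sorted(a2.index(value) for value in a1)

print(indices)
print(indices == reference)
```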

1 Comment

should that be enumerate(a2)?
1
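import numpy
index = numpy.in1d(a2, a1)  # boolean mask over a2, True where the value is in a1
result = a2[index]          # the matching values; numpy.flatnonzero(index) gives their positions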

1

Very similar to @AlokSinghal's answer, but you get the index array directly instead of having to unpack a tuple.

numpy.flatnonzero(numpy.in1d(a2, a1))
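
A quick check of the equivalence, sketched with made-up arrays and numpy.isin (the newer name for the same membership test):

```python
import numpy as np

# Made-up sample data: a1 is a subset of a2, with no repeats.
a2 = np.array([7, 3, 9, 1, 5])
a1 = np.array([1, 7])

mask = np.isin(a2, a1)

# nonzero returns a tuple with one index array per dimension, hence the [0].
via_nonzero = np.nonzero(mask)[0]
# flatnonzero returns the index array directly.
via_flatnonzero = np.flatnonzero(mask)

print(via_nonzero)
print(via_flatnonzero)
```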

0

The numpy_indexed package (disclaimer: I am its author) contains a vectorized equivalent of list.index; performance should be similar to the currently accepted answer, but as a bonus, it gives you explicit control over missing values as well, using the 'missing' kwarg.

import numpy_indexed as npi
indices = npi.indices(a2, a1, missing='raise')

It also works on multi-dimensional arrays, i.e., finding the indices of one set of rows in another.
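
For readers without the package, explicit handling of missing values can be approximated in plain NumPy. The sketch below is not how numpy_indexed is implemented; the helper name `indices_or_raise` is hypothetical, and it assumes non-empty 1-D input with no repeated values in the array being searched.

```python
import numpy as np

def indices_or_raise(haystack, needles):
    """Return the position in haystack of each needle; raise KeyError if any is absent.

    Hypothetical helper. Assumes a non-empty 1-D haystack without repeated values.
    """
    haystack = np.asarray(haystack)
    needles = np.asarray(needles)
    order = np.argsort(haystack)
    # Position each needle would take in the sorted haystack.
    pos = np.searchsorted(haystack, needles, sorter=order)
    # A needle larger than every element gets pos == len(haystack); clip so indexing is safe.
    pos = np.clip(pos, 0, len(haystack) - 1)
    candidates = order[pos]
    # A candidate only counts if the value stored there really equals the needle.
    found = haystack[candidates] == needles
    if not found.all():
        raise KeyError(f"not found: {needles[~found].tolist()}")
    return candidates

print(indices_or_raise([50, 10, 40, 20, 30], [20, 50]))
```

Raising on a missing element mirrors the spirit of missing='raise'; returning a sentinel such as -1 instead would be a small change to the final branch.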

0

All of these methods are slow for me. The following method is quite fast. Here first_list is the large list and second_list holds the elements to look up; the index list ends up holding, for each element of second_list, its position in first_list.

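index = []
d = {}
# Map each element of first_list to its position.
for j in range(len(first_list)):
    name = first_list[j]
    d[name] = j

# Look up the position of every element of second_list (KeyError if one is absent).
for i in range(len(second_list)):
    name = second_list[i]
    index.append(d[name])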
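
The same dictionary idea can be written more compactly with comprehensions. This is a minimal sketch with made-up lists; the name `position` is introduced here for readability, and first_list plays the role of a2 while second_list plays the role of a1.

```python
first_list = ["d", "a", "e", "b", "c"]  # the large list (a2 in the question)
second_list = ["b", "d"]                # the elements to locate (a1 in the question)

# Map each value to its position; if a value repeats, the last position wins.
position = {name: j for j, name in enumerate(first_list)}

# One average-case O(1) lookup per element; raises KeyError if a name is absent.
index = [position[name] for name in second_list]

print(index)
```

Unlike the mask-based answers, the result here follows the order of second_list rather than the order of first_list.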
