
Given two lists:

In [518]: A
Out[518]: [3, 4, 2, 1, 7, 6, 5]

In [519]: B
Out[519]: [4, 6]

Every element in B exists in A, without exception.

I'd like to retrieve an array of the indices of B's elements as they appear in A. For example, 4 is at index 1 in A, and 6 is at index 5 in A, so my expected output for this scenario is [1, 5].
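The desired mapping can be pinned down with a plain-Python baseline. This is only a sketch for checking results, not an efficient answer: `reference_indices` is a hypothetical helper name, and it relies on `list.index`, which returns the first match, so it costs O(len(A) * len(B)).

```python
A = [3, 4, 2, 1, 7, 6, 5]

def reference_indices(A, B):
    # For each b in B (in B's order), return where it first occurs in A.
    # list.index scans A from the start, so this is O(len(A) * len(B)).
    return [A.index(b) for b in B]

print(reference_indices(A, [4, 6]))  # [1, 5]
print(reference_indices(A, [6, 4]))  # [5, 1]
```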

This is what I did to get the index:

In [520]: np.flatnonzero(np.in1d(A, B))
Out[520]: array([1, 5])

Unfortunately, this doesn't preserve the order of B: np.in1d returns a boolean mask over A, so the indices always come out in A's order. For example, if B = [6, 4], my method still outputs [1, 5] when it should output [5, 1].
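To make the order loss concrete, the short sketch below compares the masks produced for both orderings of B. It uses `np.isin`, which the NumPy documentation recommends over `np.in1d` for new code; for 1-D inputs both return the same mask.

```python
import numpy as np

A = np.array([3, 4, 2, 1, 7, 6, 5])

# The mask has one boolean per element of A, in A's order. It records
# whether each A[i] is in B, not which element of B it matches.
mask_46 = np.isin(A, [4, 6])
mask_64 = np.isin(A, [6, 4])

print(np.array_equal(mask_46, mask_64))  # True: the order of B is lost
print(np.flatnonzero(mask_64))           # [1 5], not [5 1]
```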

Is there an efficient numpy way to get what I'm trying to achieve?

  • Fairly sure this has come up before? What about duplicates? Commented Oct 21, 2017 at 10:29
  • The help for np.where gives this exact example: ix = np.in1d(A.ravel(), B).reshape(A.shape); np.where(ix). Sorry, doesn't match your second criterion. Commented Oct 21, 2017 at 10:30
  • @p-robot yes, and besides, my flatnonzero method is a little nicer. ;-) Commented Oct 21, 2017 at 10:32
  • @Asterisk Yes, this question was inspired by stackoverflow.com/questions/46862148/… and a slight modification of Martijn Pieters' answer will give me what I want in Python. My question is more out of curiosity for the "how". Commented Oct 21, 2017 at 10:37
  • @cᴏʟᴅsᴘᴇᴇᴅ ah ha! I knew I'd seen those A and Bs before :p Commented Oct 21, 2017 at 10:42

3 Answers


IIUC:

In [71]: a
Out[71]: array([3, 4, 2, 1, 7, 6, 5, 6, 4])

In [72]: b
Out[72]: array([4, 6])

In [73]: np.where(a==b[:,None])[1]
Out[73]: array([1, 8, 5, 7], dtype=int64)

In [74]: b = np.array([6, 4])

In [75]: np.where(a==b[:,None])[1]
Out[75]: array([5, 7, 1, 8], dtype=int64)
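A sketch of why this preserves B's order (the helper name `match_indices` is hypothetical): `b[:, None]` broadcasts against `a` into a (len(b), len(a)) boolean matrix, and `np.where` walks that matrix row by row, so all matches for `b[0]` come before those for `b[1]`. The trade-off is memory proportional to len(a) * len(b).

```python
import numpy as np

def match_indices(a, b):
    a = np.asarray(a)
    b = np.asarray(b)
    # b[:, None] has shape (len(b), 1); comparing it with a (shape (len(a),))
    # broadcasts to shape (len(b), len(a)), where eq[i, j] is b[i] == a[j].
    eq = a == b[:, None]
    # np.where returns (row, col) arrays in row-major order, so rows (elements
    # of b) stay in b's order; the column indices are the positions in a.
    rows, cols = np.where(eq)
    return cols

a = np.array([3, 4, 2, 1, 7, 6, 5, 6, 4])
print(match_indices(a, [4, 6]))  # [1 8 5 7]
print(match_indices(a, [6, 4]))  # [5 7 1 8]
```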

UPDATE: if you only need the indices of the first occurrences (in case there are duplicates in the A array), then use this solution from @Divakar, which will be faster:

In [84]: (a==b[:,None]).argmax(1)
Out[84]: array([5, 1], dtype=int64)
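One caveat with `argmax`: a row with no `True` at all also yields 0, which looks like a genuine match at position 0. The question guarantees every element of B is in A, so this doesn't bite there, but a defensive variant could check with `any`. This is a sketch; `first_match_indices` and the choice to raise `ValueError` are assumptions, not part of the answer.

```python
import numpy as np

def first_match_indices(a, b):
    a = np.asarray(a)
    b = np.asarray(b)
    eq = a == b[:, None]      # shape (len(b), len(a))
    idx = eq.argmax(axis=1)   # argmax returns the first True in each row
    # An all-False row also gives 0, so verify every b was actually found.
    found = eq.any(axis=1)
    if not found.all():
        raise ValueError(f"not in a: {b[~found].tolist()}")
    return idx

a = np.array([3, 4, 2, 1, 7, 6, 5, 6, 4])
print(first_match_indices(a, [6, 4]))  # [5 1]
```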

6 Comments

Thanks for your response! I'm looking for [1, 5] in the first instance and [5, 1] in the second. Your answer seems to be getting there but not quite there :-)
@cᴏʟᴅsᴘᴇᴇᴅ, I've changed your a array ;-)
Oh my, I didn't realise. Yes, that's exactly what I'm looking for! Thanks so much!
I guess you would need (a==b[:,None]).argmax(1) if just the first instance is needed.
@Divakar, yeah, with that additional condition I totally agree ;-)

I don't know if it is efficient, but

[int(np.isin(A, b).nonzero()[0][0]) for b in B]

seems to fit the bill. If uniqueness is not guaranteed, drop the int(...) wrapper and the final [0] to get an array of all matching positions for each element of B.
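For comparison, a plain-Python alternative (not part of this answer; the helper name `indices_via_dict` is hypothetical) avoids rescanning A for every element of B by building a value-to-index map once. It is still a Python-level loop, but the total work is O(len(A) + len(B)) on average instead of O(len(A) * len(B)).

```python
def indices_via_dict(A, B):
    # One pass over A: setdefault keeps the FIRST index if a value repeats.
    first_pos = {}
    for i, v in enumerate(A):
        first_pos.setdefault(v, i)
    # One dict lookup per element of B; raises KeyError if b is not in A.
    return [first_pos[b] for b in B]

A = [3, 4, 2, 1, 7, 6, 5]
print(indices_via_dict(A, [6, 4]))  # [5, 1]
```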

1 Comment

Truthfully, I thought of this myself, but I wanted something a little less... loopy.

If m = A.size and n = B.size, the where approach is O(mn). You can stay within O((m+n) log(m+n)) by carefully sorting the in1d output (assuming unique values here):

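import numpy as np

A = np.unique(np.random.randint(0, 100000, 100000))
np.random.shuffle(A)
# B must be a subset of A (as the question guarantees), so draw it from A;
# np.random.choice already returns the sample in random order
B = np.random.choice(A, 10000, replace=False)

def find(A, B):
    # positions in A whose value occurs in B, in A's order
    pos = np.in1d(A, B).nonzero()[0]
    # sort those positions by value, then reorder them to follow B
    # (B.argsort().argsort() is the rank of each element within B)
    return pos[A[pos].argsort()][B.argsort().argsort()]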

In [5]: np.allclose(np.where(np.equal.outer(B,A))[1],find(A,B))
Out[5]: True

In [6]: %time np.where(np.equal.outer(B,A))[1]
Wall time: 3.98 s
Out[6]: array([88220, 13472, 12482, ...,  9795, 39524,  5727], dtype=int64)

In [7]: %timeit find(A,B)
22.6 ms ± 366 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
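To see why `find` works, here is the same computation split into steps on a small example with unique values (the intermediate names are illustrative, and `np.isin` stands in for `np.in1d`, which gives the same mask for 1-D inputs). Sorting the matched positions by value lines them up with sorted B; indexing with the rank of each B element then restores B's order.

```python
import numpy as np

A = np.array([3, 4, 2, 1, 7, 6, 5])
B = np.array([5, 4, 6])

pos = np.isin(A, B).nonzero()[0]  # positions of B's values in A, A's order: [1 5 6]
by_value = pos[A[pos].argsort()]  # same positions sorted by value (4, 5, 6): [1 6 5]
rank = B.argsort().argsort()      # rank of each B element within B: [1 0 2]
result = by_value[rank]           # positions in B's order: [6 1 5]
print(result)
```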
