Get NumPy Array Indices in Array B for Unique Values in Array A, for Values Present in Both Arrays, Aligned with Array A

Question

I have two NumPy arrays:

A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])

I want a function that takes A, B as parameters, and returns the index in B for each value in A, aligned with A, as efficiently as possible.

These are the outputs for the test case above:

indices = [1, 1, 0, 2, 2, 2, 2, 2, 3, 4, 3, 3, 4]

I've tried exploring in1d(), where(), and nonzero() with no luck. Any help is much appreciated.

Edit: Arrays are strings.

Daniel · Accepted Answer · 2013-07-10 22:17:44Z

3

You can also do:

>>> np.digitize(A,B)-1
array([1, 1, 0, 2, 2, 2, 2, 2, 3, 4, 3, 3, 4])

According to the docs you should be able to specify right=True and skip the minus one part (right=False is the default, which is why the subtraction is needed). This does not work for me, likely due to a version issue, as I do not have numpy 1.7.
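A minimal sketch of how the `right` keyword changes the result, assuming the values are numeric strings that can be cast to integers first (that cast is my assumption, not part of the answer) and that B is sorted in ascending order:

```python
import numpy as np

# Assumption: the string values are numeric, so they can be cast to int.
A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8',
                '16', '32', '16', '16', '32']).astype(int)
B = np.asarray(['2', '4', '8', '16', '32']).astype(int)  # must be sorted ascending

# Default (right=False): bins[i-1] <= x < bins[i], so an exact match
# lands one position past its own edge, hence the "- 1".
idx_default = np.digitize(A, B) - 1

# right=True: bins[i-1] < x <= bins[i], so an exact match returns
# the index of the matching edge directly.
idx_right = np.digitize(A, B, right=True)

print(idx_default)
print(idx_right)
```

Both variants only give meaningful indices when every value in A actually occurs in B.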

I'm not sure what you are doing with this, but a simple and very fast way to do it is:

>>> A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
>>> B,indices=np.unique(A,return_inverse=True)
>>> B
array(['16', '2', '32', '4', '8'],
      dtype='|S2')
>>> indices
array([3, 3, 1, 4, 4, 4, 4, 4, 0, 2, 0, 0, 2])

>>> B[indices]
array(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'],
      dtype='|S2')

The order will be different, but this can be changed if needed.
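One possible way to change the order back to that of the original B is sketched here. It assumes every value in A is present in B, and it combines `np.unique` with the `np.searchsorted` idea from another answer in this thread; the variable names are only illustrative.

```python
import numpy as np

A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8',
                '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])

# Unique values of A (sorted as strings) and, for each element of A,
# its position in that sorted unique array.
uniq, inverse = np.unique(A, return_inverse=True)

# Locate each unique value in B, whatever order B is in.
sort_b = np.argsort(B)
pos_in_b = sort_b[np.searchsorted(B, uniq, sorter=sort_b)]

# Map the inverse indices through that lookup to get indices into B.
idx_of_a_in_b = pos_in_b[inverse]
print(idx_of_a_in_b)
```

The search is done only on the unique values, so the cost of the lookup depends on the number of distinct values rather than on the length of A.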

edited Jul 10, 2013 at 22:17

answered Jul 10, 2013 at 16:34

Daniel


3 Comments

Jaime Over a year ago

You are implicitly relying on B being sorted.

Jaime Over a year ago

But other than that, which is easily solved, e.g. as in my answer, this is faster than np.searchsorted, so +1.

Will Over a year ago

Let me further complicate matters by saying A and B are arrays of strings :( Apparently digitize() doesn't like that.

ovgolovin · Accepted Answer · 2013-07-10 09:56:45Z

1

For such things it is important to have lookups in B as fast as possible. A dictionary provides O(1) lookup time on average. So, first of all, let us construct this dictionary:

>>> indices = dict((value,index) for index,value in enumerate(B))
>>> indices
{8: 2, 16: 3, 2: 0, 4: 1, 32: 4}

And then just go through A and find corresponding indices:

>>> [indices[item] for item in A]
[1, 1, 0, 2, 2, 2, 2, 2, 3, 4, 3, 3, 4]
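The same dictionary approach applied to the string arrays from the question might look like the sketch below. Returning `-1` for values missing from B is my own convention, not something the question asks for.

```python
import numpy as np

A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8',
                '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])

# Map each value of B to its position; tolist() yields plain Python strings.
index_of = {value: index for index, value in enumerate(B.tolist())}

# Look up every element of A; -1 (an assumed sentinel) marks values not in B.
idx_of_a_in_b = np.fromiter(
    (index_of.get(item, -1) for item in A.tolist()),
    dtype=np.intp,
    count=len(A),
)
print(idx_of_a_in_b)
```

The loop still runs in Python, so this trades NumPy's vectorization for constant-time lookups and no sorting requirement on B.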

answered Jul 10, 2013 at 9:56

ovgolovin


2 Comments

Will Over a year ago

Thanks, this is great. But, is there any way to do it in NumPy-C-happy-land? {dict: comprehension} seems a bit faster as well if we went with this route. Is there no nice NumPy way to do it without having to pass a dict around?

ovgolovin Over a year ago

@Will If B is large, it's important to have O(1) lookup complexity. I'm not familiar with numpy, but a cursory search didn't turn up any dict analogs in numpy. If B is small, it may be faster to do everything inside numpy. If so, wait for other answers; maybe someone will come up with an all-in-numpy solution.

Jaime · Accepted Answer · 2013-07-10 16:58:45Z

1

I think you can do it with np.searchsorted:

>>> A = np.asarray([4, 4, 2, 8, 8, 8, 8, 8, 16, 32, 16, 16, 32])
>>> B = np.asarray([2, 8, 4, 32, 16])
>>> sort_b = np.argsort(B)
>>> idx_of_a_in_sorted_b = np.searchsorted(B, A, sorter=sort_b)
>>> idx_of_a_in_b = np.take(sort_b, idx_of_a_in_sorted_b)
>>> idx_of_a_in_b
array([2, 2, 0, 1, 1, 1, 1, 1, 4, 3, 4, 4, 3], dtype=int64)

Note that B is scrambled from your version, hence the different output. If some of the items in A are not in B (which you can check with np.all(np.in1d(A, B))), then the returned indices for those values will be garbage, and you may even get an IndexError from the last line (if a value in A is larger than every value in B).
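A sketch of one way to guard against values of A that are missing from B: clip the search positions so the lookup cannot go out of bounds, then confirm each hit by comparing values. The function name `indices_in_b` and the `-1` sentinel are hypothetical choices, and B is assumed to be non-empty with no duplicates.

```python
import numpy as np

def indices_in_b(A, B, missing=-1):
    """Index in B of each element of A, or `missing` where A's value is not in B.

    Hypothetical helper; assumes B is non-empty and has no duplicate values.
    """
    A = np.asarray(A)
    B = np.asarray(B)
    sort_b = np.argsort(B)
    pos = np.searchsorted(B, A, sorter=sort_b)
    # searchsorted returns len(B) for values above every element of B;
    # clip so the lookup below cannot raise IndexError.
    pos = np.clip(pos, 0, len(B) - 1)
    idx = sort_b[pos]
    # A candidate index is only valid if the value found there equals A's value.
    found = B[idx] == A
    return np.where(found, idx, missing)

B = np.asarray(['2', '8', '4', '32', '16'])   # unsorted on purpose
A = np.asarray(['4', '2', '64', '16', '9'])   # '64' and '9' are not in B
result = indices_in_b(A, B)
print(result)
```

The comparison step is what makes the result trustworthy: without it, a missing value silently receives the index of a neighbouring element.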

edited Jul 10, 2013 at 16:58

answered Jul 10, 2013 at 13:20

Jaime


Eelco Hoogendoorn · Accepted Answer · 2016-04-02 20:48:46Z

1

The numpy_indexed package (disclaimer: I am its author) implements a solution along the same lines as Jaime's, but with a nice interface, tests, and a lot of related useful functionality:

import numpy_indexed as npi
print(npi.indices(B, A))

edited Apr 2, 2016 at 20:48

answered Apr 2, 2016 at 15:26

Eelco Hoogendoorn


3 Comments

Mogsdad Over a year ago

You keep posting almost identical answers pointing at your utility, not being clear about your affiliation to the linked repo. To keep them from getting flagged as spam, you should take the steps described in: How can I link to an external resource in a community-friendly way?

Eelco Hoogendoorn Over a year ago

Thanks for the heads-up, but are you sure these linked conditions apply? This isn't a 'product or website' I am linking, but rather an open-source project. Mentioning my authorship under those circumstances feels more like self-promotion than useful information.

Eelco Hoogendoorn Over a year ago

Based on similar feedback I have decided to add a disclaimer; thanks again.

rtrwalker · Accepted Answer · 2013-07-12 00:58:10Z

0

I'm not sure how efficient this is but it works:

import numpy as np
A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])
idx_of_a_in_b=np.argmax(A[np.newaxis,:]==B[:,np.newaxis],axis=0)
print(idx_of_a_in_b)

from which I get:

[1 1 0 2 2 2 2 2 3 4 3 3 4]
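One caveat with this approach, sketched here under the assumption that A may contain values absent from B: `np.argmax` returns 0 for a column with no match, which is indistinguishable from a real match at index 0. Checking `any` along the same axis separates the two cases; the `-1` sentinel is my own convention.

```python
import numpy as np

A = np.asarray(['4', '2', '64', '32'])          # '64' is not in B
B = np.asarray(['2', '4', '8', '16', '32'])

# Boolean table of shape (len(B), len(A)): matches[i, j] is True when B[i] == A[j].
matches = A[np.newaxis, :] == B[:, np.newaxis]

# argmax alone maps the all-False column for '64' to index 0.
naive = np.argmax(matches, axis=0)

# Mask out columns that have no True entry at all.
found = matches.any(axis=0)
idx_of_a_in_b = np.where(found, naive, -1)

print(matches.shape)
print(naive)
print(idx_of_a_in_b)
```

The intermediate table holds len(A) * len(B) booleans, so memory use grows with the product of the two array sizes.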

answered Jul 12, 2013 at 0:58

rtrwalker


2 Comments

Will Over a year ago

This seems to be the one! Thanks!

Eelco Hoogendoorn Over a year ago

Note: this solution is quadratic in the input size (it compares every element of A with every element of B), which is not ideal.
