2

I have two NumPy arrays:

A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])

I want a function that takes A, B as parameters, and returns the index in B for each value in A, aligned with A, as efficiently as possible.

These are the outputs for the test case above:

indices = [1, 1, 0, 2, 2, 2, 2, 2, 3, 4, 3, 3, 4]

I've tried exploring in1d(), where(), and nonzero() with no luck. Any help is much appreciated.

Edit: Arrays are strings.

5 Answers

3

You can also do:

>>> np.digitize(A,B)-1
array([1, 1, 0, 2, 2, 2, 2, 2, 3, 4, 3, 3, 4])

According to the docs, you should be able to specify right=True and skip the minus-one part. This does not work for me, likely due to a version issue, as I do not have NumPy 1.7.

I'm not sure what you are doing with this, but a simple and very fast way to do it is:

>>> A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
>>> B,indices=np.unique(A,return_inverse=True)
>>> B
array(['16', '2', '32', '4', '8'],
      dtype='|S2')
>>> indices
array([3, 3, 1, 4, 4, 4, 4, 4, 0, 2, 0, 0, 2])

>>> B[indices]
array(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'],
      dtype='|S2')

The order of B will be different (np.unique returns the unique values sorted, which for strings means lexicographic order), but this can be changed if needed.
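One way the reordering could be done is sketched below. It assumes every value in A is present in B; the names position_in_b and remap are made up for illustration and are not part of NumPy. The idea is to look up each unique value's position in the original B once, then push that small table through the inverse array.

```python
import numpy as np

A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])

# uniq is sorted lexicographically; inverse indexes into uniq, not into B.
uniq, inverse = np.unique(A, return_inverse=True)

# Hypothetical lookup table: position in the original B of each unique value.
# This costs len(uniq) Python-level lookups, however long A is.
position_in_b = {value: i for i, value in enumerate(B.tolist())}
remap = np.array([position_in_b[value] for value in uniq.tolist()])

# Fancy indexing translates "index into uniq" to "index into B".
indices = remap[inverse]
print(indices.tolist())
```

A value of A that is missing from B would raise a KeyError while remap is built, so the failure is at least loud.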


3 Comments

You are implicitly relying on B being sorted.
But other than that, which is easily solved, e.g. as in my answer, this is faster than np.searchsorted, so +1.
Let me further complicate matters by saying A and B are arrays of strings :( Apparently digitize() doesn't like that.
1

For such things it is important to have lookups in B as fast as possible. Dictionary provides O(1) lookup time. So, first of all, let us construct this dictionary:

>>> indices = dict((value,index) for index,value in enumerate(B))
>>> indices
{8: 2, 16: 3, 2: 0, 4: 1, 32: 4}

And then just go through A and find corresponding indices:

>>> [indices[item] for item in A]
[1, 1, 0, 2, 2, 2, 2, 2, 3, 4, 3, 3, 4]

2 Comments

Thanks, this is great. But, is there any way to do it in NumPy-C-happy-land? {dict: comprehension} seems a bit faster as well if we went with this route. Is there no nice NumPy way to do it without having to pass a dict around?
@Will If B is large, it's important to have O(1) lookup complexity. I'm not familiar with numpy, but a perfunctory search didn't yield any references to dict analogs in numpy. If B is small, it may be faster to do everything inside numpy. If so, wait for other answers; maybe someone will be able to come up with an all-in-numpy solution.
1

I think you can do it with np.searchsorted:

>>> A = np.asarray([4, 4, 2, 8, 8, 8, 8, 8, 16, 32, 16, 16, 32])
>>> B = np.asarray([2, 8, 4, 32, 16])
>>> sort_b = np.argsort(B)
>>> idx_of_a_in_sorted_b = np.searchsorted(B, A, sorter=sort_b)
>>> idx_of_a_in_b = np.take(sort_b, idx_of_a_in_sorted_b)
>>> idx_of_a_in_b
array([2, 2, 0, 1, 1, 1, 1, 1, 4, 3, 4, 4, 3], dtype=int64)

Note that B is scrambled from your version, hence the different output. If some of the items in A are not in B (which you could check with np.all(np.in1d(A, B))), then the returned indices for those values will be garbage, and you may even get an IndexError from the last line (if the largest value in A is missing from B).
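A minimal sketch of how the missing-item case could be guarded against, assuming B is non-empty and has no duplicates. The helper name index_in is made up for this example. It clips the searchsorted result so the lookup cannot run off the end, then checks that each item really was found.

```python
import numpy as np

def index_in(B, A):
    """For each item of A, return its index in B (hypothetical helper)."""
    A = np.asarray(A)
    B = np.asarray(B)
    sort_b = np.argsort(B)
    pos = np.searchsorted(B, A, sorter=sort_b)
    # searchsorted returns len(B) for items larger than everything in B;
    # clipping keeps the indexing below from raising IndexError.
    pos = np.clip(pos, 0, len(B) - 1)
    idx = sort_b[pos]
    # An item missing from B lands on a neighbouring slot, so verify the match.
    if not np.array_equal(B[idx], A):
        raise KeyError("some items of A are not in B")
    return idx

A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '8', '4', '32', '16'])
print(index_in(B, A).tolist())
```

This also works on string arrays, since argsort and searchsorted only need the items to be orderable.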


1

The numpy_indexed package (disclaimer: I am its author) implements a solution along the same lines as Jaime's solution; but with a nice interface, tests, and a lot of related useful functionality:

import numpy_indexed as npi
print(npi.indices(B, A))

3 Comments

You keep posting almost identical answers pointing at your utility, not being clear about your affiliation to the linked repo. To keep them from getting flagged as spam, you should take the steps described in: How can I link to an external resource in a community-friendly way?
Thanks for the heads-up, but are you sure these linked conditions apply? This isn't a 'product or website' I am linking, but rather an open-source project. Mentioning my authorship under those circumstances feels more like self-promotion than useful information.
Based on similar feedback I have decided to add a disclaimer; thanks again.
0

I'm not sure how efficient this is but it works:

import numpy as np
A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])
idx_of_a_in_b=np.argmax(A[np.newaxis,:]==B[:,np.newaxis],axis=0)
print(idx_of_a_in_b)

from which I get:

[1 1 0 2 2 2 2 2 3 4 3 3 4]
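One caveat worth checking, sketched here with a deliberately modified A in which '64' does not occur in B (an assumption made for this example, not part of the question): argmax over an all-False column returns 0, so a missing item is silently reported as index 0. Keeping the boolean table around makes it possible to tell the two cases apart.

```python
import numpy as np

A = np.asarray(['4', '64', '2'])   # '64' is deliberately absent from B
B = np.asarray(['2', '4', '8', '16', '32'])

# Boolean table of shape (len(B), len(A)); memory grows with len(A) * len(B).
matches = A[np.newaxis, :] == B[:, np.newaxis]

idx = np.argmax(matches, axis=0)   # first True per column, or 0 if there is none
found = matches.any(axis=0)        # False where the item of A is not in B

print(idx.tolist())
print(found.tolist())
```

Here the entry for '64' comes out as 0, the same as the entry for '2', and only the found mask shows that it was not actually matched.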

2 Comments

This seems to be the one! Thanks!
Note: this solution is quadratic in terms of the input size, which is not ideal.
