I have two 2-D boolean arrays with different numbers of rows. I want to quickly find the indices of the True values in the rows that appear in both arrays. I wrote the following code, but it is too slow. Is there a faster way to do this?

import numpy as np

a = np.random.choice(a=[False, True], size=(100, 100))
b = np.random.choice(a=[False, True], size=(1000, 100))

for i in a:
    for j in b:
        if np.array_equal(i, j):
            print(np.where(i))

2 Answers

Let's start with a version of the question that makes sense and usually prints something. With 100 random columns, two rows are almost never equal, so the arrays below are much smaller:

import numpy as np

a = np.random.choice(a=[False, True], size=(2, 2))
b = np.random.choice(a=[False, True], size=(4, 2))

print(f"a: \n {a}")
print(f"b: \n {b}")

# Brute-force reference: collect every (i, j) with a[i] equal to b[j].
matches = []
for i, x in enumerate(a):
    for j, y in enumerate(b):
        if np.array_equal(x, y):
            matches.append((i, j))

And here is a solution using scipy.spatial.distance.cdist, which compares every row of a against every row of b. The Hamming distance between two Boolean vectors is zero exactly when they are equal:

import numpy as np
from scipy.spatial.distance import cdist

# Hamming distance between every row of a and every row of b;
# a distance of 0 means the two rows are identical.
d = cdist(a, b, metric='hamming')
cdist_matches = np.where(d == 0)

matches_values = [(a[i], b[j]) for (i, j) in matches]
cdist_values = a[cdist_matches[0]], b[cdist_matches[1]]

print(f"matches_inds = \n{matches}")
print(f"matches = \n{matches_values}")

print(f"cdist_inds = \n{cdist_matches}")
print(f"cdist_matches =\n {cdist_values}")

out:

a: 
 [[ True False]
 [False False]]
b: 
 [[ True  True]
 [ True False]
 [False False]
 [False  True]]
matches_inds = 
[(0, 1), (1, 2)]
matches = 
[(array([ True, False]), array([ True, False])), (array([False, False]), array([False, False]))]
cdist_inds = 
(array([0, 1], dtype=int64), array([1, 2], dtype=int64))
cdist_matches =
 (array([[ True, False],
       [False, False]]), array([[ True, False],
       [False, False]]))
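
The original question asked for the indices of the True values in each common row, not only for the row pairs. A minimal sketch of that last step, assuming the `a`, `b` and row pairs from the output above (the names `matched_rows_of_a` and `true_indices` are illustrative and not part of the answer):

```python
import numpy as np

# Arrays copied from the sample output above.
a = np.array([[True, False],
              [False, False]])
b = np.array([[True, True],
              [True, False],
              [False, False],
              [False, True]])

# Row pairs (i, j) with a[i] equal to b[j], as found above.
pairs = [(0, 1), (1, 2)]

# A row of a may match several rows of b, so deduplicate the a-side indices.
matched_rows_of_a = sorted({i for i, _ in pairs})

# For every common row, list the column indices that hold True.
true_indices = {i: np.flatnonzero(a[i]) for i in matched_rows_of_a}

for i, cols in true_indices.items():
    print(f"row {i} of a: True at columns {cols}")
```

For a 1-D row, `np.flatnonzero(row)` returns the same indices as `np.where(row)[0]`, which is what the loop in the question prints.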


See this for a pure NumPy implementation if you don't want to import SciPy.
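
For reference, one possible pure NumPy equivalent is sketched below. It is not the linked implementation. It assumes that counting mismatching positions with two matrix products is acceptable, and the function name `matching_row_pairs` is hypothetical:

```python
import numpy as np

def matching_row_pairs(a, b):
    """Return an (n_matches, 2) array of (i, j) with a[i] equal to b[j]."""
    # Work with integers so that matrix products count positions.
    a_int = np.asarray(a, dtype=np.int64)
    b_int = np.asarray(b, dtype=np.int64)
    # Positions where a is True and b is False, plus the reverse case.
    mismatches = a_int @ (1 - b_int).T + (1 - a_int) @ b_int.T
    # Zero mismatches means the two rows are identical.
    return np.argwhere(mismatches == 0)

# Same arrays as in the sample output above.
a = np.array([[True, False],
              [False, False]])
b = np.array([[True, True],
              [True, False],
              [False, False],
              [False, True]])

pairs = matching_row_pairs(a, b)
print(pairs)
```

The only intermediate here is a `(len(a), len(b))` integer matrix, so no three-dimensional array is built.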

1 Comment

@Gulzar I added a solution below that makes a broadcastable to b and compares each row of a to each row of b, if I have understood the question correctly.

Each row of a can be compared to each row of b by making the shape of a broadcastable to the shape of b, using np.newaxis and np.tile:

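import numpy as np

a = np.random.choice(a=[True, False], size=(2, 5))
b = np.random.choice(a=[True, False], size=(10, 5))
# Repeat a along a new middle axis: shape (2, 10, 5).
broadcastable_a = np.tile(a[:, np.newaxis, :], (1, b.shape[0], 1))
# a_equal_b[i, j, k] is True where a[i, k] == b[j, k].
a_equal_b = np.equal(b, broadcastable_a)
indexes = np.where(a_equal_b)
# Keep the (row of b, column) part of each index triple.
indexes = np.stack(np.array(indexes[1:]), axis=1)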

1 Comment

I think it won't work because it only compares b as blocks and not by row. Maybe I didn't understand correctly. Please also add the code to convert back from the result of .where to the required indices. Also please show output.
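
One way to address the comment above, sketched on the assumption that the goal is the list of (row of a, row of b) pairs whose rows are fully equal: the element-wise comparison has to be reduced over the column axis before the indices are extracted. NumPy's broadcasting also makes np.tile unnecessary. The variable names are illustrative, and fixed inputs replace the random ones so that the result is reproducible:

```python
import numpy as np

a = np.array([[True, False, True],
              [False, False, True]])
b = np.array([[False, False, True],
              [True, True, True],
              [True, False, True],
              [False, False, True]])

# Shapes (2, 1, 3) and (1, 4, 3) broadcast to (2, 4, 3):
# elementwise[i, j, k] is True where a[i, k] == b[j, k].
elementwise = a[:, np.newaxis, :] == b[np.newaxis, :, :]

# Two rows are equal only if every column agrees.
rows_equal = elementwise.all(axis=2)  # shape (2, 4)

# Each row of row_pairs is (index in a, index in b).
row_pairs = np.argwhere(rows_equal)
print(row_pairs)
```

Here row 0 of a equals row 2 of b, and row 1 of a equals rows 0 and 3 of b. The intermediate array holds `len(a) * len(b) * n_columns` booleans, which for the sizes in the question (100, 1000 and 100) is 10 million elements.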
