17

How do I get a row-wise comparison between two arrays, with the result as a row-wise true/false array?

Given data:

a = np.array([[1,0],[2,0],[3,1],[4,2]])
b = np.array([[1,0],[2,0],[4,2]])

Result step 1:

c = np.array([True, True,False,True])

Result final:

a = a[c]

So how do I get the array c?

P.S.: In this example the arrays a and b are sorted. Please also say whether your solution depends on the arrays being sorted.

0

7 Answers

25

Here's a vectorised solution:

res = (a[:, None] == b).all(-1).any(-1)

print(res)

[ True  True False  True]

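Note that a[:, None] == b compares each row of a with every row of b element-wise. all(-1) then checks whether every element of a row pair matches, and any(-1) checks whether each row of a matches at least one row of b. Neither array needs to be sorted. The intermediate comparison looks like this: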

print(a[:, None] == b)

[[[ True  True]
  [False  True]
  [False False]]

 [[False  True]
  [ True  True]
  [False False]]

 [[False False]
  [False False]
  [False False]]

 [[False False]
  [False False]
  [ True  True]]]
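
To make the shapes in the broadcast explicit, here is a minimal sketch that wraps the one-liner in a helper. The name rows_isin_broadcast is hypothetical and only used for illustration. The intermediate array holds len(a) * len(b) * n_cols booleans, so memory use grows with the product of the row counts.

```python
import numpy as np

def rows_isin_broadcast(a, b):
    """Return a boolean mask: True where a row of `a` also appears in `b`."""
    a = np.asarray(a)
    b = np.asarray(b)
    # a[:, None] has shape (len(a), 1, n_cols); comparing with b broadcasts
    # to (len(a), len(b), n_cols), one boolean per element pair.
    pairwise = a[:, None] == b
    # all(-1): every column matches -> (len(a), len(b)) row-pair matrix.
    # any(-1): at least one row of b matches -> (len(a),) mask.
    return pairwise.all(-1).any(-1)

a = np.array([[1, 0], [2, 0], [3, 1], [4, 2]])
b = np.array([[1, 0], [2, 0], [4, 2]])
c = rows_isin_broadcast(a, b)
filtered = a[c]  # the "Result final" step from the question
```

Because every row of a is compared with every row of b, the row order of either array does not affect which rows are flagged.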

2 Comments

This looks good: with a = np.array([[1,0],[2,0],[4,2],[3,1],[3,0]]) and b = np.array([[1,0],[2,0],[3,1]]), c = (a[:, None] == b).all(-1).any(-1) gives [ True True False True False].
6

Approach #1

We could use a view based vectorized solution -

# https://stackoverflow.com/a/45313353/ @Divakar
def view1D(a, b): # a, b are arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(),  b.view(void_dt).ravel()

A,B = view1D(a,b)
out = np.isin(A,B)

Sample run -

In [8]: a
Out[8]: 
array([[1, 0],
       [2, 0],
       [3, 1],
       [4, 2]])

In [9]: b
Out[9]: 
array([[1, 0],
       [2, 0],
       [4, 2]])

In [10]: A,B = view1D(a,b)

In [11]: np.isin(A,B)
Out[11]: array([ True,  True, False,  True])

Approach #2

Alternatively for the case when all rows in b are in a and rows are lexicographically sorted, using the same views, but with searchsorted -

out = np.zeros(len(A), dtype=bool)
out[np.searchsorted(A,B)] = 1

If the rows are not necessarily lexicographically sorted -

sidx = A.argsort()
out[sidx[np.searchsorted(A,B,sorter=sidx)]] = 1
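
Approach #2 assumes that every row of b occurs in a. As a hedged sketch of a searchsorted lookup that drops this assumption, the code below searches for the rows of a inside the sorted rows of b instead. It swaps the void view for integer row ids obtained from np.unique(..., axis=0), which is an assumption made here for illustration and not part of the answer above. The function name is hypothetical.

```python
import numpy as np

def rows_isin_searchsorted(a, b):
    """Mask of rows of `a` that occur in `b`; no sorting precondition."""
    a = np.asarray(a)
    b = np.asarray(b)
    if len(b) == 0:
        return np.zeros(len(a), dtype=bool)
    # Give identical rows the same integer id across both arrays.
    stacked = np.concatenate([a, b], axis=0)
    _, inverse = np.unique(stacked, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # guard against a 2-D inverse in some versions
    ids_a, ids_b = inverse[:len(a)], inverse[len(a):]
    # Binary-search each id of a in the sorted ids of b.
    sorted_b = np.sort(ids_b)
    pos = np.searchsorted(sorted_b, ids_a)
    pos = np.clip(pos, 0, len(sorted_b) - 1)  # keep indices in range
    # A hit means the id found at the insertion point equals the query id.
    return sorted_b[pos] == ids_a

a = np.array([[1, 0], [2, 0], [3, 1], [4, 2]])
b = np.array([[1, 0], [2, 0], [4, 2]])
out = rows_isin_searchsorted(a, b)
# b may also contain rows that are missing from a:
out_extra = rows_isin_searchsorted(a, np.array([[9, 9], [1, 0]]))
```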


6

You can use NumPy's apply_along_axis, which applies a function to every 1-D slice along the given axis: for a 2-D array, axis=0 passes each column and axis=1 passes each row.

import numpy as np
a = np.array([[1,0],[2,0],[3,1],[4,2]])
b = np.array([[1,0],[2,0],[4,2]])
c = np.apply_along_axis(lambda x,y: x in y, 1, a, b)
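
A small check of what the in operator does with NumPy arrays may help here. For an ndarray, x in y evaluates (y == x).any(), so it is True as soon as a single element matches. The sketch below contrasts that with an explicit whole-row test inside apply_along_axis; the variable names are illustrative only.

```python
import numpy as np

a = np.array([[1, 0], [2, 0], [4, 2], [3, 1], [3, 0]])
b = np.array([[1, 0], [2, 0], [3, 1]])

row = np.array([3, 0])  # shares elements with rows of b, but is not a row of b
elementwise_hit = row in b                     # element-level test
row_hit = bool((b == row).all(axis=1).any())   # whole-row test

# Whole-row membership for every row of a.
c = np.apply_along_axis(lambda x: (b == x).all(axis=1).any(), 1, a)
```

Here elementwise_hit is True while row_hit is False, which is the difference that matters for identical-row checks.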

2 Comments

This doesn't actually use np.isin, bit confused why you mentioned it, as I don't think it's particularly useful here.
This does not seem to work for checking identical rows: with a = np.array([[1,0],[2,0],[4,2],[3,1],[3,0]]) and b = np.array([[1,0],[2,0],[3,1]]), c = np.apply_along_axis(lambda x,y: x in y, 1, a, b) gives [ True True False True True]. The last one should be False.
2

You can do it as a list comp via:

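c = np.array([(b == row).all(axis=1).any() for row in a])

Note that the shorter test row in b is not a row comparison: for NumPy arrays it evaluates (b == row).any(), which is True as soon as a single element matches. The list comprehension does not need sorted input, though it will be slower than a fully vectorised NumPy approach.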


2

You can use scipy's cdist which has a few advantages:

import numpy as np
from scipy.spatial.distance import cdist

a = np.array([[1,0],[2,0],[3,1],[4,2]])
b = np.array([[1,0],[2,0],[4,2]])

c = cdist(a, b)==0
print(c.any(axis=1))
[ True  True False  True]
print(a[c.any(axis=1)])
[[1 0]
 [2 0]
 [4 2]]

cdist also accepts a callable as the metric, so you can supply your own function to do whatever comparison you need:

c = cdist(a, b, lambda u, v: (u==v).all())
print(c)
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 0.]
 [0. 0. 1.]]

Now you can find which indices match, which also shows whether there are multiple matches.

# Array with multiple instances
a2 = np.array([[1,0],[2,0],[3,1],[4,2],[3,1],[4,2]])

c2 = cdist(a2, b, lambda u, v: (u==v).all())
print(c2)

idx = np.where(c2==1)
print(idx)

print(idx[0][idx[1]==2])
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 0.]
 [0. 0. 1.]
 [0. 0. 0.]
 [0. 0. 1.]]
(array([0, 1, 3, 5], dtype=int64), array([0, 1, 2, 2], dtype=int64))
[3 5]
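
For comparison, the same index lookup can be sketched without SciPy by building the row-match matrix with NumPy broadcasting. This swaps cdist for a plain boolean comparison, so it is an alternative illustration and not the method of this answer.

```python
import numpy as np

a2 = np.array([[1, 0], [2, 0], [3, 1], [4, 2], [3, 1], [4, 2]])
b = np.array([[1, 0], [2, 0], [4, 2]])

# match[i, j] is True when row i of a2 equals row j of b.
match = (a2[:, None] == b).all(-1)

# Index pairs of all matches: rows_a[k] in a2 matches rows_b[k] in b.
rows_a, rows_b = np.nonzero(match)

# All positions in a2 that equal row 2 of b, i.e. [4, 2].
hits_for_b2 = rows_a[rows_b == 2]
```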

1 Comment

Interesting approach. I wonder how it fares in terms of performance
1
a = np.array([[1,0],[2,0],[3,1],[4,2]])
b = np.array([[1,0],[2,0],[4,2]])

i = 0
j = 0
result = []

We can take advantage of the fact that they are sorted and do this in O(n) time. Using two pointers we just move ahead the pointer that has gotten behind:

while i < len(a) and j < len(b):
    if tuple(a[i])== tuple(b[j]):
        result.append(True)
        i += 1
        j += 1 # get rid of this depending on how you want to handle duplicates
    elif tuple(a[i]) > tuple(b[j]):
        j += 1
    else:
        result.append(False)
        i += 1

Pad with False if it ends early.

if len(result) < len(a):
    result.extend([False] * (len(a) - len(result)))

print(result) # [True, True, False, True]
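
Since the two-pointer walk relies on both arrays being lexicographically sorted, here is a hedged sketch of how unsorted input could be handled: sort both arrays with np.lexsort, run the same walk, and scatter the result back to the original row order. Both function names are hypothetical, and this variant keeps j in place on a match so that duplicate rows in a are also flagged.

```python
import numpy as np

def sorted_rows_isin(a_sorted, b_sorted):
    """Two-pointer membership test; both inputs must be lexicographically sorted."""
    i = j = 0
    result = [False] * len(a_sorted)  # rows never reached stay False
    while i < len(a_sorted) and j < len(b_sorted):
        row_a, row_b = tuple(a_sorted[i]), tuple(b_sorted[j])
        if row_a == row_b:
            result[i] = True
            i += 1  # j stays put so duplicate rows in a match too
        elif row_a > row_b:
            j += 1
        else:
            i += 1
    return np.array(result, dtype=bool)

def rows_isin_any_order(a, b):
    """Sort, compare, then restore the original row order of `a`."""
    # lexsort treats the LAST key as primary, so reverse the columns.
    order_a = np.lexsort(a.T[::-1])
    order_b = np.lexsort(b.T[::-1])
    out = np.empty(len(a), dtype=bool)
    out[order_a] = sorted_rows_isin(a[order_a], b[order_b])
    return out

a = np.array([[4, 2], [1, 0], [3, 1], [2, 0]])  # deliberately unsorted
b = np.array([[2, 0], [4, 2], [1, 0]])
c = rows_isin_any_order(a, b)
```

The sort adds an O(n log n) step, so the linear-time claim only holds when the inputs arrive already sorted.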

This answer is adapted from "Better way to find matches in two sorted lists than using for loops? (Java)".


1

The recommended answer is good, but will struggle when dealing with arrays with a large number of rows. An alternative is:

baseval = np.max([a.max(), b.max()]) + 1
# encode each row as col0 + col1 * baseval, without modifying a or b in place
c = np.isin(a[:, 0] + a[:, 1] * baseval, b[:, 0] + b[:, 1] * baseval)

This uses the maximum value contained in either array plus 1 as a numeric base and treats the columns as baseval^0 and baseval^1 digits. Provided all values are non-negative, this ensures that the weighted sum of the columns is unique for each possible pair of values. The arrays do not need to be sorted row-wise. If the order of the columns is not important, both input arrays can be sorted column-wise beforehand using np.sort(a, axis=1).

This can be extended to arrays with more columns using:

baseval = np.max([a.max(), b.max()]) + 1
n_cols = a.shape[1]
weights = baseval ** np.arange(n_cols)  # baseval^0, baseval^1, ...
c = np.isin(a @ weights, b @ weights)   # weighted row sums, a and b untouched

Overflow can occur when baseval ** n_cols - 1 > 9223372036854775807 (the int64 maximum), because the largest possible encoded value is baseval ** n_cols - 1. This can be avoided by making the NumPy arrays hold Python integers with dtype=object.
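
The encoding idea can be sketched as a helper that leaves the inputs untouched, shifts values so that negative entries are handled, and checks the int64 bound before encoding. The function name and the shift by the minimum are assumptions added for illustration; the sketch also assumes non-empty integer arrays with the same number of columns.

```python
import numpy as np

def rows_isin_encoded(a, b):
    """Encode each row as one integer in a positional base, then use np.isin."""
    a = np.asarray(a)
    b = np.asarray(b)
    lo = int(min(a.min(), b.min()))             # shift so all digits are >= 0
    base = int(max(a.max(), b.max())) - lo + 1  # one more than the largest digit
    n_cols = a.shape[1]
    # Python ints do not overflow, so the bound can be checked exactly.
    if base ** n_cols - 1 > np.iinfo(np.int64).max:
        raise OverflowError("encoded rows would not fit in int64")
    weights = base ** np.arange(n_cols, dtype=np.int64)  # base^0, base^1, ...
    keys_a = (a.astype(np.int64) - lo) @ weights
    keys_b = (b.astype(np.int64) - lo) @ weights
    return np.isin(keys_a, keys_b)

a = np.array([[1, 0], [2, 0], [3, 1], [4, 2]])
b = np.array([[1, 0], [2, 0], [4, 2]])
c = rows_isin_encoded(a, b)
```

Only one integer per row is compared, so memory use scales with len(a) + len(b) and not with their product.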

1 Comment

You need ** not * I think.
