Efficiently find row intersections of two 2-D numpy arrays

Question

I am trying to find an efficient way of computing the row-wise intersections of two NumPy arrays.

The two arrays have the same shape, and no row contains duplicate values.

For example:

import numpy as np

a = np.array([[2,5,6],
              [8,2,3],
              [4,1,5],
              [1,7,9]])

b = np.array([[2,3,4],  # one element (2) in common with a[0] -> 1
              [7,4,3],  # one element (3) in common with a[1] -> 1
              [5,4,1],  # three elements (5, 4, 1) in common with a[2] -> 3
              [7,6,9]]) # two elements (7, 9) in common with a[3] -> 2

My desired output is: np.array([1,1,3,2])

It is easy to do this with a loop:

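def get_intersect1ds(a, b):
    # One count per row: the size of the intersection of a[i] and b[i].
    # dtype=int and range replace np.int and xrange, which are gone in
    # current NumPy and Python 3 respectively.
    result = np.empty(a.shape[0], dtype=int)
    for i in range(a.shape[0]):
        result[i] = len(np.intersect1d(a[i], b[i]))
    return result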

Result:

>>> get_intersect1ds(a, b)
array([1, 1, 3, 2])

But is there a more efficient way to do it?

@WarrenWeckesser, the arrays are 4,000,000 by 25, and I will probably have to do this operation a lot.
– Akavall, Commented Nov 1, 2013 at 17:12

Jaime · Accepted Answer · 2013-11-01 18:57:51Z

7

If you have no duplicates within a row, you can try to replicate what np.intersect1d does under the hood (see its source code): stack the two arrays side by side, sort each row, and count the adjacent pairs that are equal:

>>> c = np.hstack((a, b))
>>> c
array([[2, 5, 6, 2, 3, 4],
       [8, 2, 3, 7, 4, 3],
       [4, 1, 5, 5, 4, 1],
       [1, 7, 9, 7, 6, 9]])
>>> c.sort(axis=1)
>>> c
array([[2, 2, 3, 4, 5, 6],
       [2, 3, 3, 4, 7, 8],
       [1, 1, 4, 4, 5, 5],
       [1, 6, 7, 7, 9, 9]])
>>> c[:, 1:] == c[:, :-1]
array([[ True, False, False, False, False],
       [False,  True, False, False, False],
       [ True, False,  True, False,  True],
       [False, False,  True, False,  True]], dtype=bool)
>>> np.sum(c[:, 1:] == c[:, :-1], axis=1)
array([1, 1, 3, 2])
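
The steps above can be wrapped into a small function. This is only a sketch: the name row_intersection_counts is made up for illustration, and it relies on the same assumption as the answer, namely that no row contains duplicates. Under that assumption, a value can appear at most twice in a sorted combined row (once from a, once from b), so each adjacent equal pair corresponds to exactly one shared value.

```python
import numpy as np

def row_intersection_counts(a, b):
    """Count, for each row i, the values that a[i] and b[i] have in common.

    Hypothetical helper name. Assumes a and b have the same shape and
    that no row of either array contains duplicate values.
    """
    c = np.hstack((a, b))  # new array, so the inputs are not modified
    c.sort(axis=1)         # sort each combined row in place
    # After sorting, a shared value shows up as two equal neighbours.
    return np.sum(c[:, 1:] == c[:, :-1], axis=1)

a = np.array([[2, 5, 6], [8, 2, 3], [4, 1, 5], [1, 7, 9]])
b = np.array([[2, 3, 4], [7, 4, 3], [5, 4, 1], [7, 6, 9]])
counts = row_intersection_counts(a, b)
print(counts)  # [1 1 3 2]
```

If a row could contain duplicates, the adjacent-pair count would overcount, so the rows would need to be deduplicated first.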

answered Nov 1, 2013 at 18:57

Jaime


1 Comment

Elad Maimoni Over a year ago

Can you explain the algorithm behind the line c[:, 1:] == c[:, :-1]?

Warren Weckesser · Accepted Answer · 2013-11-01 17:33:10Z

2

This answer might not be viable, because if the input has shape (N, M), it generates an intermediate array with size (N, M, M), but it's always fun to see what you can do with broadcasting:

In [43]: a
Out[43]: 
array([[2, 5, 6],
       [8, 2, 3],
       [4, 1, 5],
       [1, 7, 9]])

In [44]: b
Out[44]: 
array([[2, 3, 4],
       [7, 4, 3],
       [5, 4, 1],
       [7, 6, 9]])

In [45]: (np.expand_dims(a, -1) == np.expand_dims(b, 1)).sum(axis=-1).sum(axis=-1)
Out[45]: array([1, 1, 3, 2])

For large arrays, the method could be made more memory-friendly by doing the operation in batches.
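
One way the batching might look is sketched below. The function name batched_intersection_counts and the batch_size parameter are assumptions made for illustration, not part of the answer; the idea is simply to apply the same broadcast comparison to a slice of rows at a time, so the temporary array has shape (batch_size, M, M) rather than (N, M, M).

```python
import numpy as np

def batched_intersection_counts(a, b, batch_size=100_000):
    """Row-wise intersection counts using broadcasting, in chunks of rows.

    Hypothetical helper; batch_size is an illustrative tuning knob.
    Assumes a and b have the same shape and no duplicates within a row.
    """
    n = a.shape[0]
    out = np.empty(n, dtype=np.intp)
    for start in range(0, n, batch_size):
        stop = min(start + batch_size, n)
        # Compare every element of a[i] with every element of b[i];
        # the temporary has shape (stop - start, M, M).
        eq = a[start:stop, :, None] == b[start:stop, None, :]
        out[start:stop] = eq.sum(axis=(1, 2))
    return out

a = np.array([[2, 5, 6], [8, 2, 3], [4, 1, 5], [1, 7, 9]])
b = np.array([[2, 3, 4], [7, 4, 3], [5, 4, 1], [7, 6, 9]])
counts = batched_intersection_counts(a, b, batch_size=3)
print(counts)  # [1 1 3 2]
```

A suitable batch_size depends on M and on available memory, so it would have to be chosen by experiment.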

edited Nov 1, 2013 at 17:33

answered Nov 1, 2013 at 17:24

Warren Weckesser


shx2 · Accepted Answer · 2013-11-01 17:17:53Z

1

I can't think of a clean pure-numpy solution, but the following suggestion should speed things up, potentially dramatically:

Use numba. It is as simple as decorating your get_intersect1ds function with @autojit (newer numba releases provide @jit instead).
Pass assume_unique=True when you call intersect1d.
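
The second suggestion can be tried without numba. The sketch below is the question's loop with assume_unique=True added; the function name get_intersect1ds_unique is made up for illustration. It is only valid because the question guarantees that no row contains duplicates, and whether it is noticeably faster would need to be measured.

```python
import numpy as np

def get_intersect1ds_unique(a, b):
    """Loop version from the question, skipping intersect1d's dedup step.

    Hypothetical name. assume_unique=True is only safe when neither
    a[i] nor b[i] contains repeated values.
    """
    result = np.empty(a.shape[0], dtype=int)
    for i in range(a.shape[0]):
        result[i] = len(np.intersect1d(a[i], b[i], assume_unique=True))
    return result

a = np.array([[2, 5, 6], [8, 2, 3], [4, 1, 5], [1, 7, 9]])
b = np.array([[2, 3, 4], [7, 4, 3], [5, 4, 1], [7, 6, 9]])
counts = get_intersect1ds_unique(a, b)
print(counts)  # [1 1 3 2]
```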

answered Nov 1, 2013 at 17:17

shx2


1 Comment

Akavall Over a year ago

Unfortunately, I don't have access to numba, but I was thinking cython. I think it should work as well. Thanks for the suggestion.
