3

I am looking for an efficient way to compute the row-wise intersection of two two-dimensional numpy ndarrays. Each row has exactly one common element. For example:

[[1, 2], ∩ [[0, 1], -> [1,
 [3, 4]]    [0, 3]]     3]

Ideally, zeros should be ignored:

[[1, 2, 0], ∩ [[0, 1, 0], -> [1,
 [3, 4, 0]]    [0, 3, 0]]     3]

My solution:

import numpy as np

arr1 = np.array([[1, 2],
                 [3, 4]])
arr2 = np.array([[0, 1],
                 [0, 3]])
arr3 = np.empty(len(arr1))

for i in range(len(arr1)):
    arr3[i] = np.intersect1d(arr1[i], arr2[i])

print(arr3)
# [ 1.  3.]

I have about 1 million rows, so vectorized operations are strongly preferred. Solutions that use other Python packages are welcome.

  • If you found a solution (which looks right, in my humble opinion, and also vectorized), please post it as a solution and "accept" it so it's visible that the question has an accepted answer. Also, the pandas and scipy tags aren't relevant here. Commented Jul 2, 2019 at 10:34
  • You don't have to use a loop here, you can just transpose the arrays: np.intersect1d(arr1.transpose(),arr2.transpose()).transpose() Commented Jul 2, 2019 at 10:36
  • @ItamarMushkin Vectorization allows similar operations to be executed simultaneously on a batch of data. Your solution has a for loop, so it is executed line by line. I also accept solutions that use the scipy and pandas packages. Try reading the documentation for the intersect1d function. It is a coincidence that you get the same result. Try these arrays: [[1,2],[3,4]] and [[3,4],[1,2]]. Commented Jul 2, 2019 at 13:13
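
To illustrate the point made in the last comment, here is a small check using the arrays suggested there. np.intersect1d flattens its inputs, so transposing them does not make the intersection row-wise. The variable names a and b are only for illustration.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[3, 4],
              [1, 2]])

# intersect1d flattens both inputs, so this is the intersection of ALL values,
# not a per-row intersection (which would be empty for both rows here).
flat = np.intersect1d(a.transpose(), b.transpose()).transpose()
print(flat)  # [1 2 3 4]
```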

2 Answers

2

You can use np.apply_along_axis. The solution below pads each row's intersection with -1 up to the row width of arr1. I have not tested its efficiency.

    import numpy as np

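    def intersect1d_padded(x):
        # x holds one row of a followed by the matching row of b (equal widths assumed)
        x, y = np.split(x, 2)
        # Pad with -1 so every row yields an output of the same length
        padded_intersection = np.full(x.shape, -1, dtype=int)
        intersection = np.intersect1d(x, y)
        padded_intersection[:intersection.shape[0]] = intersection
        return padded_intersection

    def rowwise_intersection(a, b):
        # Join the arrays side by side so each row reaches the helper as one 1-D array
        return np.apply_along_axis(intersect1d_padded,
                        1, np.concatenate((a, b), axis=1))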

    result = rowwise_intersection(arr1,arr2)

    >>> array([[ 1, -1],
               [ 3, -1]])

If you know there is only one element in each row's intersection, you can use:

    result = rowwise_intersection(arr1,arr2)[:,0]

    >>> array([1, 3])

You can also modify intersect1d_padded to return a scalar with the intersection value.
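
As a possible alternative that avoids the per-row Python call, here is a minimal sketch based on broadcasting. It is not part of the answer above. The function name rowwise_single_intersection and the ignore parameter are hypothetical, and the sketch assumes at most one common non-ignored value per row. It builds a boolean array of shape (rows, columns of a, columns of b), so memory use grows with the product of the column counts.

```python
import numpy as np

def rowwise_single_intersection(a, b, ignore=0):
    """Return, per row, the first value of a that also occurs in the same row of b.

    Assumes at most one common value per row; rows without a match yield `ignore`.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    # match[i, j, k] is True where a[i, j] == b[i, k]
    match = a[:, :, None] == b[:, None, :]
    # Discard matches on the placeholder value (0 by default)
    match &= (a != ignore)[:, :, None]
    # hit[i, j] is True if a[i, j] occurs anywhere in row i of b
    hit = match.any(axis=2)
    # argmax gives the position of the first True in each row (0 if there is none)
    first = hit.argmax(axis=1)
    values = a[np.arange(a.shape[0]), first]
    # Rows with no match at all fall back to the placeholder
    return np.where(hit.any(axis=1), values, ignore)

arr1 = np.array([[1, 2, 0],
                 [3, 4, 0]])
arr2 = np.array([[0, 1, 0],
                 [0, 3, 0]])
print(rowwise_single_intersection(arr1, arr2))  # [1 3]
```

For a few columns and about a million rows the intermediate boolean array stays small, but for wide rows a sort-based approach would likely be a better fit.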

0

I don't know of an elegant way to do it in numpy, but a simple list comprehension can do the trick:

[list(set.intersection(set(_x),set(_y)).difference({0})) for _x,_y in zip(x,y)]
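
For reference, here is a self-contained version of the same idea using the example arrays from the question. The names x and y are assumed to be the two input arrays, and tolist() is used only so the result contains plain Python integers.

```python
import numpy as np

x = np.array([[1, 2, 0],
              [3, 4, 0]])
y = np.array([[0, 1, 0],
              [0, 3, 0]])

# Intersect each pair of rows as sets and drop the placeholder 0
result = [sorted((set(row_x) & set(row_y)) - {0})
          for row_x, row_y in zip(x.tolist(), y.tolist())]
print(result)  # [[1], [3]]
```

This still loops in Python, so it is unlikely to be fast for a million rows, but it handles rows with any number of common elements.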
