Numpy filtering based on all row values

Question

I'm trying to filter a 2D numpy array using the values of another 2D numpy array. Something like this:

array1 = np.array([[ 0,  0],
                   [86,  4],
                   [75, 74],
                   [78, 55],
                   [53, 94],
                   [49, 83],
                   [99, 75],
                   [99, 10],
                   [32,  4],
                   [55, 99],
                   [62, 95],
                   [ 0,  0]])
array2 = np.array([[55, 99],
                   [32,  4],
                   [75, 74]])
array1[np.isin(array1, array2[2:5]).all(axis=1) == 0]

My ideal output would be a filtered version of array1 without the rows that are equal to rows in the array2 slice. The problem is that when I do it like this:

np.isin(array1, array2[2:5])

output is:

array([[False, False],
       [False,  True],
       [ True,  True],
       [False,  True],
       [False, False],
       [False, False],
       [ True,  True],
       [ True, False],
       [ True,  True],
       [ True,  True],
       [False, False],
       [False, False]])

It wrongly classifies the [99, 75] row as [True, True] because both of those values individually exist in array2. Is there a more correct way to filter based on all values of a row?

jammygrams · Accepted Answer · 2019-03-05 01:41:03Z

1

Here's an inefficient but very explicit way to do this with np.all():

# for each row in array2, check full match with each row in array1
bools = [np.all(array1==row,axis=1) for row in array2]

# combine 3 boolean arrays with 'or' logic
mask = [any(tup) for tup in zip(*bools)]

# flip the mask
mask = ~np.array(mask)

# final index
out = array1[mask]
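
The same row-by-row comparison can also be sketched with broadcasting instead of a Python loop. This is a minimal sketch, not part of the original answer; the intermediate names (elementwise, row_matches, in_array2) are illustrative, and the temporary boolean array has shape (len(array1), len(array2), columns), so it may use a lot of memory for large inputs.

```python
import numpy as np

array1 = np.array([[0, 0], [86, 4], [75, 74], [78, 55], [53, 94], [49, 83],
                   [99, 75], [99, 10], [32, 4], [55, 99], [62, 95], [0, 0]])
array2 = np.array([[55, 99], [32, 4], [75, 74]])

# shape (12, 3, 2): every row of array1 compared with every row of array2
elementwise = array1[:, None, :] == array2[None, :, :]

# shape (12, 3): True where all values of the two rows are equal
row_matches = elementwise.all(axis=2)

# shape (12,): True where the array1 row equals any row of array2
in_array2 = row_matches.any(axis=1)

# keep only the rows of array1 that do not appear in array2
out = array1[~in_array2]
```

Here out should contain the nine rows of array1 other than [75, 74], [32, 4] and [55, 99], with both [0, 0] rows kept.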

edited Mar 5, 2019 at 1:41

answered Mar 3, 2019 at 14:59


3 Comments

alparslan mimaroğlu Over a year ago

I was trying to implement the ordered crossover algorithm. This works if all of your rows are unique. If anyone tries to do the same make sure all of your rows are unique too.

jammygrams Over a year ago

Sorry, I don't think I quite understand. The above code should work even if there are duplicates in array1 or array2. Can you share a bit more about the problem when rows are not unique?

alparslan mimaroğlu Over a year ago

Your solution works perfectly. Ordered crossover is a method used in genetic algorithms. For example, these can be your parents: A = [a,b,c,a,c,b], B = [a,c,b,c,a,b]. To create a child you can use ordered crossover: you take a random slice (e.g. 2-4) from A, giving Child = [,,c,a,,]. You then delete the first occurrences of c and a from B and put the remaining genes into the child in the order they appear. I used the genetic algorithm to create a travelling salesman solver. I did not have any duplicates in my genes. If anyone wants to try the same, they should know it deletes duplicates.

Stef · Accepted Answer · 2024-02-15 15:33:04Z

1

Using np.apply_along_axis to test if each row is in the set of the rows of array2:

import numpy as np

array1 = np.array([[ 0, 0], [86, 4], [75, 74], [78, 55], [53, 94], [49, 83], [99, 75], [99, 10], [32, 4], [55, 99], [62, 95], [ 0, 0]])
array2 = np.array([[55, 99], [32, 4], [75, 74]])

excludeset = set(map(repr, array2))
array3 = array1[np.apply_along_axis(lambda row: repr(row) not in excludeset, 1, array1)]

print(array3)
# [[0 0] [86  4] [78 55] [53 94] [49 83] [99 75] [99 10] [62 95] [0 0]]
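
A variant of the same set-membership idea, sketched here with tuples of plain Python ints as the hashable keys instead of repr strings, so the lookup does not depend on how NumPy formats an array. This is an illustrative alternative rather than the answer's own code; the names exclude and keep are made up for the example.

```python
import numpy as np

array1 = np.array([[0, 0], [86, 4], [75, 74], [78, 55], [53, 94], [49, 83],
                   [99, 75], [99, 10], [32, 4], [55, 99], [62, 95], [0, 0]])
array2 = np.array([[55, 99], [32, 4], [75, 74]])

# tolist() converts to nested lists of Python ints; tuples of ints are hashable
exclude = set(map(tuple, array2.tolist()))

# one boolean per row of array1: True if the row is NOT in array2
keep = np.array([tuple(row) not in exclude for row in array1.tolist()])

array3 = array1[keep]
```

Like the repr version, this loops over rows in Python, so it is probably best suited to small or medium arrays.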

edited Feb 15, 2024 at 15:33

answered Feb 9, 2024 at 23:31


Stef · Accepted Answer · 2024-02-15 16:07:08Z

If all your elements are guaranteed to be in range [0, 100) and your rows always have two elements, then you can convert each row [a,b] into the single number 100 * a + b to use numpy.isin:

import numpy as np

array1 = np.array([[ 0, 0], [86, 4], [75, 74], [78, 55], [53, 94], [49, 83], [99, 75], [99, 10], [32, 4], [55, 99], [62, 95], [ 0, 0]])
array2 = np.array([[55, 99], [32, 4], [75, 74]])

convertor = np.array([100, 1])

array3 = array1[~ np.isin(array1 @ convertor, array2 @ convertor)]

print(array3)
# [[0 0] [86  4] [78 55] [53 94] [49 83] [99 75] [99 10] [62 95] [0 0]]

If the number of elements per row is not 2, here is a more general formula for convertor:

k = array1.shape[1]
convertor = np.logspace(0, k-1, num=k, base=100, dtype=int)[::-1]

If you do not know that elements are guaranteed to be in range [0, 100), but you do know that all elements are nonnegative, then you can replace base=100 with base=array1.max()+1 in the formula above.
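
Putting the general case together, here is a hedged sketch that builds convertor from integer powers rather than np.logspace, on the assumption that np.logspace works in floating point before casting and may not be exact for large powers. Taking the base from the maximum of both arrays is also an assumption added here, so that values appearing only in array2 cannot collide. All elements are assumed to be nonnegative integers, and base ** k must fit in int64.

```python
import numpy as np

array1 = np.array([[0, 0], [86, 4], [75, 74], [78, 55], [53, 94], [49, 83],
                   [99, 75], [99, 10], [32, 4], [55, 99], [62, 95], [0, 0]])
array2 = np.array([[55, 99], [32, 4], [75, 74]])

k = array1.shape[1]

# assumption: nonnegative integers; base must exceed every element of both arrays
base = int(max(array1.max(), array2.max())) + 1

# exact integer powers, highest power first: [base**(k-1), ..., base, 1]
convertor = base ** np.arange(k - 1, -1, -1, dtype=np.int64)

# each row becomes one integer, so np.isin compares whole rows
array3 = array1[~np.isin(array1 @ convertor, array2 @ convertor)]
```

With the example data, base is 100 and convertor is [100, 1], so this reduces to the two-column case above.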
