I have an array x of the form,

x = [[1,2,3,...,7,8,9],
[1,2,3,...,7,9,8],
...,
[9,8,7,...,3,1,2],
[9,8,7,...,3,2,1]]

I also have an array of non-allowed numbers for each column. I want to select all of the rows which only have allowed numbers in each column. For instance, I might want only rows which do not have any of [1,2,3] in the first column; I can do this with,

x[~np.in1d(x[:,0], [1,2,3])]

And for any single column, I can do this. But I'm looking to essentially do this for all columns at once, selecting only the rows for which every element is an allowed number for its column. I can't seem to get x.any or x.all to do this well - how should I go about this?

EDIT: To clarify, the non-allowed numbers are different for each column. In actuality, I will have some array y,

y = [[1,4,...,7,8],
[2,5,...,9,4],
[3,6,...,8,6]]

Where I want rows from x for which column 1 cannot be in [1,2,3], column 2 cannot be in [4,5,6], and so on.

2 Answers

You can broadcast the comparison, then use all to check:

x[(x != y[:,None,:]).all(axis=(0,-1))]

Break down:

# compare each element of `x` to each element of `y`
# mask.shape == (y.shape[0], x.shape[0], x.shape[1])
mask = (x != y[:,None,:])

# `all(0)` checks that each element of `x` differs from every element in the same column of `y`
# `all(-1)` checks along the rows of `x`
mask = mask.all(axis=(0,-1))

# slice
x[mask]

For example, consider:

x = np.array([[1, 2],
              [9, 8],
              [5, 6],
              [7, 8]])

y = np.array([[1, 4],
       [2, 5],
       [3, 7]])

Then mask = (x != y[:,None,:]).all(axis=(0,-1)) gives

array([False,  True,  True,  True])
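For anyone who wants to run the example end to end, here is a minimal self-contained sketch of the same broadcast approach. The variable names `elementwise` and `selected` are only illustrative and do not come from the answer above.

```python
import numpy as np

x = np.array([[1, 2],
              [9, 8],
              [5, 6],
              [7, 8]])

# Column j of y lists the values that are not allowed in column j of x.
y = np.array([[1, 4],
              [2, 5],
              [3, 7]])

# y[:, None, :] has shape (3, 1, 2), so the comparison broadcasts to
# shape (3, 4, 2): one (4, 2) comparison against x per row of y.
elementwise = x != y[:, None, :]

# Reduce over axis 0 (the rows of y) and axis -1 (the columns of x),
# leaving one boolean per row of x.
mask = elementwise.all(axis=(0, -1))

# Keep only the rows whose every element is allowed for its column.
selected = x[mask]

print(elementwise.shape)
print(mask)
print(selected)
```

Only the first row is dropped, because its 1 appears in the first column of y. Note that the intermediate mask holds y.shape[0] * x.shape[0] * x.shape[1] booleans, which may matter for large inputs.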

2 Comments

Can you explain it a bit more? What is y here, and what exactly is this line doing?
Hmm, I'm not sure how to implement this for my use. I should have been more explicit - the disallowed numbers are different for each column - I might not want [1,2,3] in column 1, but then not want [4,5,8] in column 2. Will this still work for that?

It's recommended to use np.isin rather than np.in1d these days. This lets you (a) compare the entire array all at once, and (b) invert the mask more efficiently.

x[np.isin(x, [1, 2, 3], invert=True).all(1)]

np.isin preserves the shape of x, so you can then use .all across the columns. It also has an invert argument which allows you to do the equivalent of ~isin(x, [1, 2, 3]), but more efficiently.
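As a small illustration of the two points above (shape preservation and the invert argument), the sketch below uses a made-up array and a made-up list `banned`; both are assumptions for the example only.

```python
import numpy as np

x = np.array([[1, 2],
              [9, 8],
              [5, 6],
              [7, 8]])

# Hypothetical list of values that are not allowed anywhere in x.
banned = [1, 2, 3]

# np.isin returns a boolean array with the same shape as x.
not_banned = np.isin(x, banned, invert=True)

# Same result as negating the plain membership test.
same_mask = ~np.isin(x, banned)

# Keep the rows in which no element is banned.
rows = x[not_banned.all(axis=1)]

print(not_banned.shape)
print(rows)
```

Here the same list is applied to every column, so this matches the single-list case from the question rather than the per-column case described in its EDIT.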

This vectorizes a computation similar to the one in the other answer, and does so more efficiently (although it's still a linear search), because it avoids materializing the 3-D broadcast mask. Note that, as written, it tests every column against the same list of values.
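If the disallowed values differ per column, as in the question's EDIT, one possible adaptation is to call np.isin once per column and combine the results. This is only a sketch under that assumption, not something the answer itself proposes; the arrays and the name `allowed` are illustrative.

```python
import numpy as np

x = np.array([[1, 2],
              [9, 8],
              [5, 6],
              [4, 1]])

# Column j of y lists the values that are not allowed in column j of x.
y = np.array([[1, 4],
              [2, 5],
              [3, 7]])

# One membership test per column; each result is a 1-D boolean array
# with one entry per row of x. column_stack joins them into shape (4, 2).
allowed = np.column_stack([
    np.isin(x[:, j], y[:, j], invert=True)
    for j in range(x.shape[1])
])

# A row is kept only if every column holds an allowed value.
mask = allowed.all(axis=1)
selected = x[mask]

print(mask)
print(selected)
```

The last row, [4, 1], is kept: 4 is only disallowed in the second column and 1 only in the first. The Python loop runs once per column, so this trades full vectorization for not building an array of shape (y.shape[0], x.shape[0], x.shape[1]).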
