I have an array, for example:

array = np.array([[0,1,0,0,4,0,5,0,0],
[1,1,1,0,0,0,0,2,2],
[1,1,0,0,3,0,0,2,2],
[0,0,0,0,3,0,0,0,0],
[6,6,0,0,0,0,7,7,7]])

I also have a list, for example:

list = [0, 1, 2, 3, 7]

I want to remove (set to zero) all values in the array that do not appear in the list. For example:

newarray = [[0 1 0 0 0 0 0 0 0]
            [1 1 1 0 0 0 0 2 2]
            [1 1 0 0 3 0 0 2 2]
            [0 0 0 0 3 0 0 0 0]
            [0 0 0 0 0 0 7 7 7]]

Here, the 4s, 5s, and 6s in the array have been replaced with 0s because they didn't appear in the list. My current solution, which uses np.where() in a loop to zero out every value in the array that doesn't appear in the list, is pretty slow:

# get all unique values in array
unique_vals_in_array = np.unique(array)

# get all values in array that don't appear in list
vals_not_in_list = set(unique_vals_in_array) - set(list)

# On each loop, replace one of the values that does not appear in list with zero.
# Start from array and keep updating new_array so earlier replacements are not lost.
new_array = array
for i in vals_not_in_list:
  new_array = np.where(new_array==i,0,new_array)

If anyone has a more efficient, pythonic solution I'd appreciate it.

2 Answers

5

Why not just iterate over the list and add the values back into a result array?

result = np.zeros_like(array)

for to_keep in [0, 1, 2, 3, 7]:
    result[array==to_keep] = to_keep
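
For anyone who wants to run this against the example from the question, here is a minimal self-contained sketch. The function name `keep_only` is only an illustrative choice, not something from the original post. Because the result starts out as all zeros, any value that is never copied back stays at 0.

```python
import numpy as np

def keep_only(array, to_keep):
    """Return a copy of `array` with every value not in `to_keep` set to 0."""
    # Start from zeros, so anything that is not copied back stays 0.
    result = np.zeros_like(array)
    for value in to_keep:
        # Copy back only the positions that hold a value we want to keep.
        result[array == value] = value
    return result

array = np.array([[0, 1, 0, 0, 4, 0, 5, 0, 0],
                  [1, 1, 1, 0, 0, 0, 0, 2, 2],
                  [1, 1, 0, 0, 3, 0, 0, 2, 2],
                  [0, 0, 0, 0, 3, 0, 0, 0, 0],
                  [6, 6, 0, 0, 0, 0, 7, 7, 7]])

result = keep_only(array, [0, 1, 2, 3, 7])
print(result)
```

The loop runs once per value to keep, not once per array element, so the cost grows with the length of the list times the size of the array.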

3

The numpy way to do this seems to be to use isin:

to_keep = [0, 1, 2, 3, 7]
result = np.where(np.isin(array, to_keep, invert=True), 0, array)

Here, invert=True returns the logical NOT of the membership test, so the statement says: if an element of array is not in to_keep, replace it with 0. This will probably be faster than a solution that loops in Python.
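
To make the mask explicit, here is a small sketch using the example array from the question. It also shows an equivalent form without invert, where the branches of np.where are swapped. The variable names `mask` and `inverted` are only illustrative.

```python
import numpy as np

array = np.array([[0, 1, 0, 0, 4, 0, 5, 0, 0],
                  [1, 1, 1, 0, 0, 0, 0, 2, 2],
                  [1, 1, 0, 0, 3, 0, 0, 2, 2],
                  [0, 0, 0, 0, 3, 0, 0, 0, 0],
                  [6, 6, 0, 0, 0, 0, 7, 7, 7]])
to_keep = [0, 1, 2, 3, 7]

# Boolean mask with the same shape as array:
# True where the element is one of the values to keep.
mask = np.isin(array, to_keep)

# Keep the element where the mask is True, otherwise write 0.
result = np.where(mask, array, 0)

# The inverted form swaps the condition and the two branches.
inverted = np.where(np.isin(array, to_keep, invert=True), 0, array)

print(result)
print(np.array_equal(result, inverted))
```

Which of the two forms to use is mostly a matter of taste; both build one boolean mask and make a single pass with np.where.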

5 Comments

Proper testing should probably consider the behaviour with varying sizes of input array and to-keep values.
@KarlKnechtel Would you recommend I remove my comments on speed? Or even just everything after timeit? I think I'll leave up the answer with the "numpy way" of doing this, but I believe your answer is better.
Following @KarlKnechtel's comment, I tested the solutions on my actual data, with the following results. List length: 123; array shape: (1, 602, 3600). Original solution took 1.6121721267700195s; np.isin solution took 0.10941171646118164s; Karl's solution took 0.13035941123962402s.
@jruss I'm glad the numpy solution worked out well!
I expected the np.isin approach to be fastest in the long run; I'm not especially experienced with Numpy and didn't feel like reasoning out anything more sophisticated than my own solution. I think it's best that the timing results are made available in an answer, and I also think it's possible that we've both overlooked something.
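
For anyone who wants to repeat the comparison on their own data, here is a rough timing sketch. The array shape, value range, list length, and the function names `with_isin` and `with_loop` are all arbitrary assumptions for illustration; they are not the data or code behind the numbers quoted in the comments, and results will vary with input size and hardware.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical test data: sizes and value range are arbitrary.
array = rng.integers(0, 200, size=(200, 300))
to_keep = list(range(0, 200, 4))  # 50 values to keep

def with_isin(array, to_keep):
    # Single vectorised membership test.
    return np.where(np.isin(array, to_keep, invert=True), 0, array)

def with_loop(array, to_keep):
    # One boolean comparison over the whole array per kept value.
    result = np.zeros_like(array)
    for value in to_keep:
        result[array == value] = value
    return result

# Both approaches must agree before their timings are worth comparing.
assert np.array_equal(with_isin(array, to_keep), with_loop(array, to_keep))

runs = 10
for func in (with_isin, with_loop):
    seconds = timeit.timeit(lambda: func(array, to_keep), number=runs)
    print(f"{func.__name__}: {seconds / runs:.6f} s per call")
```

As suggested in the comments, it is worth varying both the array size and the number of values to keep, since the loop version scales with the length of the list.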
