Removing all but last non-zero sequence from numpy array

Question

The problem

I have a 1-dimensional numpy array filled mostly with zeros but also containing some groups of non-zero values.

>> import numpy as np
>> a = np.zeros(10)
>> a[2:4] = 2
>> a[6:9] = 3
>> print a
[ 0.  0.  2.  2.  0.  0.  3.  3.  3.  0.]

I want to get an array that contains only the last non-zero group. In other words, all but the last non-zero group should be replaced by zeros. (A group may be only one element long.) Like so:

[ 0.  0.  0.  0.  0.  0.  3.  3.  3.  0.]

Non-robust solution

This seems to do the trick: reverse the array, find the first index where the difference between consecutive elements is negative, set every element after that index to zero, then flip the array back. It's a bit long-winded:

>> b = a[::-1]
>> b[np.where(np.ediff1d(b) < 0)[0][0] + 1:] = 0
>> c = b[::-1]
>> print c
[ 0.  0.  0.  0.  0.  0.  3.  3.  3.  0.]

Fails for a specific case

However, it is not robust and fails in the following case, where the only non-zero group sits at the start of the array (np.where then returns an empty array of indices, so indexing its first element raises an error):

>> a = np.zeros(10)
>> a[0:4] = 2
>> print a
[ 2.  2.  2.  2.  0.  0.  0.  0.  0.  0.]
>> b = a[::-1]
>> b[np.where(np.ediff1d(b) < 0)[0][0] + 1:] = 0
>> c = b[::-1]
>> print c

Traceback (most recent call last):

  File "<ipython-input-81-8cba57558ba8>", line 1, in <module>
    runfile('C:/Users/name/test1.py', wdir='C:/Users/name')

  File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)

  File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)

  File "C:/Users/name/test1.py", line 21, in <module>
    b[np.where(np.ediff1d(b) < 0)[0][0] + 1:] = 0

IndexError: index 0 is out of bounds for axis 0 with size 0
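The cause of the error can be checked by inspecting the intermediate arrays. The following is a minimal sketch of that check, not part of the original script; the variable names diffs and idx are introduced here for illustration only.

```python
import numpy as np

a = np.zeros(10)
a[0:4] = 2

b = a[::-1]              # reversed view: zeros first, then the group of 2s
diffs = np.ediff1d(b)    # differences between consecutive elements

# The reversed array never decreases, so no difference is negative
# and np.where finds no matching index.
idx = np.where(diffs < 0)[0]
print(idx.size)          # 0, hence idx[0] raises IndexError
```

Because idx is empty, there is no first element to take, which is exactly the IndexError shown in the traceback.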

Fix

So I need to introduce an if clause:

>> b = a[::-1]
>> if len(np.where(np.ediff1d(b) < 0)[0]) > 0:
>>     b[np.where(np.ediff1d(b) < 0)[0][0] + 1:] = 0
>> c = b[::-1]
>> print c
[ 2.  2.  2.  2.  0.  0.  0.  0.  0.  0.]

Is there a more elegant way to do it?

UPDATE: Following on from Divakar's excellent answer and mtrw's question, I would like to extend the specification. The method should also work when the non-zero values include negative numbers and when the values vary within a group.

e.g. np.array([1, 0, 0, 4, 5, 4, 5, 0, 0])

This means that methods which locate the group boundaries by checking the sign of the difference between neighbouring elements would no longer work reliably.
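As an illustration, here is a sketch that runs the if-clause version from above on the example array. It works on a reversed copy so that a itself is left unchanged; the name drops is introduced here for illustration only.

```python
import numpy as np

a = np.array([1, 0, 0, 4, 5, 4, 5, 0, 0])

b = a.copy()[::-1]                       # reversed copy, so `a` is not modified
drops = np.where(np.ediff1d(b) < 0)[0]   # indices where the reversed array decreases
if len(drops) > 0:
    b[drops[0] + 1:] = 0                 # zero everything after the first decrease
c = b[::-1]

print(c)   # [0 0 0 0 0 0 5 0 0], wanted [0 0 0 4 5 4 5 0 0]
```

The first decrease in the reversed array is the step from 5 to 4 inside the last group, so the group is cut short instead of being kept whole.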

Are the non-zero values also guaranteed to be greater than 0?
– mtrw, Commented Mar 10, 2017 at 13:46

Divakar · Accepted Answer · 2017-03-10 13:57:50Z

Approach #1

Since we are after elegance, let's feed ourselves a one-liner -

a[:(a[1:] > a[:-1]).cumsum().argmax()] = 0

Sample run -

In [605]: a
Out[605]: array([ 0.,  0.,  2.,  2.,  0.,  0.,  3.,  3.,  3.,  0.])

In [606]: a[:(a[1:] > a[:-1]).cumsum().argmax()] = 0

In [607]: a
Out[607]: array([ 0.,  0.,  0.,  0.,  0.,  0.,  3.,  3.,  3.,  0.])
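To see why the one-liner works, it may help to split it into its intermediate steps. This is only a sketch of the same computation; the names rises, rise_count, cut and result are introduced here for illustration, and it works on a copy, whereas the one-liner modifies a in place.

```python
import numpy as np

a = np.array([0., 0., 2., 2., 0., 0., 3., 3., 3., 0.])

# True wherever an element is larger than its left neighbour (a "rise").
rises = a[1:] > a[:-1]

# Running count of rises; it first reaches its maximum at the last rise.
rise_count = rises.cumsum()

# argmax returns the FIRST position of the maximum, i.e. the position of
# the last rise, which is the index of the element just before the last group.
cut = rise_count.argmax()

result = a.copy()    # copy so that `a` stays untouched
result[:cut] = 0     # zero everything before the last group

print(cut)           # 5
print(result)        # [0. 0. 0. 0. 0. 0. 3. 3. 3. 0.]
```

If the array contains no rise at all, for example when the only group starts at index 0, rise_count is all zeros, cut is 0 and nothing is overwritten, so no special case is needed for the input that broke the question's first attempt.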

Approach #2

The approach above assumes that the non-zero values are positive and constant within each group. If that is not the case, e.g. with negative values or groups whose values vary, one more line gives a generic solution -

mask = a != 0
a[:(mask[1:] > mask[:-1]).cumsum().argmax()] = 0

Sample run -

In [667]: a
Out[667]: array([-1,  0,  0, -4, -5,  4, -5,  0,  0])

In [668]: mask = a != 0

In [669]: a[:(mask[1:] > mask[:-1]).cumsum().argmax()] = 0

In [670]: a
Out[670]: array([ 0,  0,  0, -4, -5,  4, -5,  0,  0])
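For reuse, the two lines could be wrapped in a small function. The following is a sketch under a few assumptions: the function name keep_last_nonzero_group is hypothetical, it returns a copy instead of modifying the input, and the size guard is an addition here, since argmax raises an error on an empty array and the shifted comparison is empty for inputs with fewer than two elements.

```python
import numpy as np

def keep_last_nonzero_group(a):
    """Return a copy of 1-D array `a` with all but the last non-zero group zeroed.

    Hypothetical helper wrapping Approach #2; not part of NumPy.
    """
    out = np.array(a)        # np.array copies, so the caller's data is not modified
    if out.size < 2:
        return out           # added guard: nothing to compare, at most one group
    mask = out != 0
    # True where the mask switches from False to True, i.e. just before a group start.
    starts = mask[1:] > mask[:-1]
    out[:starts.cumsum().argmax()] = 0
    return out

print(keep_last_nonzero_group([1, 0, 0, 4, 5, 4, 5, 0, 0]))   # [0 0 0 4 5 4 5 0 0]
print(keep_last_nonzero_group([2, 2, 2, 2, 0, 0]))            # [2 2 2 2 0 0]
```

Comparing the boolean mask rather than the values is what makes the sign and the variation of the values inside a group irrelevant: only the transitions from zero to non-zero are counted.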

edited Mar 10, 2017 at 13:57

answered Mar 10, 2017 at 13:37

Divakar


2 Comments

feedMe Over a year ago

Thanks that's great. I think it's too late to modify my question but I should have asked for arrays where the grouped numbers are not constant, like a = np.array([1, 0, 0, 4, 5, 4, 5, 0, 0]). Can your method be adapted for this case, ignoring the fact that the differences between neighbouring digits within a group can also be positive?

Divakar Over a year ago

@feedMe Take a look at the second approach, which I have just added.
