8

I have a numpy array:

a = [3., 0., 4., 2., 0., 0., 0.]

I would like a new array, created from this one, in which each non-zero element n is replaced by n zeros and each run of consecutive zeros is replaced by a single number equal to the length of that run, i.e.:

b = [0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 3.]

Looking for a vectorized way to do this as the array will have > 1 million elements. Any help much appreciated.
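
To make the target concrete, here is a minimal loop-based sketch of the transformation. The name expand_naive is only for illustration, and it assumes the entries are non-negative whole numbers. It is far too slow for arrays of this size, but it pins down the expected output:

```python
def expand_naive(a):
    """Reference loop: n zeros for each non-zero n, one run length per run of zeros."""
    out = []
    run = 0  # length of the current run of zeros
    for x in a:
        if x == 0:
            run += 1
            continue
        if run:
            # A run of zeros just ended: emit its length
            out.append(float(run))
            run = 0
        # Emit x zeros for the non-zero value x
        out.extend([0.0] * int(x))
    if run:
        # The input ended inside a run of zeros
        out.append(float(run))
    return out


a = [3., 0., 4., 2., 0., 0., 0.]
b = expand_naive(a)
print(b)
```

For the example above this gives the 11-element list shown for b.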

1
  • I am going to be surprised if this can be vectorized but good luck :) Commented Oct 17, 2013 at 0:40

2 Answers

8

This should do the trick. Roughly, it works by 1) finding each run of consecutive zeros and counting its length, 2) computing the size of the output array and initializing it with zeros, and 3) placing the counts from step 1 at the correct positions.

import numpy as np

def cz(a):
    a = np.asarray(a, int)

    # Find where sequences of zeros start and end
    wz = np.zeros(len(a) + 2, dtype=bool)
    wz[1:-1] = a == 0
    change = wz[1:] != wz[:-1]
    edges = np.where(change)[0]
    # Take the difference to get the number of zeros in each sequence
    consecutive_zeros = edges[1::2] - edges[::2]

    # Figure out where to put consecutive_zeros
    idx = a.cumsum()
    n = idx[-1] if len(idx) > 0 else 0
    idx = idx[edges[::2]]
    idx += np.arange(len(idx))

    # Create output array and populate with values for consecutive_zeros
    out = np.zeros(len(consecutive_zeros) + n)
    out[idx] = consecutive_zeros
    return out
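
Tracing the intermediate arrays on the example input may make the steps easier to follow. This is only the edge-detection and index part of the function above run step by step; the names starts, ends, run_lengths and positions are introduced here for readability and are not part of the original code:

```python
import numpy as np

a = np.asarray([3., 0., 4., 2., 0., 0., 0.], int)

# Pad the zero mask with False on both sides so every run has a start and an end
wz = np.zeros(len(a) + 2, dtype=bool)
wz[1:-1] = a == 0

# Edges alternate: start of a run, end of that run, start of the next run, ...
edges = np.where(wz[1:] != wz[:-1])[0]
starts, ends = edges[::2], edges[1::2]
run_lengths = ends - starts

# Output position of each run length: the number of zeros emitted before it
# (cumulative sum at the run start) plus the number of run lengths already placed
positions = a.cumsum()[starts] + np.arange(len(starts))

print(edges)        # [1 2 4 7]
print(run_lengths)  # [1 3]
print(positions)    # [ 3 10]
```

So the run lengths 1 and 3 land at indices 3 and 10 of an output of length 9 + 2 = 11.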

1 Comment

Excellent, works very well. Orders of magnitude faster than the loops I was attempting to use.
4

For some variety:

import numpy as np

a = np.array([3., 0., 4., 2., 0., 0., 0.], dtype=int)

inds = np.cumsum(a)

#Find first occurrences and values thereof.
uvals,zero_pos = np.unique(inds,return_index=True)
zero_pos = np.hstack((zero_pos,a.shape[0]))+1

#Gets zero lengths
values =  np.diff(zero_pos)-1
mask = (values != 0)

#Ignore where we have consecutive values
zero_inds = uvals[mask]
zero_inds += np.arange(zero_inds.shape[0])

#Create output array and apply zero values
out = np.zeros(inds[-1] + zero_inds.shape[0])
out[zero_inds] = values[mask]

out
[ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  3.]

The main difference is that np.unique (with return_index=True) can find the first occurrence of each value in the cumulative sum, which works because the cumulative sum never decreases as long as the entries of a are non-negative.

3 Comments

Nice answer. I think it's a little off if a has a leading 0, but that should be easy to fix.
Both are nice, @BiRico's is still a bit faster on my machine.
@BiRico Good point, it is lacking that aspect. Interestingly, the values in a need to be large (a > 300) before this method becomes faster.
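
One possible way to cover the leading-zero case mentioned in these comments is to prepend a sentinel 0 to the cumulative sum, so that leading zeros are counted like any other run. The sketch below is not from either answer: the name cz_unique is hypothetical, it swaps return_index for np.unique's return_counts option as a shortcut, and it assumes non-negative whole-number entries.

```python
import numpy as np


def cz_unique(a):
    """Variant of the np.unique approach that also handles leading zeros."""
    a = np.asarray(a, dtype=int)

    # Sentinel 0 in front: leading zeros then share the value 0 with it
    inds = np.concatenate(([0], np.cumsum(a)))

    # Each distinct cumulative sum appears once, plus once per zero that follows it
    uvals, counts = np.unique(inds, return_counts=True)
    run_lengths = counts - 1
    mask = run_lengths != 0

    # Shift each position by the number of run lengths written before it
    pos = uvals[mask] + np.arange(mask.sum())

    out = np.zeros(inds[-1] + len(pos))
    out[pos] = run_lengths[mask]
    return out


print(cz_unique([3., 0., 4., 2., 0., 0., 0.]))
print(cz_unique([0, 0, 2]))
```

The first call should reproduce the 11-element result from the question, and the second should start with the run length 2 followed by two zeros.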
