1

Given an array consisting of non-negative numbers, I want to be able to find the index of the middle 0 for each of the consecutive 0's in the array, excluding 0's that are found at the start or the end of the array. A simple example: we have given array

np.array([0, 0, 0, 1, 2, 0, 0, 0, 3, 2, 0, 0, 0, 0, 0, 0, 5, 6, 0, 0, 0])

Then, I want to return indices 6 and 13 (for an even amount of consecutive 0's, I want either the floor or ceil).

Now, my current implementation is as follows:

def middle_zero(array):
    busy = False
    not_start_zero = False

    middle_zero_list = []

    for i in range(len(array)):
        if array[i] > 0:
            not_start_zero = True

        if array[i] == 0 and not_start_zero and not busy:
            start = i
            busy = True
        if busy and array[i] > 0:
            end = i
            middle_zero_list.append(int(np.mean([start, end])))
            busy = False

    return np.array(middle_zero_list)
middle_zero(np.array([0, 0, 0, 1, 2, 0, 0, 0, 3, 2, 0, 0, 0, 0, 0, 0, 5, 6, 0, 0, 0]))
>> array([6, 13])

Although it is working, I think that a much more clever solution is possible here. As I have to do this computation many times, I want it to be more efficient.

4
  • 2
    Seems like a great usecase for numba.njit. Have you tried that? Commented Jul 12, 2022 at 16:12
  • @MichaelSzczesny I forgot to mention that I only want to consider consecutive zeros that are not at the start and not at the end of the array. When those are left out of consideration, is there still a bug? Commented Jul 12, 2022 at 16:27
  • 1
    does it deal with negative numbers as required? eg middle_zero(np.array([0, 1, 2, 0, 0, 0, -3, -2, 0, 0, 0, 0, 0, 0, 5, 6, 0, 0])) gives array([8]) so it's not the middle of zeros, it's the middle of non positives Commented Jul 12, 2022 at 16:35
  • Good point @Nin17. I did not mention that I only consider arrays consisting of non-negative numbers. Commented Jul 12, 2022 at 16:46

3 Answers 3

3

Note this doesn't include solitary non-positives, ie middle_zero(np.array([1,0,2])) gives array([], dtype=float64)

If you want a numpy only solution, you could do this (which shows significant speed improvements if the input array is large though it's slower than @DominikStańczak answer using numba (actually it depends on input - see below) - you can also use numba with this though it doesn't speed it up much):

import numpy as np
def middle_zero(array):
    argwhere = np.argwhere(array > 0).ravel()

    diff = np.ediff1d(argwhere)
    result = argwhere[:-1]+diff//2 # Or argwhere[:-1]+np.ceil(diff/2) for consistency with OP

    return result[result != argwhere[:-1]] #Or result[result-1 != argwhere[:-1]] for consistency with OP

Output with given example:

array([ 6, 12])

Edit timings: with input as np.array([0, 0, 0, 1, 2, 0, 0, 0, 3, 2, 0, 0, 0, 0, 0, 0, 5, 6, 0, 0, 0])

25.1 µs ± 227 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) #OP
11.8 µs ± 30.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) #Nin17
1.18 µs ± 3.97 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) #Dominik

With input as np.random.default_rng().integers(0, 2, size=100000)

225 ms ± 1.99 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) #OP
833 µs ± 3.76 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) #Nin17
481 µs ± 2.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) #Dominik

If you have long regions of consecutive non-positive numbers, this is actually the fastest, with an input of np.array([1,*[0]*1000,1]*1000):

539 ms ± 1.91 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) #OP
432 µs ± 1.35 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) #Nin17
1.93 ms ± 2.24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) #Dominik
Sign up to request clarification or add additional context in comments.

2 Comments

Awesome, nice to see a numpy solution, and that it scales better than numba, as expected! Can I ask why did you go with ediff1d vs diff here?
I used ediff1d instead of diff because diff isn't supported by numba so you can also njit this to make it slightly faster. I think it works the same with diff without numba.
2

Simply wrapping your function in numba.njit (replacing the np.mean with a direct computation) gets this a nice speedup:

[nav] In [8]: @numba.njit
         ...: def middle_zero2(array):
         ...:     busy = False
         ...:     not_start_zero = False
         ...:
         ...:     middle_zero_list = []
         ...:
         ...:     for i in range(len(array)):
         ...:         if array[i] > 0:
         ...:             not_start_zero = True
         ...:
         ...:         if array[i] == 0 and not_start_zero and not busy:
         ...:             start = i
         ...:             busy = True
         ...:         if busy and array[i] > 0:
         ...:             end = i
         ...:             middle_zero_list.append(int((start+end)/2))
         ...:             busy = False
         ...:
         ...:     return np.array(middle_zero_list)

[ins] In [10]: %timeit middle_zero(a)
24.3 µs ± 466 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

# skipped a cache run

[ins] In [12]: %timeit middle_zero2(a)
763 ns ± 20.2 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

about a factor of 30 of a speedup on your small example test case.

1 Comment

Thanks for your help Dominik! Although this already results in a speedup, I think that there might be a more clever solution possible (e.g. leaving out the for loop). Do you also have any suggestions on that?
0

I don't know if this is faster, but should be metered. It should be faster on a compiled language, but python is not.

def middle_zero(array):
    busy = False
    not_start_zero = False
    start=None
    end=None

    middle_zero_list = []

    for i, value in enumerate(array):
        isZero = value == 0
        not_start_zero = [True, not_start_zero][isZero]

        start, busy = [[start, busy], [i, True]][isZero and not_start_zero and not busy]
        if busy and not isZero:
            end = i
            middle_zero_list.append(int(np.mean([start, end])))
            busy = False

    return np.array(middle_zero_list)

1 Comment

This is slower than OPs

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.