3

I have a sound signal, imported as a numpy array and I want to cut it into chunks of numpy arrays. However, I want the chunks to contain only elements above a threshold. For example:

threshold = 3
signal = [1,2,6,7,8,1,1,2,5,6,7]

should output two arrays

vec1 = [6,7,8]
vec2 = [5,6,7]

Ok, the above are lists, but you get my point.

Here is what I tried so far, but it just eats all my RAM:

def slice_raw_audio(audio_signal, threshold=5000):

    signal_slice, chunks = [], []

    for idx in range(0, audio_signal.shape[0], 1000):
        while audio_signal[idx] > threshold:
            signal_slice.append(audio_signal[idx])
        chunks.append(signal_slice)
    return chunks
3
  • how do you define the size of the chunks? Commented Apr 6, 2017 at 15:06
  • from the first element larger than the threshold, to the last. Next chunk the same... Commented Apr 6, 2017 at 15:07
  • You can yield each slice instead of returning everything so when iterating not everything is in the memory, if that's your only problem. Also, you should convert the signal array to a regular list for iterating, numpy will only slow you down. Commented Apr 6, 2017 at 15:09
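
For reference, the loop in the question never terminates once it reaches a sample above the threshold: `idx` does not change inside the `while`, so `signal_slice` keeps growing until memory runs out. Below is a minimal sketch of a loop-based version along the lines of the `yield` suggestion, assuming a chunk is a run of consecutive samples above the threshold. The name `iter_chunks_above` is made up for illustration.

```python
import numpy as np

def iter_chunks_above(audio_signal, threshold=5000):
    """Yield each run of consecutive samples above `threshold` as an array slice."""
    start = None  # index where the current run began, or None when not inside a run
    for idx, sample in enumerate(audio_signal):
        if sample > threshold:
            if start is None:
                start = idx  # a new run begins here
        elif start is not None:
            yield audio_signal[start:idx]  # the run ended just before idx
            start = None
    if start is not None:  # the signal ended while still above the threshold
        yield audio_signal[start:]

signal = np.array([1, 2, 6, 7, 8, 1, 1, 2, 5, 6, 7])
chunks = [chunk.tolist() for chunk in iter_chunks_above(signal, threshold=3)]
print(chunks)  # [[6, 7, 8], [5, 6, 7]]
```

Each yielded chunk is a slice of the original array, so only the chunks the caller keeps stay in memory. A pure-Python loop like this will still be slow on long recordings.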

3 Answers

2

Here's one approach -

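import numpy as np

def split_above_threshold(signal, threshold):
    # Pad with False so runs touching either end still get a start and a stop.
    mask = np.concatenate(([False], signal > threshold, [False] ))
    # Indices where the mask flips: alternating run starts and stops.
    idx = np.flatnonzero(mask[1:] != mask[:-1])
    return [signal[idx[i]:idx[i+1]] for i in range(0,len(idx),2)]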

Sample run -

In [48]: threshold = 3
    ...: signal = np.array([1,1,7,1,2,6,7,8,1,1,2,5,6,7,2,8,7,2])
    ...: 

In [49]: split_above_threshold(signal, threshold)
Out[49]: [array([7]), array([6, 7, 8]), array([5, 6, 7]), array([8, 7])]
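
To see why the padding and the flip detection work, here is a sketch that prints the intermediate values for the example from the question. The names `starts` and `stops` are only for illustration.

```python
import numpy as np

threshold = 3
signal = np.array([1, 2, 6, 7, 8, 1, 1, 2, 5, 6, 7])

# Pad with False on both sides so every run has a rising and a falling edge,
# even if the signal starts or ends above the threshold.
mask = np.concatenate(([False], signal > threshold, [False]))

# Positions where the mask flips; they come in (start, stop) pairs.
idx = np.flatnonzero(mask[1:] != mask[:-1])
print(idx.tolist())  # [2, 5, 8, 11]

starts, stops = idx[::2], idx[1::2]
chunks = [signal[a:b] for a, b in zip(starts, stops)]
print([c.tolist() for c in chunks])  # [[6, 7, 8], [5, 6, 7]]
```

Because of the padding, `idx` always has an even number of entries, so pairing them up is safe.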

Runtime test

Other approaches -

# @Psidom's soln
def arange_diff(signal, threshold):
    above_th = signal > threshold
    index, values = np.arange(signal.size)[above_th], signal[above_th]
    return np.split(values, np.where(np.diff(index) > 1)[0]+1)

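# @Kasramvd's soln
def split_diff_step(signal, threshold):
    # Start at 0 if the first item is above the threshold, otherwise at 1.
    pieces = np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)
    return pieces[int(signal[0] <= threshold)::2]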

Timings -

In [67]: signal = np.random.randint(0,9,(100000))

In [68]: threshold = 3

# @Kasramvd's soln 
In [69]: %timeit split_diff_step(signal, threshold)
10 loops, best of 3: 39.8 ms per loop

# @Psidom's soln
In [70]: %timeit arange_diff(signal, threshold)
10 loops, best of 3: 20.5 ms per loop

In [71]: %timeit split_above_threshold(signal, threshold)
100 loops, best of 3: 8.22 ms per loop
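
Timings only mean something if the functions return the same chunks, so here is a sketch of a quick agreement check on random data of the same shape. The helper `same_chunks` and the fixed seed are assumptions for illustration. Here `split_diff_step` picks its start offset from `signal[0]`, so it also works when the random signal happens to begin above the threshold.

```python
import numpy as np

def split_above_threshold(signal, threshold):
    mask = np.concatenate(([False], signal > threshold, [False]))
    idx = np.flatnonzero(mask[1:] != mask[:-1])
    return [signal[idx[i]:idx[i + 1]] for i in range(0, len(idx), 2)]

def arange_diff(signal, threshold):
    above_th = signal > threshold
    index, values = np.arange(signal.size)[above_th], signal[above_th]
    return np.split(values, np.where(np.diff(index) > 1)[0] + 1)

def split_diff_step(signal, threshold):
    pieces = np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)
    return pieces[int(signal[0] <= threshold)::2]

def same_chunks(a, b):
    """True if both lists hold the same arrays in the same order."""
    return len(a) == len(b) and all(np.array_equal(x, y) for x, y in zip(a, b))

rng = np.random.default_rng(0)  # fixed seed so the check is repeatable
signal = rng.integers(0, 9, 100_000)
threshold = 3

reference = split_above_threshold(signal, threshold)
agree = (same_chunks(reference, arange_diff(signal, threshold))
         and same_chunks(reference, split_diff_step(signal, threshold)))
print(agree)  # True
```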

2 Comments

Something is wrong: if I set the threshold to 2000, the first element of the list is array([2008], dtype=int16). If I look at the sound sheet, there is clearly more than one element at the beginning above 2000.
@Qubix Maybe those later ones are not in sequence with that 2008 in the input array signal?
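
A small sketch of what that reply describes, using made-up int16 samples (not the commenter's data). Audio samples swing up and down, so values above a positive threshold are often separated by lower ones, and each separated sample ends up as its own chunk.

```python
import numpy as np

def split_above_threshold(signal, threshold):
    mask = np.concatenate(([False], signal > threshold, [False]))
    idx = np.flatnonzero(mask[1:] != mask[:-1])
    return [signal[idx[i]:idx[i + 1]] for i in range(0, len(idx), 2)]

# Hypothetical samples: 2008 is followed by a value below 2000, so it stands alone.
samples = np.array([2008, 1500, 2100, 2300, -2500, 2050], dtype=np.int16)
chunks = split_above_threshold(samples, 2000)
print([c.tolist() for c in chunks])  # [[2008], [2100, 2300], [2050]]
```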
2

Here is a Numpythonic approach:

In [115]: np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)
Out[115]: [array([1, 2]), array([6, 7, 8]), array([1, 1, 2]), array([5, 6, 7])]

Note that this returns both the below-threshold and the above-threshold runs. The split points are the positions where the comparison with the threshold flips (found with diff), so the two kinds of runs always alternate, which means that you can simply separate them by indexing:

In [121]: signal = np.array([1,2,6,7,8,1,1,2,5,6,7])

In [122]: np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)[::2]
Out[122]: [array([1, 2]), array([1, 1, 2])]

In [123]: np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)[1::2]
Out[123]: [array([6, 7, 8]), array([5, 6, 7])]

To find out which of the two slices holds the upper items, compare the first item of your array with the threshold.

In general, you can use the following snippet to get the upper items. The start offset is 0 when the first item is above the threshold and 1 otherwise:

np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)[int(signal[0] <= threshold)::2]
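
The one-liner can be wrapped in a small function. This is a sketch of the same idea, where `runs_above` is a made-up name. It picks the start offset from whether the first item is above the threshold, so a first item equal to the threshold counts as below.

```python
import numpy as np

def runs_above(signal, threshold):
    """Return the runs of consecutive items that are strictly above `threshold`."""
    above = signal > threshold
    # np.diff on a boolean array marks the positions where the value flips.
    pieces = np.split(signal, np.flatnonzero(np.diff(above)) + 1)
    first = 0 if above[0] else 1  # which of the alternating pieces comes first
    return pieces[first::2]

starts_below = np.array([1, 2, 6, 7, 8, 1, 1, 2, 5, 6, 7])
starts_above = np.array([7, 1, 2, 6])
starts_equal = np.array([3, 6, 7, 3])

print([r.tolist() for r in runs_above(starts_below, 3)])  # [[6, 7, 8], [5, 6, 7]]
print([r.tolist() for r in runs_above(starts_above, 3)])  # [[7], [6]]
print([r.tolist() for r in runs_above(starts_equal, 3)])  # [[6, 7]]
```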

4 Comments

was arriving at the same but you are fast!
@ColonelBeauvel My luck;). I should update the answer though.
Guess OP wants only the elements that are above threshold?
@Divakar Yes, that's why I mentioned I'm going to update the answer.
1

Here is one option:

above_th = signal > threshold
index, values = np.arange(signal.size)[above_th], signal[above_th]
np.split(values, np.where(np.diff(index) > 1)[0]+1)
# [array([6, 7, 8]), array([5, 6, 7])]

Wrap in a function:

def above_thresholds(signal, threshold):
    above_th = signal > threshold
    index, values = np.arange(signal.size)[above_th], signal[above_th]
    return np.split(values, np.where(np.diff(index) > 1)[0]+1)

above_thresholds(signal, threshold)
# [array([6, 7, 8]), array([5, 6, 7])]
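
To show where the split points come from, here is a sketch that prints the intermediate values for the example from the question. The name `cuts` is only for illustration.

```python
import numpy as np

threshold = 3
signal = np.array([1, 2, 6, 7, 8, 1, 1, 2, 5, 6, 7])

above_th = signal > threshold
index, values = np.arange(signal.size)[above_th], signal[above_th]
print(index.tolist())   # [2, 3, 4, 8, 9, 10]  positions of the kept items
print(values.tolist())  # [6, 7, 8, 5, 6, 7]

# A jump larger than 1 between neighbouring positions means a gap in the original signal.
print(np.diff(index).tolist())  # [1, 1, 4, 1, 1]
cuts = np.where(np.diff(index) > 1)[0] + 1
print(cuts.tolist())  # [3]

chunks = np.split(values, cuts)
print([c.tolist() for c in chunks])  # [[6, 7, 8], [5, 6, 7]]
```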
