
I am searching for a Pythonic way to extract multiple subarrays from a given array using a mask, as shown in the example:

import numpy as np

a = np.array([10, 5, 3, 2, 1])
m = np.array([True, True, False, True, True])

The output should be a collection of arrays like the following, where each contiguous "region" of True values in the mask m (True values next to each other) gives the indices of one subarray:

L[0] = np.array([10, 5])
L[1] = np.array([2, 1])

3 Answers

Answer 1

Here's one approach -

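import numpy as np

def separate_regions(a, m):
    # Pad the mask with False on both sides so every run of True has a start and an end edge
    m0 = np.concatenate(( [False], m, [False] ))
    # Indices where the padded mask flips; they alternate start, stop, start, stop, ...
    idx = np.flatnonzero(m0[1:] != m0[:-1])
    return [a[idx[i]:idx[i+1]] for i in range(0,len(idx),2)]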

Sample run -

In [41]: a = np.array([10, 5, 3, 2, 1])
    ...: m = np.array([True, True, False, True, True])
    ...: 

In [42]: separate_regions(a, m)
Out[42]: [array([10,  5]), array([2, 1])]
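To see why the list comprehension steps through idx two at a time, here is a small illustration (not part of the original answer) that prints the intermediate arrays of separate_regions for the sample input; the name pairs is introduced here only for readability:

```python
import numpy as np

a = np.array([10, 5, 3, 2, 1])
m = np.array([True, True, False, True, True])

# Pad with False so a run touching either end still produces a rising and a falling edge
m0 = np.concatenate(([False], m, [False]))
print(m0.astype(int))  # [0 1 1 0 1 1 0]

# Positions where consecutive padded values differ: starts and stops alternate
idx = np.flatnonzero(m0[1:] != m0[:-1])
print(idx)  # [0 2 3 5]

# Pair them up as (start, stop) slice bounds into the original array
pairs = list(zip(idx[::2], idx[1::2]))
print([a[s:e] for s, e in pairs])
```

Because of the padding, the offsets in idx line up directly with positions in a, so no index correction is needed.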

Runtime test

Other approach(es) -

# @kazemakase's soln
def zip_split(a, m):
    d = np.diff(m)
    cuts = np.flatnonzero(d) + 1

    asplit = np.split(a, cuts)
    msplit = np.split(m, cuts)

    L = [aseg for aseg, mseg in zip(asplit, msplit) if np.all(mseg)]
    return L

Timings -

In [49]: a = np.random.randint(0,9,(100000))

In [50]: m = np.random.rand(100000)>0.2

# @kazemakase's solution
In [51]: %timeit zip_split(a,m)
10 loops, best of 3: 114 ms per loop

# @Daniel Forsman's solution
In [52]: %timeit splitByBool(a,m)
10 loops, best of 3: 25.1 ms per loop

# Proposed in this post
In [53]: %timeit separate_regions(a, m)
100 loops, best of 3: 5.01 ms per loop

Increasing the average length of islands -

In [58]: a = np.random.randint(0,9,(100000))

In [59]: m = np.random.rand(100000)>0.1

In [60]: %timeit zip_split(a,m)
10 loops, best of 3: 64.3 ms per loop

In [61]: %timeit splitByBool(a,m)
100 loops, best of 3: 14 ms per loop

In [62]: %timeit separate_regions(a, m)
100 loops, best of 3: 2.85 ms per loop
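Timings only mean something if the functions return the same result. A sketch of a randomized cross-check, assuming condensed copies of the three functions above; the helper same_segments is hypothetical, and empty masks are skipped because splitByBool reads m[0]:

```python
import numpy as np

def separate_regions(a, m):
    m0 = np.concatenate(([False], m, [False]))
    idx = np.flatnonzero(m0[1:] != m0[:-1])
    return [a[idx[i]:idx[i+1]] for i in range(0, len(idx), 2)]

def splitByBool(a, m):
    cuts = np.nonzero(np.diff(m))[0] + 1
    return np.split(a, cuts)[::2] if m[0] else np.split(a, cuts)[1::2]

def zip_split(a, m):
    cuts = np.flatnonzero(np.diff(m)) + 1
    return [aseg for aseg, mseg in zip(np.split(a, cuts), np.split(m, cuts))
            if np.all(mseg)]

def same_segments(x, y):
    # Hypothetical helper: same number of segments, element-wise equal
    return len(x) == len(y) and all(np.array_equal(p, q) for p, q in zip(x, y))

rng = np.random.default_rng(0)
for _ in range(200):
    n = int(rng.integers(1, 50))        # non-empty, since splitByBool reads m[0]
    a = rng.integers(0, 9, n)
    m = rng.random(n) > rng.random()    # vary the density of True values
    ref = separate_regions(a, m)
    assert same_segments(ref, splitByBool(a, m))
    assert same_segments(ref, zip_split(a, m))
print("all three agree")
```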

Comments

I accept this answer because it compares the other methods discussed and also provides a faster one. Thank you!
Fun fact: I just found out that np.r_[False, m, False] is 5-10x slower than np.concatenate(([False], m, [False])).
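A sketch for checking that both padding expressions build the same array and for reproducing the speed comparison with the standard timeit module. The input size and call count are assumptions, and the measured ratio will depend on the machine and NumPy version, so no numbers are claimed here:

```python
import timeit
import numpy as np

# Assumed input: same size and density as the first timing run above
m = np.random.rand(100000) > 0.2

padded_r = np.r_[False, m, False]
padded_c = np.concatenate(([False], m, [False]))
# Both build the same padded mask of length len(m) + 2
assert padded_c.shape == (m.size + 2,)
assert np.array_equal(padded_r, padded_c)

n_calls = 1000
candidates = {
    "np.r_": lambda: np.r_[False, m, False],
    "np.concatenate": lambda: np.concatenate(([False], m, [False])),
}
for label, fn in candidates.items():
    seconds = timeit.timeit(fn, number=n_calls)
    print(f"{label}: {seconds / n_calls * 1e6:.1f} microseconds per call")
```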
Answer 2
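import numpy as np

def splitByBool(a, m):
    # Cutting at every change in m yields segments that alternate between True and False runs
    cuts = np.nonzero(np.diff(m))[0] + 1
    if m[0]:
        return np.split(a, cuts)[::2]   # mask starts True: keep the even segments
    else:
        return np.split(a, cuts)[1::2]  # mask starts False: keep the odd segments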

This returns a list of arrays, one for each run of True values in m.

Comments

Nice solution. Makes use of the fact that True and False segments are necessarily alternating.
I like this solution because it can be turned into a one-liner: np.split(a, np.nonzero(np.diff(m))[0] + 1)[1 - m[0]::2]
Or even np.split(a, np.flatnonzero(np.diff(m)) + 1)[1 - m[0]::2], which is a bit more readable
Interesting: each one-liner is slower than the last :) But easier to read, as you said.
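Because cutting at every change point leaves segments that alternate between all-True and all-False, choosing the start offset from m[0] selects exactly the True runs. A sketch of the one-liner from the comments wrapped in a function; the name split_true_runs, the conditional start offset (equivalent to 1 - m[0]) and the empty-mask guard are additions, not part of the original answer:

```python
import numpy as np

def split_true_runs(a, m):
    """Hypothetical wrapper around the one-liner from the comments."""
    if len(m) == 0:
        return []  # m[0] would raise IndexError on an empty mask
    cuts = np.flatnonzero(np.diff(m)) + 1
    # Segments alternate True/False: start at 0 if the mask begins True, else at 1
    start = 0 if m[0] else 1
    return np.split(a, cuts)[start::2]

a = np.array([10, 5, 3, 2, 1])
m = np.array([True, True, False, True, True])
print(np.split(m, np.flatnonzero(np.diff(m)) + 1))  # the alternating runs of the mask
print(split_true_runs(a, m))
```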
Answer 3

Sounds like a natural application for np.split.

First, figure out where to cut the array: wherever the mask changes between True and False. Then discard every segment where the mask is False.

import numpy as np

a = np.array([10, 5, 3, 2, 1])
m = np.array([True, True, False, True, True])

d = np.diff(m)
cuts = np.flatnonzero(d) + 1

asplit = np.split(a, cuts)
msplit = np.split(m, cuts)

L = [aseg for aseg, mseg in zip(asplit, msplit) if np.all(mseg)]

print(L[0])  # [10  5]
print(L[1])  # [2 1]
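For the sample input, the intermediate lists look like this; printing them is an illustration added here, not part of the original answer:

```python
import numpy as np

a = np.array([10, 5, 3, 2, 1])
m = np.array([True, True, False, True, True])

# np.diff on a boolean array marks positions where neighbouring values differ
cuts = np.flatnonzero(np.diff(m)) + 1
print(cuts)  # [2 3]: new segments start at index 2 and at index 3

asplit = np.split(a, cuts)  # segments [10 5], [3], [2 1]
msplit = np.split(m, cuts)  # segments [T T], [F], [T T]
for aseg, mseg in zip(asplit, msplit):
    # Each mask segment is constant, so np.all(mseg) equals its first value
    print(aseg, "kept" if np.all(mseg) else "dropped")
```

Since every mask segment is constant by construction, checking mseg[0] would give the same answer as np.all(mseg).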
