
I have a 2D array in which each element is a Fourier transform. I'd like to split each transform 'logarithmically'. For example, let's take a single one of those arrays and call it a:

import numpy as np

a = np.arange(0, 512)

# I want to split a into 'bins' defined by b, below:
b = np.array([0] + [10 * 2**i for i in range(6)]) # [0, 10, 20, 40, 80, 160, 320]

What I'm looking for is something like np.split, except that the values are split into 'bins' based on array b: all values of a in [0, 10) go in one bin, all values in [10, 20) in another, etc.

I could do this in some sort of convoluted for loop:

split_arr = []
for i in range(1, len(b)):
    fbin = []
    for amp in a:
        if (amp >= b[i-1]) and (amp < b[i]):
            fbin.append(amp)
    split_arr.append(fbin)

I have many arrays to split, and this is also ugly (just my opinion). Is there a better way?

2 Answers

5

Here is how you can do it, using np.split:

np.split(a, np.searchsorted(a,b))

If your array a is not sorted, sort it before running the command above:

a = np.sort(a)

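np.searchsorted returns the indices at which the values of b would have to be inserted into the sorted array a to keep it sorted. In other words, it finds the positions where you want to split your array. np.split returns everything before the first split point and everything after the last one as the first and last pieces, so if you do not want the empty array at the beginning, simply remove 0 from b.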
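To make the split points concrete, here is a small sketch using the a from the question and a copy of b with the leading 0 dropped. The names cut_points and pieces are only illustrative, and the sketch assumes a is already sorted.

```python
import numpy as np

a = np.arange(0, 512)                      # already sorted
b = np.array([10, 20, 40, 80, 160, 320])   # bin edges, leading 0 dropped

# Default side='left': for each edge, the index of the first element >= edge.
cut_points = np.searchsorted(a, b)         # array([ 10,  20,  40,  80, 160, 320])

# np.split cuts a at those indices, giving len(b) + 1 pieces.
pieces = np.split(a, cut_points)

# Each piece covers a half-open range: [0, 10), [10, 20), ..., [160, 320),
# followed by one final piece holding everything >= 320.
sizes = [len(p) for p in pieces]           # [10, 10, 20, 40, 80, 160, 192]
```

The last piece holds every value at or above the final edge (320 here), so use pieces[:-1] if you only want the bins that b closes.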


4 Comments

Just timed this. Blazing fast and so concise. I'm glad I waited a few minutes to check out answers. I'm looking up np.searchsorted in the docs now, as I'd like to understand it more. Thank you.
@rocksNwaves You are welcome. I added another line for more explanation. Hope it helps. Feel free to accept the answer if it solves your problem. Thank you.
So this probably assumes a is sorted in the first place... which is why it's so fast. If a is not sorted then you need to factor in the cost of sorting it. Still probably the most efficient method, especially for big arrays.
@Julien Yes, Thank you for the note. Added to the post.
1

First, you can reduce the 'ugliness' by using a list comprehension:

split_arr = [[amp for amp in a if (amp >= b[i-1]) and (amp < b[i])] for i in range(1, len(b))]

Then you can apply the same logic with NumPy's fast vectorized operations (which has the bonus of looking even cleaner):

split_arr = [a[(a >= b[i-1]) & (a < b[i])] for i in range(1, len(b))]

Comparison:

%timeit [[amp for amp in a if (amp >= b[i-1]) and (amp < b[i])] for i in range(1, len(b))]
1.29 ms ± 109 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit [a[(a >= b[i-1]) & (a < b[i])] for i in range(1, len(b))]
35.9 µs ± 4.52 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
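If you have many arrays to split, the mask version can be wrapped in a small function. This is only a sketch: split_by_edges is a hypothetical name, and the sample values are made up to show that the input does not need to be sorted.

```python
import numpy as np

def split_by_edges(values, edges):
    """Hypothetical helper: one array per half-open bin [edges[i], edges[i+1])."""
    return [values[(values >= lo) & (values < hi)]
            for lo, hi in zip(edges[:-1], edges[1:])]

values = np.array([35, 3, 12, 300, 7, 19, 150])   # deliberately unsorted
edges = np.array([0, 10, 20, 40, 80, 160, 320])

bins = split_by_edges(values, edges)
# bins[0] -> array([3, 7])    values in [0, 10), original order kept
# bins[3] -> array([])        nothing falls in [40, 80)
```

Unlike the np.split approach, this needs no sorting and drops values outside the edges, at the cost of scanning the whole array once per bin.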

1 Comment

I would really love to know the reason of the downvote...
