Sampling a fixed length sequence from a numpy array

Question

I have a data array a and a list of indices stored in the array idx. I would like to extract a chunk of length 10 starting at each index in idx. Right now I use a for loop, but it is extremely slow because I have to do this data fetch about 1000 times per iteration. Below is a minimal working example.

import numpy as np
a = np.random.random(1000)
idx = np.array([1, 5, 89, 54])

# I want "data" array to have np.array([a[1:11], a[5:15], a[89:99], a[54:64]])
# I use for loop below but it is slow
data = []

for start in idx:  # "start" avoids shadowing the built-in id()
    data.append(a[start:start+10])
data = np.array(data)

Is there any way to speed up this process? Thanks.

EDIT: My question is different from the question asked here. In that question, the chunk sizes are random, whereas my chunk size is fixed. There are other differences too: I do not have to use up the entire array a, and an element can occur in more than one chunk. My question does not necessarily "split" the array.

I am assuming a is ones for the purpose of the question, is that right?
– Ivan, Commented Dec 12, 2020 at 8:44
This is trickier, because there are overlapping sections! If you do np.split(a, idx), you will split the array at indices 1, 5, leaving you with [array of size 1, array of size 4, ... which is not the desired result.
– Ivan, Commented Dec 12, 2020 at 8:58

fountainhead · Accepted Answer · 2021-01-31 08:16:07Z

7

(Thanks to a suggestion from @MadPhysicist)

This should work:

a[idx.reshape(-1, 1) + np.arange(10)]

Output: Shape (L,10), where L is the length of idx
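
As a quick sanity check, here is a minimal sketch that compares the broadcasted index against the loop from the question. The names window, index, and looped are illustrative only and are not part of the original answer.

```python
import numpy as np

a = np.random.random(1000)
idx = np.array([1, 5, 89, 54])
window = 10  # chunk length from the question

# A (4, 1) column of start positions plus a (10,) row of offsets
# broadcasts to a (4, 10) array of positions into a.
index = idx.reshape(-1, 1) + np.arange(window)
data = a[index]

# Reference result built with the loop from the question
looped = np.array([a[start:start + window] for start in idx])

print(index.shape)
print(np.array_equal(data, looped))
```

The result takes the shape of the index array, so each row of data is one chunk.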

Notes:

This does not check for out-of-bounds indices. It should be easy to ensure beforehand that every value in idx is at most len(a) - 10.
np.take(a, idx.reshape(-1, 1) + np.arange(10), mode='wrap') is an alternative that handles out-of-bounds indices by wrapping them around a. Passing mode='clip' instead of mode='wrap' would clip out-of-range indices to the last index of a. Note, however, that np.take() may have different performance and scaling characteristics.
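
The options in these notes can be sketched as follows. This is only an illustration under assumed inputs: the small array, the idx values, and the names valid, safe, wrapped, and clipped are made up for the example.

```python
import numpy as np

a = np.arange(20)          # small array so the results are easy to read
idx = np.array([3, 15])    # the chunk starting at 15 runs past the end of a
window = 10

index = idx.reshape(-1, 1) + np.arange(window)

# Option 1: keep only the start positions whose whole chunk fits inside a
valid = (idx >= 0) & (idx + window <= len(a))
safe = a[index[valid]]

# Option 2: wrap out-of-bounds indices around to the start of a
wrapped = np.take(a, index, mode='wrap')

# Option 3: clip out-of-bounds indices to the last valid index of a
clipped = np.take(a, index, mode='clip')

print(safe)
print(wrapped[1])
print(clipped[1])
```

With these inputs, plain a[index] would raise an IndexError, since the second chunk reaches index 20 and beyond.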

edited Jan 31, 2021 at 8:16

answered Dec 12, 2020 at 9:30


4 Comments

Mad Physicist Over a year ago

I'm thinking that it might be faster to sort the index, especially for short arrays. +1 either way

Mad Physicist Over a year ago

Also, you really don't need to reshape and transpose. The output array is the shape of the index. idx.reshape(-1, 1) + np.arange(10) is sufficient

fountainhead Over a year ago

@MadPhysicist -- Thanks, edited with the simplification.

Mad Physicist Over a year ago

We can see edits in the edit history. No need to mark "edit" and "update" in questions and answers. It's a common misconception among beginning authors that people want to see anything besides your polished product.
