
I have a data array a and a list of indices stored in the array idx. I would like to extract a 10-element chunk starting at each of the indices in idx. Right now I use a for loop to achieve this, but it is extremely slow because I have to do this data fetch about 1000 times per iteration. Below is a minimal working example.

import numpy as np
a = np.random.random(1000)
idx = np.array([1, 5, 89, 54])

# I want "data" array to have np.array([a[1:11], a[5:15], a[89:99], a[54:64]])
# I use for loop below but it is slow
data = []

for start in idx:
    data.append(a[start:start + 10])
data = np.array(data)

Is there any way to speed up this process? Thanks.

EDIT: My question is different from the question asked here. In that question, the chunk sizes are random, whereas in mine the chunk size is fixed. There are other differences too: I do not have to use up the entire array a, and an element can occur in more than one chunk. My question does not necessarily "split" the array.

  • I am assuming a is ones for the purpose of the question, is that right? Commented Dec 12, 2020 at 8:44
  • @Ivan haha yes. I have edited it to now have random values. Commented Dec 12, 2020 at 8:44
  • This is trickier, because there are overlapping sections! If you do np.split(a, idx) you will split the array at indices 1, 5, ..., leaving you with [array of size 1, array of size 4, ...], which is not the desired result. Commented Dec 12, 2020 at 8:58

1 Answer


(Thanks to suggestion from @MadPhysicist)

This should work:

a[idx.reshape(-1, 1) + np.arange(10)]

Output: an array of shape (L, 10), where L is the length of idx.
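To make the broadcasting step concrete, here is a minimal sketch that builds the index grid and compares the result against the loop from the question. The names `window`, `index_grid` and `expected`, and the seeded random generator, are illustrative choices and not part of the original post.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only so the example is repeatable
a = rng.random(1000)
idx = np.array([1, 5, 89, 54])
window = 10                      # chunk length from the question

# A (4, 1) column of start positions plus a (10,) row of offsets
# broadcasts to a (4, 10) array of absolute indices into a.
index_grid = idx.reshape(-1, 1) + np.arange(window)

# Integer-array indexing returns an array with the shape of the index array.
data = a[index_grid]

# Reference result built with the loop from the question, for comparison.
expected = np.array([a[i:i + window] for i in idx])
```

Each row of `index_grid` is one window of consecutive indices, so overlapping chunks and unsorted start positions need no special handling.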

Notes:

  1. This does not check for index-out-of-bound situations. I suppose it's easy to first ensure that idx doesn't contain such values.

  2. Using np.take(a, idx.reshape(-1, 1) + np.arange(10), mode='wrap') is an alternative that handles out-of-bounds indices by wrapping them around a. Passing mode='clip' instead of mode='wrap' clips out-of-range indices to the last index of a. However, np.take() would probably have quite different performance and scaling characteristics.
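The two notes above can be sketched on a small array where one window runs past the end. This is only an illustration: the array `a = np.arange(20)`, the mask `valid`, and the other variable names are assumptions made for the example, and filtering with a boolean mask is just one possible way to pre-check idx.

```python
import numpy as np

a = np.arange(20)            # small array so the edge cases are easy to read
idx = np.array([3, 15])      # the window starting at 15 runs past the end of a
window = 10

index_grid = idx.reshape(-1, 1) + np.arange(window)

# Note 1: keep only the start positions whose whole window fits inside a.
# Plain a[index_grid] would raise IndexError here because of index 20.
valid = (idx >= 0) & (idx + window <= len(a))
safe = a[index_grid[valid]]

# Note 2, mode='wrap': indices past the end wrap around to the start of a.
wrapped = np.take(a, index_grid, mode='wrap')

# Note 2, mode='clip': indices past the end are clamped to the last index.
clipped = np.take(a, index_grid, mode='clip')
```

With these inputs, `safe` keeps only the window starting at 3, the second row of `wrapped` continues with a[0], a[1], ... after a[19], and the second row of `clipped` repeats a[19].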


4 Comments

I'm thinking that it might be faster to sort the index, especially for short arrays. +1 either way
Also, you really don't need to reshape and transpose. The output array is the shape of the index. idx.reshape(-1, 1) + np.arange(10) is sufficient
@MadPhysicist -- Thanks, edited with the simplification.
We can see edits in the edit history. No need to mark "edit" and "update" in questions and answers. It's a common misconception among beginning authors that people want to see anything besides your polished product.
