
Given two arrays, an input array and a repeat array, I would like to get an array in which each row's value is repeated along a new dimension the specified number of times, then padded with zeros up to the longest row.

import numpy as np

to_repeat = np.array([1, 2, 3, 4, 5, 6])
repeats = np.array([1, 2, 2, 3, 3, 1])
# I want final array to look like the following:
#[[1, 0, 0],
# [2, 2, 0],
# [3, 3, 0],
# [4, 4, 4],
# [5, 5, 5],
# [6, 0, 0]]

The issue is that I'm working with large datasets (around 10M elements), so a list comprehension is too slow. What is a fast way to achieve this?
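For reference, here is a minimal sketch of the straightforward Python-level approach (a plain loop, equivalent in spirit to the list comprehension mentioned above), assuming rows are zero-padded to `repeats.max()` columns. The name `repeat_pad_loop` is hypothetical; it is only meant as a correctness baseline for the faster versions, not as a recommended solution.

```python
import numpy as np

def repeat_pad_loop(to_repeat, repeats):
    """Hypothetical reference version: one Python-level iteration per row (slow)."""
    width = repeats.max()
    out = np.zeros((len(to_repeat), width), dtype=to_repeat.dtype)
    for i, (value, count) in enumerate(zip(to_repeat, repeats)):
        # Fill the first `count` columns of row i with `value`; the rest stay 0.
        out[i, :count] = value
    return out

to_repeat = np.array([1, 2, 3, 4, 5, 6])
repeats = np.array([1, 2, 2, 3, 3, 1])
print(repeat_pad_loop(to_repeat, repeats))
```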



Here's one approach based on masking:

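# Row i of m is True in its first repeats[i] columns
m = repeats[:,None] > np.arange(repeats.max())
out = np.zeros(m.shape,dtype=to_repeat.dtype)
# Boolean assignment fills the True slots in row-major order, matching np.repeat's order
out[m] = np.repeat(to_repeat,repeats)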
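To see why the masking trick works, the sketch below prints the intermediate values for the sample input. The key point is that boolean indexing visits the True positions of `m` row by row, left to right, which is exactly the order in which `np.repeat` emits the repeated values, so the two line up one-to-one. The variable names `cols` and `vals` are only introduced here for illustration.

```python
import numpy as np

to_repeat = np.array([1, 2, 3, 4, 5, 6])
repeats = np.array([1, 2, 2, 3, 3, 1])

# Column indices 0..max-1, shape (3,)
cols = np.arange(repeats.max())
# Broadcasting shape (6, 1) against (3,) gives a (6, 3) boolean mask:
# row i is True in its first repeats[i] columns.
m = repeats[:, None] > cols
print(m.astype(int))

# np.repeat produces one value per True entry of m, in row-major order.
vals = np.repeat(to_repeat, repeats)
print(vals)  # [1 2 2 3 3 4 4 4 5 5 5 6]

out = np.zeros(m.shape, dtype=to_repeat.dtype)
out[m] = vals
print(out)
```

The number of True entries in `m` always equals `repeats.sum()`, which is what makes the assignment `out[m] = vals` shape-compatible.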

Sample output:

In [44]: out
Out[44]: 
array([[1, 0, 0],
       [2, 2, 0],
       [3, 3, 0],
       [4, 4, 4],
       [5, 5, 5],
       [6, 0, 0]])

Or, alternatively, with broadcasted multiplication:

In [67]: m*to_repeat[:,None]
Out[67]: 
array([[1, 0, 0],
       [2, 2, 0],
       [3, 3, 0],
       [4, 4, 4],
       [5, 5, 5],
       [6, 0, 0]])
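A quick way to convince yourself that the two variants agree is to wrap them and compare on random input. This is a sketch with hypothetical wrapper names (`repeat_pad_mask`, `repeat_pad_mul`); the random sizes are arbitrary, and the repeat counts deliberately include 0, which both variants handle by leaving that row all zeros.

```python
import numpy as np

def repeat_pad_mask(to_repeat, repeats):
    # Masking + boolean assignment, as in the first approach.
    m = repeats[:, None] > np.arange(repeats.max())
    out = np.zeros(m.shape, dtype=to_repeat.dtype)
    out[m] = np.repeat(to_repeat, repeats)
    return out

def repeat_pad_mul(to_repeat, repeats):
    # bool * int treats True as 1 and False as 0, so padded slots become 0.
    m = repeats[:, None] > np.arange(repeats.max())
    return m * to_repeat[:, None]

rng = np.random.default_rng(0)
to_repeat = rng.integers(1, 100, size=1000)
repeats = rng.integers(0, 5, size=1000)  # includes zero repeats
a = repeat_pad_mask(to_repeat, repeats)
b = repeat_pad_mul(to_repeat, repeats)
print(a.shape, np.array_equal(a, b))
```

One caveat worth knowing: for float input containing `nan` or `inf`, the multiplication variant puts `nan` in the padded slots of those rows (since `nan * 0` and `inf * 0` are `nan`), whereas the masking variant always pads with 0.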

For large datasets, we can leverage multiple cores, and potentially be more memory efficient, by evaluating that broadcasted multiplication with the numexpr module:

In [64]: import numexpr as ne

# Re-using mask `m` from previous method
In [65]: ne.evaluate('m*R',{'m':m,'R':to_repeat[:,None]})
Out[65]: 
array([[1, 0, 0],
       [2, 2, 0],
       [3, 3, 0],
       [4, 4, 4],
       [5, 5, 5],
       [6, 0, 0]])