
I have a NumPy array:

arr = np.array([[1, 2],
                [3, 4]])

I want to create a new array that contains the element-wise powers of arr up to a given power order:

# arr_new = [arr^0, arr^1, arr^2, arr^3,...arr^order]
arr_new = [[1, 1, 1, 2, 1, 4, 1, 8],
           [1, 1, 3, 4, 9, 16, 27, 64]]

My current approach uses for loops:

import numpy as np

# Pre-allocate an array for powers
arr = np.array([[1, 2], [3, 4]])
order = 3
rows, cols = arr.shape
arr_new = np.zeros((rows, (order+1) * cols))

# Iterate over each exponent
for i in range(order + 1):
    arr_new[:, (i * cols) : (i + 1) * cols] = arr**i
print(arr_new)

Is there a faster (i.e. vectorized) approach to creating powers of an array?


Benchmarking

Thanks to @hpaulj, @Divakar, and @Paul Panzer for the answers. I benchmarked the loop-based and broadcasting-based approaches on the following test arrays.

arr = np.array([[1, 2],
                [3, 4]])
order = 3

arrLarge = np.random.randint(0, 10, (100, 100))  # 100 x 100 array
orderLarge = 10

The loop_based function is:

def loop_based(arr, order):
    # pre-allocate an array for powers
    rows, cols = arr.shape
    arr_new = np.zeros((rows, (order+1) * cols))
    # iterate over each exponent
    for i in range(order + 1):
        arr_new[:, (i * cols) : (i + 1) * cols] = arr**i
    return arr_new

The broadcast_based function using hstack is:

def broadcast_based_hstack(arr, order):
    # Create a 3D exponent array for a 2D input array to force broadcasting
    powers = np.arange(order + 1)[:, None, None]
    # Generate values (the first axis indexes the powers 0..order)
    exponentiated = arr ** powers
    # Stack the 2D slices side by side and return
    return np.hstack(exponentiated)   # <== using hstack function
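
For the small arr and order defined above, the intermediate shapes in this approach are:

powers = np.arange(order + 1)[:, None, None]   # shape (4, 1, 1)
exponentiated = arr ** powers                  # shape (4, 2, 2): one 2x2 slice per power
print(np.hstack(exponentiated))                # joins the slices column-wise -> shape (2, 8)
# [[ 1  1  1  2  1  4  1  8]
#  [ 1  1  3  4  9 16 27 64]]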

The broadcast_based function using reshape is:

def broadcast_based_reshape(arr, order):
    # Create a 2D exponent array; combined with arr[:, None] it broadcasts to 3D
    powers = np.arange(order + 1)[:, None]
    # Generate values (the second axis indexes the powers 0..order)
    exponentiated = arr[:, None] ** powers
    # Reshape and return array
    return exponentiated.reshape(arr.shape[0], -1)  # <== using reshape function

The broadcast-based function using a cumulative product (cumprod) and reshape is:

def broadcast_cumprod_reshape(arr, order):
    rows, cols = arr.shape
    # Create an empty 3D array whose middle axis indexes
    # the powers 0 through order
    out = np.empty((rows, order + 1, cols), dtype=arr.dtype)
    out[:, 0, :] = 1   # 0th power is always 1
    a = np.broadcast_to(arr[:, None], (rows, order, cols))
    # Cumulatively multiply arrays so each multiplication produces the next order
    np.cumprod(a, axis=1, out=out[:,1:,:])
    return out.reshape(rows, -1)
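
A quick spot check on the small arrays defined above confirms that the interleaving matches the desired layout; the key point is that each power is obtained from the previous one by a single multiplication instead of recomputing arr**i from scratch:

print(broadcast_cumprod_reshape(arr, order))
# [[ 1  1  1  2  1  4  1  8]
#  [ 1  1  3  4  9 16 27 64]]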

In a Jupyter notebook, I used the %timeit magic and got these results:

Small arrays (2x2):

%timeit -n 100000 loop_based(arr, order)
7.41 µs ± 174 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit -n 100000 broadcast_based_hstack(arr, order)
10.1 µs ± 137 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit -n 100000 broadcast_based_reshape(arr, order)
3.31 µs ± 61.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit -n 100000 broadcast_cumprod_reshape(arr, order)
11 µs ± 102 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Large arrays (100x100):

%timeit -n 1000 loop_based(arrLarge, orderLarge)
261 µs ± 5.82 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit -n 1000 broadcast_based_hstack(arrLarge, orderLarge)
225 µs ± 4.15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit -n 1000 broadcast_based_reshape(arrLarge, orderLarge)
223 µs ± 2.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit -n 1000 broadcast_cumprod_reshape(arrLarge, orderLarge)
157 µs ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Conclusions:

It seems that the broadcast-based approach using reshape is fastest for small arrays. For large arrays, however, the cumprod-based approach scales better and is the fastest.
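
As a sanity check (assuming arr, order, and the four functions above are defined in the same session), all implementations agree on the small example:

expected = loop_based(arr, order)
for f in (broadcast_based_hstack, broadcast_based_reshape, broadcast_cumprod_reshape):
    assert np.allclose(f(arr, order), expected), f.__name__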

5 Comments
  • That broadcast_based_reshape looks wrong. The order of extending arrays is different. Follow my post for the exact code. Commented May 19, 2018 at 20:11
  • I tried both ways with reshape- with and without extending the input array arr. It seems their run times are within the margin of error from each other. I understand why a higher dimensional array of exponents was made (np.arange(order+1)[:,None]) to force broadcasting. I do not know why you also used arr[:, None]. Commented May 19, 2018 at 20:24
  • Extending both arrays is needed to use reshape later on, the idea being to make the ordering as needed in the output. I was talking more about the correctness of the output and not the performance there. Commented May 19, 2018 at 20:52
  • Thanks - I realized my error in implementing your approach. Fixed (I think). Commented May 19, 2018 at 21:04
  • Yup, looks good now. Commented May 19, 2018 at 21:05

4 Answers


Extend arrays to higher dims and let broadcasting do its magic with some help from reshaping -

In [16]: arr = np.array([[1, 2],[3,4]])

In [17]: order = 3

In [18]: (arr[:,None]**np.arange(order+1)[:,None]).reshape(arr.shape[0],-1)
Out[18]: 
array([[ 1,  1,  1,  2,  1,  4,  1,  8],
       [ 1,  1,  3,  4,  9, 16, 27, 64]])

Note that arr[:,None] is essentially arr[:,None,:]; the trailing slice can be omitted for brevity.
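
For the arr and order defined above, the shapes involved in the broadcast are:

arr[:, None].shape                        # (2, 1, 2) -- same as arr[:, None, :]
np.arange(order + 1)[:, None].shape       # (4, 1)
# broadcasting (2, 1, 2) with (4, 1) gives (2, 4, 2); the reshape then merges
# the last two axes into the (2, 8) result shown above
(arr[:, None] ** np.arange(order + 1)[:, None]).shape   # (2, 4, 2)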

Timings on a bigger dataset -

In [40]: np.random.seed(0)
    ...: arr = np.random.randint(0,9,(100,100))
    ...: order = 10

# @hpaulj's soln with broadcasting and stacking
In [41]: %timeit np.hstack(arr **np.arange(order+1)[:,None,None])
1000 loops, best of 3: 734 µs per loop

In [42]: %timeit (arr[:,None]**np.arange(order+1)[:,None]).reshape(arr.shape[0],-1)
1000 loops, best of 3: 401 µs per loop

That reshaping part is practically free and that's where we gain performance here, along with the broadcasting part of course, as seen in the breakdown below -

In [52]: %timeit (arr[:,None]**np.arange(order+1)[:,None])
1000 loops, best of 3: 390 µs per loop

In [53]: %timeit (arr[:,None]**np.arange(order+1)[:,None]).reshape(arr.shape[0],-1)
1000 loops, best of 3: 401 µs per loop



Use broadcasting to generate the values, and reshape or rearrange the values as desired:

In [34]: arr **np.arange(4)[:,None,None]
Out[34]: 
array([[[ 1,  1],
        [ 1,  1]],

       [[ 1,  2],
        [ 3,  4]],

       [[ 1,  4],
        [ 9, 16]],

       [[ 1,  8],
        [27, 64]]])
In [35]: np.hstack(_)
Out[35]: 
array([[ 1,  1,  1,  2,  1,  4,  1,  8],
       [ 1,  1,  3,  4,  9, 16, 27, 64]])
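
np.hstack applied to a 3D array just concatenates its 2D slices along the column axis; as a sketch, the same (2, 8) result can also be spelled with concatenate or with transpose + reshape (res here is simply a name for the Out[34] array):

res = arr ** np.arange(4)[:, None, None]            # shape (4, 2, 2)
np.concatenate(list(res), axis=1)                   # same as np.hstack(res)
res.transpose(1, 0, 2).reshape(arr.shape[0], -1)    # same result via reshape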



Here is a solution using cumulative multiplication, which scales better than power-based approaches, especially if the input array is of float dtype:

import numpy as np

def f_mult(a, k):
    m, n = a.shape
    out = np.empty((m, k, n), dtype=a.dtype)
    out[:, 0, :] = 1
    a = np.broadcast_to(a[:, None], (m, k-1, n))
    a.cumprod(axis=1, out=out[:, 1:])
    return out.reshape(m, -1)
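
As an illustration (assuming f_mult is defined), a quick equality check against a direct power-based construction on the small array from the question:

import numpy as np

a = np.array([[1, 2], [3, 4]])
k = 4   # number of powers, i.e. exponents 0..3

# direct power-based reference for comparison
direct = (a[:, None] ** np.arange(k)[:, None]).reshape(a.shape[0], -1)
assert np.array_equal(f_mult(a, k), direct)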

Timings:

int up to power 9
divakar: 0.4342731796205044 ms
hpaulj:  0.794165057130158 ms
pp:      0.20520629966631532 ms
float up to power 39
divakar: 29.056487752124667 ms
hpaulj:  31.773792404681444 ms
pp:      1.0329263447783887 ms

Code for timings (thanks @Divakar):

def f_divakar(a, k):
    return (a[:,None]**np.arange(k)[:,None]).reshape(a.shape[0],-1)

def f_hpaulj(a, k):
    return np.hstack(a**np.arange(k)[:,None,None])


from timeit import timeit

np.random.seed(0)
a = np.random.randint(0,9,(100,100))
k = 10
# Note: with number=1000, timeit returns the total time in seconds for 1000 calls,
# which is numerically the same as milliseconds per call
print('int up to power 9')
print('divakar:', timeit(lambda: f_divakar(a, k), number=1000), 'ms')
print('hpaulj: ', timeit(lambda: f_hpaulj(a, k), number=1000), 'ms')
print('pp:     ', timeit(lambda: f_mult(a, k), number=1000), 'ms')

a = np.random.uniform(0.5,2.0,(100,100))
k = 40
print('float up to power 39')
print('divakar:', timeit(lambda: f_divakar(a, k), number=1000), 'ms')
print('hpaulj: ', timeit(lambda: f_hpaulj(a, k), number=1000), 'ms')
print('pp:     ', timeit(lambda: f_mult(a, k), number=1000), 'ms')

3 Comments

Hmm - I'm getting different results when I implement the cumprod approach. My implementation of your answer is added to my benchmarking edit.
@hazrmad, as I said it scales better, i.e. it will be faster as soon as the problem size is large enough. Your test example is so small that you are mostly measuring overheads.
Thanks! I ran tests with 100x100 arrays and indeed your approach was faster. I'll add a note.

You are creating a Vandermonde matrix with a reshape, so it is probably best to use numpy.vander to make it, and let someone else take care of the best algorithm.

This way your code is just (note that np.vander defaults to decreasing powers, so pass increasing=True and order + 1 columns to get powers 0 through order):

np.vander(arr.ravel(), order + 1, increasing=True).reshape((arr.shape[0], -1))

That said, it seems that numpy.vander uses something like Paul Panzer's cumprod method under the hood, so it should scale well.
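
For the 2x2 example this gives the following; note that the columns come out grouped per element (all powers of 1, then all powers of 2, and so on), so the ordering differs from the layout asked for in the question:

import numpy as np

arr = np.array([[1, 2], [3, 4]])
order = 3

print(np.vander(arr.ravel(), order + 1, increasing=True).reshape(arr.shape[0], -1))
# [[ 1  1  1  1  1  2  4  8]
#  [ 1  3  9 27  1  4 16 64]]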

2 Comments

Thanks! I wonder how one even goes about looking for such a function without knowing exactly what to look for.
In my case, I came across it years ago learning about Reed Solomon coding. I forgot everything else about that stuff, but the Vandermonde matrix keeps popping up!
