Numpy: apply function that creates an array

Question

Numpy apply_along_axis/apply_over_axes assume that the applied function returns a scalar, but what if I want to use a function that returns an array (thus adding new dimensions)?

Below is a simplified example. I want to apply my_func to each row of an array. I could do this in pandas but expect numpy to be faster. Function:

def my_func(k):
    x = np.arange(3)
    y = x ** k
    return y

Original array:

array([[1],
       [2],
       [3]])

Expected result:

array([[ 0,  1,  2,  3],
       [ 0,  1,  4,  9],
       [ 0,  1,  8, 27]], dtype=int32)

Update: it was an oversimplified example. I should have said the real function can only take a scalar as input. But the solution proposed by Michael Szczesny in comments works for such functions too.

Update2: I should have said a function that does not broadcast, like this:

def my_func(k):
    return np.random.randint(1, 4, 5) + k

Arihant, yes, I want to apply my_func to the original array, i.e. I take its argument k from the original array.
– Denis Kazakov, Commented Jul 23, 2022 at 17:07
Psidom, it is a simplified example just to see if it is possible in principle. I have a more complex function.
– Denis Kazakov, Commented Jul 23, 2022 at 17:08
If you can pass the entire array, won't that solve your problem? Also try increasing the range: x = np.arange(4)
– Arihant, Commented Jul 23, 2022 at 17:14

Arihant · Accepted Answer · 2022-07-23 17:18:19Z

2

Here is the code for your reference:

import numpy as np
def my_func(k):
    x = np.arange(4)
    y = x ** k
    return y
inp = np.array([[1],[2],[3]])
print(my_func(inp))

Output:

[[ 0  1  2  3]
 [ 0  1  4  9]
 [ 0  1  8 27]]

See if it helps?

Denis Kazakov Over a year ago

This is a very neat solution for functions that can take arrays as input.

hpaulj · Accepted Answer · 2022-07-23 20:30:43Z

Your function, with an added print to see exactly what k is:

In [39]: def my_func(k):
    ...:     print(k)
    ...:     x = np.arange(4)     # range to match your expected result
    ...:     y = x ** k
    ...:     return y
    ...:

As written the function works with your (3,1) array, arr = np.arange(1,4)[:,None]:

In [40]: my_func(arr)
[[1]
 [2]
 [3]]
Out[40]: 
array([[ 0,  1,  2,  3],
       [ 0,  1,  4,  9],
       [ 0,  1,  8, 27]])

Note that the whole 2d array is passed. The x**k step works by broadcasting a (4,) array against a (3,1) array to produce a (3,4) result. If possible, write functions that work like this, taking full advantage of numpy methods and operators.
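
As a minimal sketch of the broadcasting step described above (the variable names are illustrative, not from the original post), the shapes can be checked directly and the result compared with an explicit double loop:

```python
import numpy as np

x = np.arange(4)               # shape (4,)
k = np.arange(1, 4)[:, None]   # shape (3, 1), same values as the question's array

# np.broadcast reports the shape the two operands broadcast to
bshape = np.broadcast(x, k).shape

y = x ** k                     # one vectorized call, no Python-level loop

# Explicit loops that compute the same values, for comparison only
looped = np.array([[xi ** ki for xi in range(4)] for ki in (1, 2, 3)])

print(bshape)                      # (3, 4)
print(np.array_equal(y, looped))   # True
```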

apply_along_axis can be used like this:

In [41]: np.apply_along_axis(my_func, 1, arr)
[1]
[2]
[3]
Out[41]: 
array([[ 0,  1,  2,  3],
       [ 0,  1,  4,  9],
       [ 0,  1,  8, 27]])

Note that it passes (1,) arrays to the function. The docs should make it clear that this is designed to pass a 1d array to the function, NOT a scalar.
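
A small sketch can confirm what apply_along_axis hands to the function. The seen_shapes list is a helper added here for illustration; it records the shape of each argument the function receives:

```python
import numpy as np

seen_shapes = []  # illustrative helper, not part of the original function

def my_func(k):
    # Record the shape of whatever apply_along_axis passes in
    seen_shapes.append(np.shape(k))
    return np.arange(4) ** k

arr = np.arange(1, 4)[:, None]           # shape (3, 1)
out = np.apply_along_axis(my_func, 1, arr)

print(seen_shapes)   # every entry is (1,): a 1d array, not a scalar
print(out.shape)     # (3, 4): the returned length replaces the length of axis 1
```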

The equivalent list comprehension for a 2d arr is:

In [42]: np.array([my_func(i) for i in arr])
[1]
[2]
[3]
Out[42]: 
array([[ 0,  1,  2,  3],
       [ 0,  1,  4,  9],
       [ 0,  1,  8, 27]])

Now let's comment out the print and do some time tests:

In [44]: timeit my_func(arr)
7.41 µs ± 6.75 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [45]: timeit np.apply_along_axis(my_func, 1, arr)
89.2 µs ± 649 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [46]: timeit np.array([my_func(i) for i in arr])
28.9 µs ± 1.29 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

The broadcasted approach is fastest. apply_along_axis is slowest.

I claim that apply_along_axis is only useful when the array dimensions are greater than 2, and even then it just makes the code prettier, not faster.

For example with a 3d array, that still broadcasts with the (4,) shape x:

In [47]: arr = np.arange(24).reshape(2,3,4)
In [49]: np.apply_along_axis(my_func, 2, arr).shape
Out[49]: (2, 3, 4)
In [50]: my_func(arr).shape
Out[50]: (2, 3, 4)
In [51]: np.array([[my_func(arr[i,j,:]) for j in range(3)] for i in range(2)]).shape
Out[51]: (2, 3, 4)

The list iteration requires a double loop. apply_along_axis hides this, but does not reduce the total number of calls to my_func.
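
To illustrate the point about call counts, the sketch below wraps the function with a counter (the counter dict is an addition for demonstration only) and compares apply_along_axis with a single broadcasted call on the same 3d array:

```python
import numpy as np

counter = {"n": 0}  # illustrative call counter

def my_func(k):
    counter["n"] += 1
    return np.arange(4) ** k

arr = np.arange(24).reshape(2, 3, 4)

counter["n"] = 0
via_apply = np.apply_along_axis(my_func, 2, arr)
apply_calls = counter["n"]        # one call per 1d slice along axis 2: 2 * 3

counter["n"] = 0
via_broadcast = my_func(arr)
broadcast_calls = counter["n"]    # a single call on the whole array

print(apply_calls, broadcast_calls)              # 6 1
print(np.array_equal(via_apply, via_broadcast))  # True
```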

If your function really requires a scalar (e.g. it uses math.cos or an if test), then you might consider np.vectorize. For small examples it's slower than the equivalent list comprehension, but it does scale better for large ones. But again, if you can write the function to work directly with arrays, you'll be much happier with the performance.
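
For a function that genuinely needs a scalar and returns an array, one option is np.vectorize with its signature argument. The sketch below is only an assumption about how that might look for this question: scalar_only is a hypothetical stand-in for the real function, and it uses math.pow so that it cannot accept whole arrays.

```python
import math
import numpy as np

def scalar_only(k):
    # Hypothetical stand-in: math.pow works on scalars, so this cannot broadcast
    return np.array([math.pow(x, k) for x in range(4)])

# '()->(n)' declares: scalar in, 1d array of some length n out
vec = np.vectorize(scalar_only, signature='()->(n)')

arr = np.array([[1], [2], [3]])
ks = arr[:, 0]                      # shape (3,), one scalar per row

result = vec(ks)                    # shape (3, 4)
list_result = np.array([scalar_only(k) for k in ks])  # plain-loop equivalent

print(result.shape)                          # (3, 4)
print(np.array_equal(result, list_result))   # True
```

Passing the original (3,1) array instead of ks should give a (3,1,4) result, since the output dimension is appended to the input's shape. np.vectorize still loops in Python, so it is a convenience rather than a speedup.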

I won't say much so that I don't say something wrong again, but thank you for the clarifications! 1d arrays longer than 1 element will be my next step.
