Your function, with an added print to see exactly what k is:
In [39]: def my_func(k):
    ...:     print(k)
    ...:     x = np.arange(4)  # range to match your expected result
    ...:     y = x ** k
    ...:     return y
    ...:
As written the function works with your (3,1) array, arr = np.arange(1,4)[:,None]:
In [40]: my_func(arr)
[[1]
 [2]
 [3]]
Out[40]:
array([[ 0,  1,  2,  3],
       [ 0,  1,  4,  9],
       [ 0,  1,  8, 27]])
Note that the whole 2d array is passed in a single call. The x**k step works by broadcasting a (4,) array against a (3,1) array to produce a (3,4) result. You should, if possible, write functions that work like this, taking full advantage of the numpy methods and operators.
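To make the broadcasting step concrete, here is a minimal sketch that checks the shapes involved. The variable names are only illustrative; the arrays match the x and arr used above.

```python
import numpy as np

x = np.arange(4)               # shape (4,)
k = np.arange(1, 4)[:, None]   # shape (3, 1), same as arr above

# np.broadcast reports the common shape both operands are stretched to
common_shape = np.broadcast(x, k).shape

# One vectorized operation, no Python-level loop
result = x ** k

print(common_shape)   # (3, 4)
print(result.shape)   # (3, 4)
```

Each row of result is x raised to one of the values in k, which is the same table shown in Out[40].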
apply_along_axis can be used like this:
In [41]: np.apply_along_axis(my_func, 1, arr)
[1]
[2]
[3]
Out[41]:
array([[ 0,  1,  2,  3],
       [ 0,  1,  4,  9],
       [ 0,  1,  8, 27]])
Note that it passes (1,) arrays to the function. The docs should make it clear that this is designed to pass a 1d array to the function, NOT a scalar.
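If you want to confirm what the function receives without reading printed output, a small sketch like the following records the shape of each argument. The names record_shape and seen_shapes are hypothetical helpers, not part of numpy.

```python
import numpy as np

seen_shapes = []  # hypothetical list collecting the shape of every argument

def record_shape(k):
    # Same computation as my_func, but remembers what it was given
    seen_shapes.append(np.shape(k))
    return np.arange(4) ** k

arr = np.arange(1, 4)[:, None]   # the (3, 1) array from above
out = np.apply_along_axis(record_shape, 1, arr)

print(seen_shapes)   # one (1,) entry per row of arr
print(out.shape)     # (3, 4)
```

Every entry is a 1d shape, so any scalar-only code inside the function would have to index into k first.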
The equivalent list comprehension for a 2d arr is:
In [42]: np.array([my_func(i) for i in arr])
[1]
[2]
[3]
Out[42]:
array([[ 0,  1,  2,  3],
       [ 0,  1,  4,  9],
       [ 0,  1,  8, 27]])
Now let's comment out the print and do some time tests:
In [44]: timeit my_func(arr)
7.41 µs ± 6.75 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [45]: timeit np.apply_along_axis(my_func, 1, arr)
89.2 µs ± 649 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [46]: timeit np.array([my_func(i) for i in arr])
28.9 µs ± 1.29 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
The broadcasted approach is fastest. apply_along_axis is slowest.
I claim that apply_along_axis is only useful when the array has more than 2 dimensions, and even then it just makes the code prettier, not faster.
For example with a 3d array, that still broadcasts with the (4,) shape x:
In [47]: arr = np.arange(24).reshape(2,3,4)
In [49]: np.apply_along_axis(my_func, 2, arr).shape
Out[49]: (2, 3, 4)
In [50]: my_func(arr).shape
Out[50]: (2, 3, 4)
In [51]: np.array([[my_func(arr[i,j,:]) for j in range(3)] for i in range(2)]).shape
Out[51]: (2, 3, 4)
The list iteration requires a double loop. apply_along_axis hides this, but does not reduce the total number of calls to my_func.
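The call count can be checked directly. This is a rough sketch with a hypothetical counting wrapper (counting_func and the calls dict are my own names); it uses the same (2,3,4) array as above.

```python
import numpy as np

calls = {"n": 0}  # hypothetical counter, kept in a dict so the function can update it

def counting_func(k):
    # Same computation as my_func, plus a call counter
    calls["n"] += 1
    return np.arange(4) ** k

arr = np.arange(24).reshape(2, 3, 4)

calls["n"] = 0
np.apply_along_axis(counting_func, 2, arr)
apply_calls = calls["n"]       # one call per 1d slice along axis 2

calls["n"] = 0
counting_func(arr)
broadcast_calls = calls["n"]   # the whole array handled in one call

print(apply_calls, broadcast_calls)   # 6 1
```

The 6 calls correspond to the 2*3 slices that the double list comprehension iterates over explicitly.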
If your function really requires a scalar (e.g. it uses math.cos or an if test), then you might consider np.vectorize. For small examples it's slower than the equivalent list comprehension, but it does scale better for large ones. But again, if you can write the function to work directly with arrays, you'll be much happier with the performance.
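As an illustration of the scalar case, here is a minimal sketch. The function scalar_func is hypothetical, chosen only because it combines math.cos with an if test; the last lines show one possible whole-array rewrite using np.where and np.cos.

```python
import math
import numpy as np

def scalar_func(v):
    # Both the `if` test and math.cos need a single scalar value
    if v < 0:
        return 0.0
    return math.cos(v)

arr = np.array([[-1.0, 0.0], [math.pi, 2.0]])

# np.vectorize calls scalar_func once per element; otypes fixes the result dtype
vec_func = np.vectorize(scalar_func, otypes=[float])
out = vec_func(arr)

# Equivalent nested list comprehension
expected = np.array([[scalar_func(v) for v in row] for row in arr])

# Whole-array version of the same logic, with no per-element Python calls
direct = np.where(arr < 0, 0.0, np.cos(arr))

print(np.allclose(out, expected), np.allclose(out, direct))   # True True
```

np.vectorize is a convenience wrapper around a Python-level loop, so the np.where version is the one to prefer when the logic can be expressed that way.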