4

I'm processing data by looping over the vectors along one axis (which could be any axis) of a numpy ndarray (which could have any number of dimensions).

I can't operate on the whole array directly because the data are not perfect. Each vector needs a quality-control check. If a vector fails the check, it is filled with zeros (or nan) and skips the real processing.

I found this question similar, but my problem is harder because:

  1. ndim is arbitrary.

For a 3D array, I can take vectors along axis 1 like this

 import numpy as np

 x = np.arange(24).reshape(2,3,4)
 for i in range(x.shape[0]):
     for k in range(x.shape[2]):
         process(x[i,:,k])

but if ndim and the axis to take vectors along are not fixed, how can I take the vectors?

  2. The axis for taking vectors is arbitrary.

One possible way I'm considering is

 y = x.swapaxes(ax,-1)
 # loop over vectors along last axis
 for i in np.ndindex(y.shape[:-1]):
     process(y[i+(slice(None),)])
 # then swap back
 z = y.swapaxes(ax,-1)

But I doubt that this method is efficient.

7
  • The most efficient way would be not to iterate at all, that is, to modify process (if it doesn't already do so) to handle all slices in one go. So, are you working with a specific process func? If not, look into np.vectorize, I think. Commented Jul 5, 2016 at 15:44
  • I didn't work on the array directly because the data are not perfect. Each vector requires quality control. If a vector is not good, it will be filled with zeros (or nan). Commented Jul 5, 2016 at 15:47
  • Depending on the function process and the amount of looping, it might be worth using numba's jit decorator. Commented Jul 5, 2016 at 16:55
  • Are you able to elaborate a little more on why you want to process the data in linear vectors rather than as a whole array? Commented Jul 5, 2016 at 17:04
  • You don't need the swapaxes in that last example - see my edits. Commented Jul 5, 2016 at 23:13

2 Answers

3

The best way to test efficiency is to do time tests on realistic examples. But %timeit (ipython) tests on toy examples are a start.

Based on experience from answering similar 'if you must iterate' questions, there isn't much difference in times between the approaches. np.frompyfunc has a modest speed edge, but the function it wraps takes scalars, not arrays or slices. (np.vectorize is a nicer API to the same mechanism, and a bit slower.)
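As a rough illustration of the scalar-only limitation, here is a minimal sketch. The function clip_negative is a hypothetical stand-in for a per-element operation, not anything from the question; it only shows that the wrapped function receives one element at a time.

```python
import numpy as np

def clip_negative(value):
    # Hypothetical scalar function: it is called once per element.
    return value if value >= 0 else 0

x = np.arange(-3, 3).reshape(2, 3)

# np.frompyfunc(func, nin, nout) returns a ufunc whose output has dtype=object.
uf = np.frompyfunc(clip_negative, 1, 1)
out_obj = uf(x)
out_int = out_obj.astype(x.dtype)   # convert back to a numeric dtype

# np.vectorize offers a similar interface and lets you set the output dtype.
vf = np.vectorize(clip_negative, otypes=[x.dtype])
out_vec = vf(x)
```

Neither wrapper hands a whole 1d slice to the function in this form, which is why they don't fit the problem directly.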

But here you want to pass a 1d slice of an array to your function, while iterating over all the other dimensions. I don't think there's much difference in the alternative iteration methods.

Actions like swapaxes, transpose and ravel are fast, often just creating a new view with a different shape and strides.

np.ndindex uses np.nditer (with the multi_index flag) to iterate over a range of dimensions. nditer is fast when used in C code, but isn't anything special when used in Python code.

np.apply_along_axis creates an (i,j,:,k) indexing tuple and steps through the index values. It's a nice general approach, but isn't doing anything special to speed things up. itertools.product is another way of generating the indices.
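A minimal sketch of np.apply_along_axis with a per-vector quality check follows. Both the check (reject any vector containing nan) and the processing step (remove the mean) inside process_with_qc are assumptions made up for illustration; substitute your own.

```python
import numpy as np

def process_with_qc(vec):
    # Hypothetical quality control: reject vectors containing any nan.
    if np.isnan(vec).any():
        return np.zeros_like(vec)
    # Hypothetical "real" processing: remove the mean.
    return vec - vec.mean()

x = np.arange(24, dtype=float).reshape(2, 3, 4)
x[0, 1, 2] = np.nan          # spoil the vector x[0, :, 2]

ax = 1                       # works for any axis and any ndim
result = np.apply_along_axis(process_with_qc, ax, x)
```

Because the function returns a vector of the same length, result has the same shape as x, with the rejected vector zeroed out.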

But usually it isn't the iteration mechanism that slows things down, it's the repeated call to your function. You can test the iteration mechanism by using a trivial function, e.g.

def foo(x):
   return x
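One way to set up such a comparison outside IPython is with the timeit module. This is only a sketch on a toy array; the three loop functions are names chosen here for illustration, and the timings will vary by machine, so none are quoted.

```python
import itertools
import timeit
import numpy as np

def foo(x):
    # Trivial function, so the timing is dominated by the iteration itself.
    return x

x = np.arange(24).reshape(2, 3, 4)

def loop_ndindex():
    return [foo(x[i, :, k]) for i, k in np.ndindex(x.shape[0], x.shape[2])]

def loop_product():
    ranges = (range(x.shape[0]), range(x.shape[2]))
    return [foo(x[i, :, k]) for i, k in itertools.product(*ranges)]

def loop_apply():
    return np.apply_along_axis(foo, 1, x)

timings = {f.__name__: timeit.timeit(f, number=200)
           for f in (loop_ndindex, loop_product, loop_apply)}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s for 200 runs")
```

Swapping foo for the real function shows how much of the total time the iteration mechanism actually accounts for.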

===================

You don't need to swapaxes to use ndindex; you can use it to iterate on any combination of axes.

For example, make a 3d array, and sum along the middle dimension:

In [495]: x=np.arange(2*3*4).reshape(2,3,4)

In [496]: N=np.ndindex(2,4)

In [497]: [x[i,:,k].sum() for i,k in N]
Out[497]: [12, 15, 18, 21, 48, 51, 54, 57]

In [498]: x.sum(1)
Out[498]: 
array([[12, 15, 18, 21],
       [48, 51, 54, 57]])

I don't think it makes a difference in speed; the code's just simpler.
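To generalize this to any ndim and any axis, one option is to build the index tuple by hand. The helper iter_vectors below is a hypothetical name for this sketch, not a numpy function.

```python
import numpy as np

def iter_vectors(arr, ax):
    """Yield (index, 1-D view) for every vector of arr along axis ax."""
    ax = ax % arr.ndim                        # allow negative axes
    other_shape = arr.shape[:ax] + arr.shape[ax + 1:]
    for idx in np.ndindex(*other_shape):
        # Insert a full slice at position ax, e.g. (i, slice(None), k).
        full = idx[:ax] + (slice(None),) + idx[ax:]
        yield full, arr[full]

x = np.arange(24).reshape(2, 3, 4)
sums = [vec.sum() for _, vec in iter_vectors(x, 1)]
```

Since each index combines integers with a single slice, arr[full] is a view, so assigning to vec[:] inside the loop (for example, zeroing a rejected vector) writes into the original array.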

===================

Another possible tool is np.ma, masked arrays. With those you mark individual elements as masked (because they are nan or 0). It has code that evaluates things like sum, mean, product in such a way that the masked values don't harm the solution.

The 3d array again:

In [517]: x=np.arange(2*3*4).reshape(2,3,4)

add in some bad values:

In [518]: x[1,1,2]=99    
In [519]: x[0,0,:]=99

those values mess up the normal sum:

In [520]: x.sum(axis=1)
Out[520]: 
array([[111, 113, 115, 117],
       [ 48,  51, 135,  57]])

but if we mask them, they are 'filtered out' of the solution (in this case, they are temporarily treated as 0):

In [521]: xm=np.ma.masked_greater(x,50)

In [522]: xm
Out[522]: 
masked_array(data =
 [[[-- -- -- --]
  [4 5 6 7]
  [8 9 10 11]]

 [[12 13 14 15]
  [16 17 -- 19]
  [20 21 22 23]]],
             mask =
 [[[ True  True  True  True]
 ...
  [False False False False]]],
       fill_value = 999999)

In [523]: xm.sum(1)
Out[523]: 
masked_array(data =
 [[12 14 16 18]
 [48 51 36 57]],
 ...)
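If the bad values are marked with nan rather than a large number, np.ma.masked_invalid builds the mask directly. A minimal sketch, assuming float data and the same bad positions as above:

```python
import numpy as np

x = np.arange(24, dtype=float).reshape(2, 3, 4)
x[1, 1, 2] = np.nan
x[0, 0, :] = np.nan

# Mask every nan/inf element, then reduce along any axis.
xm = np.ma.masked_invalid(x)
ax = 1
masked_sum = xm.sum(axis=ax)

# Convert back to a plain ndarray; fully masked results would become 0.0.
plain = masked_sum.filled(0.0)
```

The same call works for any ndim and any value of ax, which is the generality the question asks for.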

1

Have you considered numpy.nditer?

See also Iterating over arrays.

EDIT: maybe another solution would just be to either use:

  • flatten
  • ravel
  • the flat 1D iterator

You can thus iterate in a 1D fashion whatever the array's initial number of dimensions, and then reshape the result back to the original shape.
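A plain flatten loses track of which elements belong to the same vector, so one way to adapt the idea is to move the chosen axis to the end and reshape to 2D, so that each row is one vector. The helper process_rows is a hypothetical name, and the sketch assumes func returns a vector of the same length and dtype as its input.

```python
import numpy as np

def process_rows(arr, ax, func):
    # Put the chosen axis last, then collapse all other axes into one.
    moved = np.moveaxis(arr, ax, -1)
    rows = moved.reshape(-1, moved.shape[-1])   # may copy if not contiguous
    out = np.empty_like(rows)
    for n, row in enumerate(rows):
        out[n] = func(row)
    # Restore the original shape and axis order.
    return np.moveaxis(out.reshape(moved.shape), -1, ax)

x = np.arange(24).reshape(2, 3, 4)
y = process_rows(x, 1, lambda v: v[::-1])       # reverse each vector
```

The loop is now a single flat loop over rows regardless of ndim.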

3 Comments

I got a similar idea using np.ndindex from stackoverflow.com/questions/29493183/… but it also mentioned that nditer is even slower than a for loop.
In fact, numpy's builtin functions, such as mean, sum, fft, ..., can always deal with an ndarray regardless of ndim and axis. But I couldn't work out how they do it, even though I took a look at the source code.
I think they actually wrap pre-compiled C code (in which loops do occur), see What is numpy. Have you considered checking your data using some of numpy's vectorized functions (see for example docs.scipy.org/doc/numpy/reference/routines.logic.html) and numpy indexing, which is really powerful? If you definitely need to loop, maybe you should consider Cython and use it to write looping C code natively callable from Python.
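As a sketch of what a vectorized check could look like: the quality rule (every element finite) and the processing step (mean removal) below are assumptions for illustration only, and this only applies when the real check and processing can be written as whole-array operations.

```python
import numpy as np

x = np.arange(24, dtype=float).reshape(2, 3, 4)
x[0, 1, 2] = np.nan
ax = 1

# One boolean per vector: True where every element along ax is finite.
good = np.isfinite(x).all(axis=ax, keepdims=True)

# Hypothetical processing applied to everything at once (nan propagates).
processed = x - x.mean(axis=ax, keepdims=True)

# Keep processed values for good vectors, zeros for rejected ones.
result = np.where(good, processed, 0.0)
```

keepdims=True leaves a length-1 axis in place, so good broadcasts against x for any ndim and any ax.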
