I need help reducing Python looping overhead in the following problem: I'm writing a function that implements a pixel-flow algorithm, a classic dynamic programming algorithm, on a 2D NumPy array. It requires:
1) visiting all the elements of the array at least once like this:
for x in range(xsize):
    for y in range(ysize):
        updateDistance(x, y)
2) potentially following a path of elements, based on the values of an element's neighbors, which looks like this:
while len(workingList) > 0:
    x, y = workingList.pop()
    # if any neighbors of (x, y) need calculation, push (x, y) and those neighbors onto workingList
    # else, calculate flow at (x, y) as a sum of the flow at neighboring pixels
Unfortunately, I seem to be getting a lot of Python loop overhead on #1, even when the body of updateDistance is just pass. I figure this is a classic enough algorithm that there must be a good approach to it in Python that avoids some of the looping overhead. I'm also worried that even if I fix #1, the loop in #2 will bite me next.
Any suggestions about quickly looping through elements in a 2D numpy array and potentially updating chains of elements in that array?
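To make the overhead concrete, here is a sketch comparing the double loop against a single vectorized NumPy expression for a hypothetical elementwise update (the real updateDistance is not shown in the question, so `x + y` stands in for it; any elementwise rule works the same way):

```python
import numpy as np

# Toy stand-in for updateDistance: set each cell to x + y.
xsize, ysize = 200, 300
A = np.empty((xsize, ysize))

# 1) the double loop: one Python-level statement per pixel
for x in range(xsize):
    for y in range(ysize):
        A[x, y] = x + y

# 2) the same update as one vectorized expression: the loop runs in C
B = np.add.outer(np.arange(xsize), np.arange(ysize))

assert np.array_equal(A, B)
```

The vectorized version does the same work per element but pays the interpreter cost once per array rather than once per pixel.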
Edit: fleshing out more details of #2
It seems I might be able to vectorize the first loop, perhaps with an np.meshgrid call.
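A minimal sketch of that idea, assuming updateDistance(x, y) can be expressed elementwise; here it is a hypothetical Euclidean distance to a fixed source pixel (sx, sy), since the real update rule isn't shown:

```python
import numpy as np

# Build the (x, y) coordinate grids the double loop would have visited.
xsize, ysize, sx, sy = 5, 4, 2, 1
xs, ys = np.meshgrid(np.arange(xsize), np.arange(ysize), indexing="ij")

# Every pixel updated in one C-level pass instead of xsize*ysize calls.
dist = np.hypot(xs - sx, ys - sy)
```

With `indexing="ij"` the grids have shape (xsize, ysize), matching A's layout in the loops above.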
The loop in part #2 is a little more complicated, but here's a simplified version. I'm concerned about both the loop itself and the indexing into the neighboring elements:
# A is a 2D cost matrix
workingList = [(x, y)]
while len(workingList) > 0:
    x, y = workingList.pop()
    neighborsToCalculate = []
    for n in neighborsThatNeedCalculation(x, y):  # indexes A to check neighbors of (x, y)
        neighborsToCalculate.append(n)
    if len(neighborsToCalculate) != 0:
        # revisit (x, y) once its neighbors have been calculated
        workingList.append((x, y))
        workingList.extend(neighborsToCalculate)
    else:
        for xn, yn in neighbors(x, y):
            A[x, y] += 1 + A[xn, yn]
This is a classic graph-traversal problem (depth-first as written, since workingList.pop() takes from the end of the list, making it a stack). It would be great if this could be parallelized. That's probably impossible in its current form because it follows a path, but I'd be delighted to hear suggestions.
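For reference, here is a runnable version of the worklist pattern above. The dependency rule is an assumption standing in for the real neighborsThatNeedCalculation predicate: a pixel is taken to depend on its 4-connected neighbors with strictly higher cost (they "drain into" it), which makes the dependencies a DAG so the traversal terminates. The names `neighbors` and `flow` are illustrative, not from the original code:

```python
import numpy as np

def neighbors(x, y, shape):
    # 4-connected neighbors inside the array bounds
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < shape[0] and 0 <= ny < shape[1]:
            yield nx, ny

def flow(cost, start):
    """Worklist traversal: flow at a pixel = 1 + sum of the flow of
    its uphill (strictly higher-cost) neighbors."""
    F = np.full(cost.shape, -1, dtype=np.int64)  # -1 = not yet computed
    work = [start]
    while work:
        x, y = work.pop()
        if F[x, y] >= 0:
            continue  # a pixel can be pushed more than once
        pending = [(nx, ny) for nx, ny in neighbors(x, y, cost.shape)
                   if cost[nx, ny] > cost[x, y] and F[nx, ny] < 0]
        if pending:
            work.append((x, y))   # revisit after dependencies resolve
            work.extend(pending)
        else:
            F[x, y] = 1 + sum(F[nx, ny]
                              for nx, ny in neighbors(x, y, cost.shape)
                              if cost[nx, ny] > cost[x, y])
    return F[start]
```

On a 2x2 cost matrix [[3, 2], [2, 1]], starting from the lowest cell (1, 1): the two cost-2 cells each accumulate flow 2 from the cost-3 corner, and the start cell gets 1 + 2 + 2 = 5.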