
So I'm trying to do some simple image analysis in Python. I have a NumPy array of the video in question with a shape of (930, 256, 256), i.e. 930 frames at a resolution of 256 by 256 pixels.

I'm trying to do seed-pixel correlation in parallel. My computer has 12 cores, so I should be able to write a parallel for loop and get my results faster.

This is what I came up with after looking around for ways to write parallel for loops. However, it's significantly slower than the non-parallel version!

Perhaps someone can tell me a better way of writing it (using other libraries), or why it is slower?

Here's the code I came up with:

import numpy as np
from scipy.stats import pearsonr
from joblib import Parallel, delayed

def corr(pixel, seed_pixel):
    return pearsonr(pixel, seed_pixel)[0]

def get_correlation_map(seed_x, seed_y, frames):
    total_number_of_frames, width, height = frames.shape
    seed_pixel = np.asarray(frames[:, seed_x, seed_y], dtype=np.float32)

    # Reshape from (time, x, y) into (time, space)
    frames = np.reshape(frames, (total_number_of_frames, width * height))
    #####################################
    print('Getting correlation...')

    # The parallel version.
    correlation_map = Parallel(n_jobs=12)(delayed(corr)(pixel, seed_pixel) for pixel in frames.T)

    # Non-parallel version:
    # correlation_map = []
    # for i in range(frames.shape[-1]):
    #     correlation_map.append(pearsonr(frames[:, i], seed_pixel)[0])
    #####################################
    correlation_map = np.asarray(correlation_map, dtype=np.float32)
    correlation_map = np.reshape(correlation_map, (width, height))
    print(np.shape(correlation_map))

    return correlation_map

All I need is a way to parallelize a for loop that will append its results to a list in the order of the iteration. So I suppose synchronization could be an issue!
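For what it's worth, at this problem size the loop may not need parallelism at all: a Pearson correlation of one seed time series against every pixel can be written as a single vectorized NumPy computation. A sketch of my own (the function name `correlation_map_vectorized` is hypothetical, not from the thread):

```python
import numpy as np

def correlation_map_vectorized(frames, seed_x, seed_y):
    # frames has shape (time, width, height); flatten space into columns.
    n_frames, width, height = frames.shape
    data = frames.reshape(n_frames, width * height).astype(np.float32)
    seed = data[:, seed_x * height + seed_y]

    # Centre each time series, then apply r = cov(x, y) / (std(x) * std(y))
    # to every column at once with one matrix-vector product.
    data_c = data - data.mean(axis=0)
    seed_c = seed - seed.mean()
    numerator = data_c.T @ seed_c
    denominator = np.sqrt((data_c ** 2).sum(axis=0) * (seed_c ** 2).sum())
    return (numerator / denominator).reshape(width, height)
```

This replaces 65,536 separate `pearsonr` calls with one BLAS-backed matrix product, which is usually far faster than any Python-level loop, parallel or not.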

  • Have you looked at your CPU utilization to see whether the problem is I/O-bound in some way? Also, what OS are you using? Commented Aug 2, 2015 at 18:49
  • I'm using Ubuntu! And strangely, none of the CPUs are taxed at 100%; they hover around 50%. So the problem could very well be I/O... Any other suggestions? Commented Aug 2, 2015 at 19:08
  • The CPUs not being fully taxed is exactly what you are looking for. If you are using Ubuntu and don't already have mpstat, install it with sudo apt-get install sysstat. Look closely at the iowait column. Commented Aug 2, 2015 at 19:15
  • The iowait column stayed at 0.28% the whole way through a test run... Commented Aug 2, 2015 at 19:33

1 Answer


You are likely having an issue because the arguments passed to Parallel are large and are all being serialized to the worker processes. You can use backend="threading" to avoid this if (as I assume) pearsonr releases the GIL. Otherwise you might have to look into numpy.memmap and stick with the multiprocessing backend.

correlation_map = Parallel(n_jobs=12, backend="threading")(delayed(corr)(pixel, seed_pixel) for pixel in frames.T)
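To illustrate the numpy.memmap route the answer mentions: dump the reshaped array to disk once, reopen it memory-mapped, and let each worker read its own column, so the big array is never pickled. A rough sketch under my own choices of file path and helper names (`pearson_col`, `correlation_map_memmap` are hypothetical):

```python
import numpy as np
from joblib import Parallel, delayed, dump, load

def pearson_col(mm, i, seed):
    # Workers receive a cheap reference to the memmapped file, not the data.
    x = np.asarray(mm[:, i], dtype=np.float64)
    xc = x - x.mean()
    sc = seed - seed.mean()
    return float(xc @ sc / np.sqrt((xc ** 2).sum() * (sc ** 2).sum()))

def correlation_map_memmap(frames, seed_x, seed_y, n_jobs=12, path="frames.mmap"):
    n, w, h = frames.shape
    flat = np.asarray(frames, dtype=np.float32).reshape(n, w * h)
    dump(flat, path)                 # write the array to disk once
    mm = load(path, mmap_mode="r")   # reopen it as a read-only memmap
    seed = np.asarray(mm[:, seed_x * h + seed_y], dtype=np.float64)
    result = Parallel(n_jobs=n_jobs)(
        delayed(pearson_col)(mm, i, seed) for i in range(w * h))
    return np.asarray(result, dtype=np.float32).reshape(w, h)
```

Per-task scheduling overhead still applies, so this mainly helps when serializing the inputs, rather than the work itself, is the bottleneck.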

3 Comments

If pearsonr doesn't release the GIL, would that be causing the slowness? I tried what you suggested, but it's still slower than the single-core version :(
Yes, it could be I/O-bound, or, in terms of Amdahl's law (B ~ 1, en.wikipedia.org/wiki/Amdahl%27s_law), having to share an I/O resource.
Using this for numpy arrays with sklearn gave me a 100x speedup in the best case and 5x in the worst case.
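Another likely culprit behind the remaining slowness is that each task is tiny (one 930-sample correlation), so scheduling overhead dominates. Sending one large block of columns per worker and correlating the whole block vectorized inside the task amortizes that cost. A sketch of my own (the names `corr_block` and `correlation_map_chunked` are hypothetical):

```python
import numpy as np
from joblib import Parallel, delayed

def corr_block(block, seed):
    # Correlate every column of this block with the seed in one vectorized step.
    bc = block - block.mean(axis=0)
    sc = seed - seed.mean()
    return bc.T @ sc / np.sqrt((bc ** 2).sum(axis=0) * (sc ** 2).sum())

def correlation_map_chunked(frames, seed_x, seed_y, n_jobs=4):
    n, w, h = frames.shape
    flat = np.asarray(frames, dtype=np.float32).reshape(n, w * h)
    seed = flat[:, seed_x * h + seed_y].copy()
    # One big chunk per worker instead of one task per pixel.
    blocks = np.array_split(flat, n_jobs, axis=1)
    parts = Parallel(n_jobs=n_jobs)(
        delayed(corr_block)(b, seed) for b in blocks)
    return np.concatenate(parts).reshape(w, h)
```

With only a handful of tasks, the per-call pickling and dispatch cost becomes negligible relative to the actual computation.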
