
So I'm trying to do some simple image analysis in Python. I have a NumPy array of the video in question with a shape of (930, 256, 256), i.e. 930 frames at a resolution of 256 by 256 pixels.

I'm trying to do seed-pixel correlation in parallel. My computer has 12 cores, so I should be able to write a parallel for loop and get my results faster.

This is what I came up with after looking around for ways to write parallel for loops. However, it's significantly slower than the non-parallel version!

Perhaps someone can tell me a better way of writing it (using other libraries), or why it is slower?

Here's the code I came up with:

import numpy as np
from scipy.stats import pearsonr
from joblib import Parallel, delayed

def corr(pixel, seed_pixel):
    return pearsonr(pixel, seed_pixel)[0]

def get_correlation_map(seed_x, seed_y, frames):
    total_number_of_frames, width, height = frames.shape
    seed_pixel = np.asarray(frames[:, seed_x, seed_y], dtype=np.float32)

    # Reshape from (time, x, y) into (time, space)
    frames = np.reshape(frames, (total_number_of_frames, width * height))
    #####################################
    print('Getting correlation...')

    # The parallel version.
    correlation_map = Parallel(n_jobs=12)(delayed(corr)(pixel, seed_pixel) for pixel in frames.T)

    # Non-parallel version:
    # correlation_map = []
    # for i in range(frames.shape[-1]):
    #     correlation_map.append(pearsonr(frames[:, i], seed_pixel)[0])
    #####################################
    correlation_map = np.asarray(correlation_map, dtype=np.float32)
    correlation_map = np.reshape(correlation_map, (width, height))
    print(np.shape(correlation_map))

    return correlation_map

All I need is a way to parallelize a for loop that will append its results to a list in the order of the iteration. So I suppose synchronization could be an issue!
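For what it's worth, at this problem size the loop may not need parallelism at all: a Pearson correlation of one seed time series against every pixel can be written as a single vectorized NumPy computation. A sketch of my own (the function name `correlation_map_vectorized` is hypothetical, not from the thread):

```python
import numpy as np

def correlation_map_vectorized(frames, seed_x, seed_y):
    # frames has shape (time, width, height); flatten space into columns.
    n_frames, width, height = frames.shape
    data = frames.reshape(n_frames, width * height).astype(np.float32)
    seed = data[:, seed_x * height + seed_y]

    # Centre each time series, then apply r = cov(x, y) / (std(x) * std(y))
    # to every column at once with one matrix-vector product.
    data_c = data - data.mean(axis=0)
    seed_c = seed - seed.mean()
    numerator = data_c.T @ seed_c
    denominator = np.sqrt((data_c ** 2).sum(axis=0) * (seed_c ** 2).sum())
    return (numerator / denominator).reshape(width, height)
```

This replaces 65,536 separate `pearsonr` calls with one BLAS-backed matrix product, which is usually far faster than any Python-level loop, parallel or not.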

  • Have you looked at your CPU utilization to see whether the problem is I/O-bound in some way? Also, what OS are you using? Commented Aug 2, 2015 at 18:49
  • I'm using Ubuntu! And strangely, none of the CPUs are taxed at 100%; they hover around 50%. So the problem could very well be I/O... Any other suggestions? Commented Aug 2, 2015 at 19:08
  • The CPUs not being fully taxed is exactly what you are looking for. If you are using Ubuntu and don't already have mpstat, install it with sudo apt-get install sysstat. Look closely at the iowait column. Commented Aug 2, 2015 at 19:15
  • The iowait column stayed at 0.28% the whole way through a test run... Commented Aug 2, 2015 at 19:33

1 Answer


You are likely having an issue because the arguments passed to Parallel are large and are all being serialized to the worker processes. You can use backend="threading" to avoid this if (as I assume) pearsonr releases the GIL. Otherwise you might have to look into numpy.memmap and stick with the multiprocessing backend.

correlation_map = Parallel(n_jobs=12, backend="threading")(delayed(corr)(pixel, seed_pixel) for pixel in frames.T)
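To illustrate the numpy.memmap route the answer mentions: dump the reshaped array to disk once, reopen it memory-mapped, and let each worker read its own column, so the big array is never pickled. A rough sketch under my own choices of file path and helper names (`pearson_col`, `correlation_map_memmap` are hypothetical):

```python
import numpy as np
from joblib import Parallel, delayed, dump, load

def pearson_col(mm, i, seed):
    # Workers receive a cheap reference to the memmapped file, not the data.
    x = np.asarray(mm[:, i], dtype=np.float64)
    xc = x - x.mean()
    sc = seed - seed.mean()
    return float(xc @ sc / np.sqrt((xc ** 2).sum() * (sc ** 2).sum()))

def correlation_map_memmap(frames, seed_x, seed_y, n_jobs=12, path="frames.mmap"):
    n, w, h = frames.shape
    flat = np.asarray(frames, dtype=np.float32).reshape(n, w * h)
    dump(flat, path)                 # write the array to disk once
    mm = load(path, mmap_mode="r")   # reopen it as a read-only memmap
    seed = np.asarray(mm[:, seed_x * h + seed_y], dtype=np.float64)
    result = Parallel(n_jobs=n_jobs)(
        delayed(pearson_col)(mm, i, seed) for i in range(w * h))
    return np.asarray(result, dtype=np.float32).reshape(w, h)
```

Per-task scheduling overhead still applies, so this mainly helps when serializing the inputs, rather than the work itself, is the bottleneck.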

3 Comments

If pearsonr doesn't release the GIL, would that be causing the slowness? I tried what you suggested, but it's still slower than the single-core version :(
Yes, it could be I/O-bound, or, in terms of Amdahl's law (B ~ 1, en.wikipedia.org/wiki/Amdahl%27s_law), having to share an I/O resource.
Using this for numpy arrays with sklearn gave me a 100x speedup in the best case and 5x in the worst case.
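Another likely culprit behind the remaining slowness is that each task is tiny (one 930-sample correlation), so scheduling overhead dominates. Sending one large block of columns per worker and correlating the whole block vectorized inside the task amortizes that cost. A sketch of my own (the names `corr_block` and `correlation_map_chunked` are hypothetical):

```python
import numpy as np
from joblib import Parallel, delayed

def corr_block(block, seed):
    # Correlate every column of this block with the seed in one vectorized step.
    bc = block - block.mean(axis=0)
    sc = seed - seed.mean()
    return bc.T @ sc / np.sqrt((bc ** 2).sum(axis=0) * (sc ** 2).sum())

def correlation_map_chunked(frames, seed_x, seed_y, n_jobs=4):
    n, w, h = frames.shape
    flat = np.asarray(frames, dtype=np.float32).reshape(n, w * h)
    seed = flat[:, seed_x * h + seed_y].copy()
    # One big chunk per worker instead of one task per pixel.
    blocks = np.array_split(flat, n_jobs, axis=1)
    parts = Parallel(n_jobs=n_jobs)(
        delayed(corr_block)(b, seed) for b in blocks)
    return np.concatenate(parts).reshape(w, h)
```

With only a handful of tasks, the per-call pickling and dispatch cost becomes negligible relative to the actual computation.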
