
Using multiprocessing, I tried to parallelize a function but I have no performance improvement:

import time

from MMTK import *
from MMTK.Trajectory import Trajectory, TrajectoryOutput, SnapshotGenerator
from MMTK.Proteins import Protein, PeptideChain
import numpy as np

filename = 'traj_prot_nojump.nc'

trajectory = Trajectory(None, filename)

def calpha_2dmap_mult(trajectory = trajectory, t = range(0,len(trajectory))):
    dist = []
    universe = trajectory.universe
    proteins = universe.objectList(Protein)
    chain = proteins[0][0]
    traj = trajectory[t]
    dt = 1000 # calculate distance every 1000 steps
    for n, step in enumerate(traj):
        if n % dt == 0:
            universe.setConfiguration(step['configuration'])
            for i in np.arange(len(chain)-1):
                for j in np.arange(len(chain)-1):
                    dist.append(universe.distance(chain[i].peptide.C_alpha,
                                                  chain[j].peptide.C_alpha))
    return(dist)

c0 = time.time()
dist1 = calpha_2dmap_mult(trajectory, range(0,11001))
c1 = time.time() - c0
print(c1)


# Multiprocessing
from multiprocessing import Pool, cpu_count

pool = Pool(processes=4)
c0 = time.time()
dist_pool = [pool.apply(calpha_2dmap_mult, args=(trajectory, t,)) for t in
             [range(0,2001), range(3000,5001), range(6000,8001),
              range(9000,11001)]]
c1 = time.time() - c0
print(c1)

The time spent calculating the distances is essentially the same without multiprocessing (70.1s) as with it (70.2s)! I wasn't necessarily expecting a factor-of-4 improvement, but I was at least expecting some speedup! Does someone know what I did wrong?

  • Note: Because of the GIL, doing CPU heavy work in threads often doesn't work as expected in Python. wiki.python.org/moin/GlobalInterpreterLock Commented Oct 14, 2014 at 9:32
  • @AaronDigulla That is why "multiprocessing is a package that supports spawning processes" Commented Oct 14, 2014 at 21:14

1 Answer


Pool.apply is a blocking operation:

[Pool.apply is the] equivalent of the apply() built-in function. It blocks until the result is ready, so apply_async() is better suited for performing work in parallel ..

In this case Pool.map is likely more appropriate for collecting the results; the map itself blocks but the sequence elements / transformations are processed in parallel.


In addition to using partial application (or a manual realization of such), also consider expanding the data itself. It's the same cat in a different skin.

data = ((trajectory, r) for r in [range(0,2001), ..])
result = pool.map(.., data)

This can in turn be expanded:

def apply_data(d):
    return calpha_2dmap_mult(*d)

result = pool.map(apply_data, data)

The function (or a simple argument-expanding proxy for it) will need to be written to accept a single argument, but all the data is now mapped as a single unit.


7 Comments

I cannot use apply_async() because I want the results in order. I thought Pool.apply would be equivalent to Pool.map but with the advantage of allowing several arguments. How can I use Pool.map with several arguments?
@guillaume You can still get the results in order with apply_async. Just do: final_dist_pool = [r.get() for r in dist_pool] after the initial calls to apply_async. However, if you want to use map with multiple args instead, you can use functools.partial to enable passing multiple arguments. See here.
Thank you very much for your answers. You were right, apply_async works like a charm and gives good performance (~20s vs. ~70s), and using [r.get() ...] also gives the results in order. @dano Thanks for the hint about functools.partial with map; it gives similar performance compared to apply_async.
Note that you can't use a lambda function with Pool.map. You have to use a function defined at the top level of the module, or a functools.partial (which must consume a function declared at the top-level of the module).
@user2864740 The lambda function won't pickle/unpickle properly.
