I have a function that I would like to be evaluated across multiple nodes in a cluster. I've gotten simple examples to run on our cluster using MPI4py, but was hoping to find a Python package that makes things a little more user-friendly (like implementing the map feature of multiprocessing) but also gives a little more control over how many processes get spawned and on which nodes. I've seen a few packages that implement map, but none that control how many processes are spawned on each node.
The following code gets close to illustrating what I mean. However, instead of writing it in the typical MPI4py way, I've written it the way you would with the map function. I wrote it this way because this is ultimately how I'd like to implement the code (with a module that emulates map), and because I'm not quite sure how I'd write it using MPI to achieve what I want.
from numpy import dot, random
from multiprocessing import Pool

def foo(n):
    # Seed per task so each worker draws a different matrix pair
    random.seed(n)
    a = random.randn(1000, 1000)
    b = random.randn(1000, 1000)
    c = dot(a, b)
    return c.mean()

if __name__ == '__main__':
    pool = Pool(processes=4)
    results = pool.map(foo, range(4))
    print(results)
The reason I want to control the number of processes sent to each node is that some of the operations inside foo can be multithreaded (like dot, which would also be linked against the MKL libraries).
If I have a cluster of 12 computers with 2 cores each, I'd like to send out just one job to each of the 12 nodes, where it would implicitly take advantage of both cores. I don't want to spawn 24 jobs (one per core) because I'm worried about possible thread-thrashing when both processes try to use both cores. I also can't just spawn 12 processes, because I can't be certain one will go to each node rather than two to each of the first 6 nodes.
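For what it's worth, the one-process-per-node placement can usually be handled at launch time rather than in Python, via a hostfile and the launcher's per-node mapping flag. A minimal sketch, assuming an Open MPI or MPICH `mpiexec` and a hypothetical hostfile `hosts.txt` and script name:

```shell
# hosts.txt lists the 12 nodes, e.g.:
#   node01 slots=2
#   node02 slots=2
#   ...

# Open MPI: launch exactly one process per node
mpiexec --hostfile hosts.txt --map-by ppr:1:node python script.py

# MPICH (Hydra): -ppn caps processes per node
mpiexec -f hosts.txt -ppn 1 python script.py
```

Each of the 12 MPI ranks then has a whole 2-core node to itself, so a multithreaded dot inside foo can use both cores without competing with a sibling process.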
First off, should this be a major concern? How much of an effect would running 24 processes instead of 12 have on performance?
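One mitigation I've been considering, in case oversubscription does hurt: cap the BLAS/OpenMP thread pools in each worker so that processes and threads don't multiply. A minimal sketch using a Pool initializer to set the relevant environment variables (the variable names MKL_NUM_THREADS/OMP_NUM_THREADS are what MKL and OpenMP conventionally read; whether a cap set after the library has initialized is honored can vary, so this assumes numpy isn't imported before the cap is set in each worker):

```python
import os
from multiprocessing import Pool

def limit_threads():
    # Runs once in each worker before any tasks: cap BLAS/OpenMP
    # pools at 2 threads (one per core on the hypothetical nodes).
    for var in ("MKL_NUM_THREADS", "OMP_NUM_THREADS"):
        os.environ[var] = "2"

def report(_):
    # Stand-in for foo: confirm the cap is visible inside the worker.
    return os.environ.get("MKL_NUM_THREADS")

if __name__ == "__main__":
    with Pool(processes=4, initializer=limit_threads) as pool:
        print(pool.map(report, range(4)))  # ['2', '2', '2', '2']
```

In real code, foo would replace report and the initializer would stay the same; with that in place, 24 processes on 24 cores at one thread each is at least well-defined, even if 12 processes at two threads each might still be faster for the dot-heavy workload.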
If it will make a difference, is there a Python package that layers on top of MPI4py and does what I'm looking for?