
I have a set of simulations, each of which uses MPI to run in parallel (they are CFD simulations). However, I would like to create a pool of these tasks in Python and run them concurrently. I used the multiprocessing library as follows:

import itertools
import multiprocessing

def Run_Case(params):
    # placeholder: run one MPI (parallel) CFD simulation for this parameter pair
    run_each_mpi_simulation(params)

a = range(1, len(U)); b = range(len(R))
paramlist = list(itertools.product(a, b))
pool = multiprocessing.Pool()
pool.map(Run_Case, paramlist)

So basically, the code creates a pool of tasks (simulation instances) and assigns them to the processors to run. However, it does not take into account that each simulation requires, let's say, 2 processors, since each case is itself a parallel (MPI) simulation. This results in a significant performance drop in the simulations.
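
For context, run_each_mpi_simulation does roughly the following (a simplified sketch; the real runs are driven through PyFoam, and the solver name and case directory here are just placeholders):

import subprocess

def run_each_mpi_simulation(params):
    i, j = params
    # hypothetical sketch: each case is itself a 2-rank MPI job;
    # in reality the run goes through PyFoam, and "my_cfd_solver"
    # and the case directory name below are placeholders
    subprocess.check_call(
        ["mpirun", "-np", "2", "./my_cfd_solver", "-case", "case_%d_%d" % (i, j)])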

I was therefore wondering: is it possible to somehow define the number of processors that get assigned to each task in Python's multiprocessing package?

Any comments are highly appreciated.

Kind regards Ashkan

EDIT/UPDATE:

Thank you so much @AntiMatterDynamite for your answer.

I tried your approach, and the performance and workload distribution across the processors improved a lot, but it seems there are two issues:

1) I get the following message, although everything continues to run:

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 1073, in run
    self.function(*self.args, **self.kwargs)
  File "/usr/local/lib/python2.7/dist-packages/PyFoam/Execution/FoamThread.py", line 86, in getLinuxMem
    me=psutil.Process(thrd.threadPid)
  File "/usr/local/lib/python2.7/dist-packages/psutil/__init__.py", line 341, in __init__
    self._init(pid)
  File "/usr/local/lib/python2.7/dist-packages/psutil/__init__.py", line 381, in _init
    raise NoSuchProcess(pid, None, msg)
NoSuchProcess: psutil.NoSuchProcess no process found with pid 8880

I would highly appreciate your comments.

Many thanks again Ashkan

EDIT/UPDATE:

I believe the message appeared because the number of processes was smaller than the list of processors. Since I had two cases/simulations, each using 2 processors, and with hyper-threading I had 8 logical processors, I was getting the message. It was resolved by using 4 processors or by having a larger pool of simulations.

  • I cannot fully understand this question. So what is "lets say 2 processors as each case is a parallel (MPI) simulation"? Commented Jan 3, 2018 at 3:30
  • Make sure you're correctly splitting the work among the threads: you don't want to have... 10 threads (to say something) running the whole simulation in each of them. Commented Jan 3, 2018 at 4:44
  • Sorry if the description was not clear. Basically, each of my Python tasks is a parallel simulation that uses 2 processors, so when the tasks are assigned to processors, each task needs 2 processors available to it rather than 1. Think of it as the Python code trying to run multiple simulations at the same time, where each simulation is itself a parallel job that needs 2 processors. I hope that makes it clear. Commented Jan 3, 2018 at 7:37
  • @BorrajaX I think that's exactly what I should do, but how can I correctly split the tasks among the threads? For example, I need to split each task across 2 threads. Commented Jan 3, 2018 at 7:48
  • That depends on the task and how the simulations are run. List of points? Maybe you can pass the first half of points to one and the second to other? (watch out for interdependencies, though!!) Commented Jan 3, 2018 at 7:49

1 Answer


multiprocessing.Pool accepts the number of processes to create as its first argument. You can use multiprocessing.cpu_count() to get the number of logical CPU cores and then create half as many processes in the pool (so each gets 2):

multiprocessing.Pool(multiprocessing.cpu_count() // 2)

This assumes that your CPU count divides evenly by 2, which is true for almost any CPU out there.

Note that this solution does not account for SMT (hyper-threading): multiprocessing.cpu_count() counts logical cores, so it may report double your physical core count. For most CPU-intensive tasks SMT is a performance boost (you run twice the tasks at more than half the speed), but if you do have SMT you need to decide whether it is good for your simulation.
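
If you would rather size the pool on physical cores only, psutil (used below anyway) can report that count directly; a minimal sketch, assuming psutil >= 2.0:

import multiprocessing
import psutil

logical_cores = multiprocessing.cpu_count()       # includes SMT/hyper-threaded siblings
physical_cores = psutil.cpu_count(logical=False)  # physical cores only (may be None on some platforms)

# if SMT turns out to hurt the simulations, size the pool on physical cores instead
pool = multiprocessing.Pool(physical_cores // 2)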

Lastly, you could also set the affinity of each process so that it can only run on 2 cores, but there is no straightforward standard way to do this since multiprocessing doesn't expose the PIDs of the processes it opens. Here is some rough but complete code that sets the affinity for each process:

import multiprocessing
import psutil
import itertools

cores_per_process = 2
cpu_count = multiprocessing.cpu_count()

manager = multiprocessing.Manager()
pid_list = manager.list()  # trick to find pid of all the processes in the pool

cores_list = range(cpu_count)
it = [iter(cores_list)] * cores_per_process  # this is a python trick to group items from the same list together into tuples
affinity_pool = manager.list(zip(*it))  # list of affinity

def Run_Case(params):
    self_proc = psutil.Process()  # get your own process
    if self_proc.pid not in pid_list:
        pid_list.append(self_proc.pid)  # found new pid in pool
        self_proc.cpu_affinity(affinity_pool.pop())  # set affinity from the affinity list but also remove it so other processes can't use the same affinity
    #run simulation

a = range(1, len(U))
b = range(len(R))
paramlist = list(itertools.product(a, b))
pool = multiprocessing.Pool(cpu_count // cores_per_process)  # decide on how many processes you want
pool.map(Run_Case, paramlist)
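
Note that the pool size uses integer division (cpu_count // cores_per_process): under Python 2.7 plain / behaves the same for integers, but under Python 3 it would return a float, which Pool will not accept. Also, each worker pops its affinity tuple from affinity_pool the first time it runs a task, so no two workers can end up pinned to the same pair of cores.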