
I am using mpi4py with OpenMPI on Ubuntu, and I also need this to run on RedHat. I am trying to run compute_ldos() for a relatively large set of parameters, and I want to use all available cores on my node to make this faster. Right now I am calling compute_ldos on a single MPI process for each parameter combination (embarrassingly parallel, I believe it's called?). compute_ldos can use multiple threads, and I need several threads per call to reduce memory usage; however, I lose efficiency if I assign too many to a single compute_ldos call. How do I assign multiple cores to each worker process?

import numpy as np
from mpi4py import MPI

def worker_process(distances, rank):
    """Worker function executed by each MPI process."""
    return [compute_ldos(dist[0], dist[1], dist[2], rank) for dist in distances]

def main():
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()
    start = 50
    end = 100
    res = [20, 10, 5, 5]
    num_angles = 10
    
    distances = []
    points = np.linspace(start, end, len(res) + 1)
    
    for idx, point in enumerate(points[:-1]):
        distances.extend(np.linspace(point, points[idx + 1], res[idx])[:-1])

    distances.append(end)
    
    # step of 360 / num_angles gives num_angles evenly spaced angles
    angle_sample = np.arange(0, 360, 360 / num_angles)
    
    distances = [[x, angle, False] for x in distances for angle in angle_sample]
    distances.append([0, 0, True])

    # Distribute distances across MPI ranks
    distances_split = np.array_split(distances, size)[rank]

    setup_stdout(rank)

    # Each MPI rank processes its own chunk of the parameter list
    results = worker_process(distances_split, rank)

if __name__ == "__main__":
    main()

compute_ldos calls a meep function to run an FDTD simulation.
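The way np.array_split divides the parameter list across ranks can be checked without launching MPI. A minimal standalone sketch (params and size are hypothetical stand-ins for the real distance/angle list and the communicator size):

```python
import numpy as np

params = list(range(10))   # stand-in for the distance/angle parameter list
size = 4                   # pretend there are 4 MPI ranks

# Each rank takes chunk number `rank`, exactly as in the script above
chunks = [np.array_split(params, size)[rank] for rank in range(size)]

# Every parameter lands on exactly one rank, in order,
# and chunk sizes differ by at most one element.
flat = [int(x) for chunk in chunks for x in chunk]
assert flat == params
assert max(map(len, chunks)) - min(map(len, chunks)) <= 1
```

This is why the script works even when the number of parameter combinations is not a multiple of the number of ranks: np.array_split, unlike np.split, tolerates uneven divisions.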

  • Multithreading in (pure) Python computational code is generally a waste of time (it is mainly designed for latency-bound operations like I/O, and for concurrency). It is often limited by the Global Interpreter Lock (GIL). Some modules release the GIL in most of their computing functions; NumPy is one of them. That being said, NumPy functions are often memory-bound, and memory-bound code does not scale. If you want your MPI code to scale with multiple threads, I advise you to use Cython (which supports OpenMP threads) so as not to be limited by the GIL, and to make the code less memory-bound. Commented Mar 22 at 20:23
  • Assuming you know what you are doing with respect to "threads", this is a two step tango. On the mpirun command line, specify how many cores should be assigned per MPI rank. For example, if you want to assign 4 cores per MPI rank, you can mpirun --map-by node:PE=4 ... Commented Mar 23 at 3:19
  • @GillesGouaillardet thanks, that works for OpenMPI if I map-by slot and bind-to hwthread. Do you by any chance know how I can do this with MPICH? Commented Mar 23 at 16:11
  • Your first step to get this running with MPICH on Ubuntu is to build your own MPICH, because the Ubuntu package is bricked: bugs.launchpad.net/ubuntu/+source/mpich/+bug/2072338 Commented Mar 24 at 13:11
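Putting the comments together, a command-line sketch of the two-step approach: reserve several cores per MPI rank at launch time, then let each rank's compute_ldos use that many threads. The script name and the core count of 4 are placeholders, and the MPICH line is an assumption based on the Hydra launcher's -bind-to syntax rather than a tested invocation.

```shell
# Open MPI: 4 ranks, each bound to 4 hardware threads
# (PE = processing elements per rank)
mpirun -np 4 --map-by slot:PE=4 --bind-to hwthread python my_script.py

# MPICH (Hydra launcher): believed rough equivalent,
# binding each process to a group of 4 cores
mpiexec -np 4 -bind-to core:4 python my_script.py
```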

