
I am using mpi4py with OpenMPI on Ubuntu, and I also need this to run on RedHat. I am trying to run compute_ldos() for a relatively large set of parameters, and I want to use all available cores on my node to make this faster. Right now I am calling compute_ldos on a single MPI process for each parameter combination (embarrassingly parallel, I believe it's called?). compute_ldos can use multiple threads, and I need several threads per call to reduce memory usage; however, I lose efficiency if I assign too many to a single compute_ldos call. How do I assign multiple cores to each worker process?

import numpy as np
from mpi4py import MPI

def worker_process(distances, rank):
    """Worker function executed by each MPI process."""
    return [compute_ldos(dist[0], dist[1], dist[2], rank) for dist in distances]

def main():
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()
    start = 50
    end = 100
    res = [20, 10, 5, 5]
    num_angles = 10
    
    distances = []
    points = np.linspace(start, end, len(res) + 1)
    
    for idx, point in enumerate(points[:-1]):
        distances.extend(np.linspace(point, points[idx + 1], res[idx])[:-1])

    distances.append(end)
    
    # step of 360 / num_angles gives num_angles evenly spaced angles
    angle_sample = np.arange(0, 360, 360 / num_angles)
    
    distances = [[x, angle, False] for x in distances for angle in angle_sample]
    distances.append([0, 0, True])

    # Distribute distances across MPI ranks
    distances_split = np.array_split(distances, size)[rank]

    setup_stdout(rank)

    # Each MPI rank processes its own chunk of the parameter list
    results = worker_process(distances_split, rank)

if __name__ == "__main__":
    main()

compute_ldos calls a meep function to run an FDTD simulation.
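The way np.array_split divides the parameter list across ranks can be checked without launching MPI. A minimal standalone sketch (params and size are hypothetical stand-ins for the real distance/angle list and the communicator size):

```python
import numpy as np

params = list(range(10))   # stand-in for the distance/angle parameter list
size = 4                   # pretend there are 4 MPI ranks

# Each rank takes chunk number `rank`, exactly as in the script above
chunks = [np.array_split(params, size)[rank] for rank in range(size)]

# Every parameter lands on exactly one rank, in order,
# and chunk sizes differ by at most one element.
flat = [int(x) for chunk in chunks for x in chunk]
assert flat == params
assert max(map(len, chunks)) - min(map(len, chunks)) <= 1
```

This is why the script works even when the number of parameter combinations is not a multiple of the number of ranks: np.array_split, unlike np.split, tolerates uneven divisions.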

  • Multithreading in (pure) Python computational code is generally a waste of time (it is mainly designed for latency-bound operations like I/O, and for concurrency). It is often limited by the Global Interpreter Lock (GIL). Some modules release the GIL in most of their computing functions; NumPy is one of them. That being said, NumPy functions are often memory-bound, and memory-bound code does not scale. If you want your MPI code to scale with multiple threads, I advise you to use Cython (which supports OpenMP threads) so as not to be limited by the GIL, and to make the code less memory-bound. Commented Mar 22 at 20:23
  • Assuming you know what you are doing with respect to "threads", this is a two step tango. On the mpirun command line, specify how many cores should be assigned per MPI rank. For example, if you want to assign 4 cores per MPI rank, you can mpirun --map-by node:PE=4 ... Commented Mar 23 at 3:19
  • @GillesGouaillardet thanks, that works for OpenMPI if I map-by slot and bind-to hwthread. Do you by any chance know how I can do this with MPICH? Commented Mar 23 at 16:11
  • Your first step to get this running with MPICH on Ubuntu is to build your own MPICH, because the Ubuntu package is bricked: bugs.launchpad.net/ubuntu/+source/mpich/+bug/2072338 Commented Mar 24 at 13:11
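Putting the comments together, a command-line sketch of the two-step approach: reserve several cores per MPI rank at launch time, then let each rank's compute_ldos use that many threads. The script name and the core count of 4 are placeholders, and the MPICH line is an assumption based on the Hydra launcher's -bind-to syntax rather than a tested invocation.

```shell
# Open MPI: 4 ranks, each bound to 4 hardware threads
# (PE = processing elements per rank)
mpirun -np 4 --map-by slot:PE=4 --bind-to hwthread python my_script.py

# MPICH (Hydra launcher): believed rough equivalent,
# binding each process to a group of 4 cores
mpiexec -np 4 -bind-to core:4 python my_script.py
```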

