5

I haven't worked with distributed computing before, but I'm trying to integrate mpi4py into a program in order to parallelize a for loop on a compute cluster.

This is a pseudocode of what I want to do:

for file in directory: Initialize a class Run class methods Conglomerate results

I've looked all over stack overflow and I can't find any solution to this. Is there any way to do this simply with mpi4py, or is there another tool that can do it with easy installation and setup?

1 Answer 1

6

In order to achieve parallelism of a for loop with MPI4Py check the code example below. Its just a for loop to add some numbers. The for loop will execute in every node. Every node will get a different chunk of data to work with (range in for loop). In the end Node with rank zero will add the results from all the nodes.

#!/usr/bin/python

import numpy
from mpi4py import MPI
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

a = 1
b = 1000000

perrank = b//size
summ = numpy.zeros(1)

comm.Barrier()
start_time = time.time()

temp = 0
for i in range(a + rank*perrank, a + (rank+1)*perrank):
    temp = temp + i

summ[0] = temp

if rank == 0:
    total = numpy.zeros(1)
else:
    total = None

comm.Barrier()
#collect the partial results and add to the total sum
comm.Reduce(summ, total, op=MPI.SUM, root=0)

stop_time = time.time()

if rank == 0:
    #add the rest numbers to 1 000 000
    for i in range(a + (size)*perrank, b+1):
        total[0] = total[0] + i
    print ("The sum of numbers from 1 to 1 000 000: ", int(total[0]))
    print ("time spent with ", size, " threads in milliseconds")
    print ("-----", int((time.time()-start_time)*1000), "-----")

In order to execute the code above you should run it like this:

$ qsub -q qexp -l select=4:ncpus=16:mpiprocs=16:ompthreads=1 -I # Salomon: ncpus=24:mpiprocs=24
$ ml Python
$ ml OpenMPI
$ mpiexec -bycore -bind-to-core python hello_world.py

In this example, we run MPI4Py enabled code on 4 nodes, 16 cores per node (total of 64 processes), each python process is bound to a different core.

Sources that may help you:
Submit job with python code (mpi4py) on HPC cluster
https://github.com/JordiCorbilla/mpi4py-examples/tree/master/src/examples/matrix%20multiplication

Sign up to request clarification or add additional context in comments.

1 Comment

How to understand that you use "threads" in " threads in milliseconds"? I think it should be called "processor" or "task".

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.