
I'm currently using a Python script that requires MPI. The script runs on a SLURM cluster.

In order to run the script, I set the number of nodes to use and launch it through the following sbatch submission file:

#!/bin/bash
#SBATCH --time=12:00:00
#SBATCH --nodes=12
#SBATCH --ntasks=12
#SBATCH --cpus-per-task=20
#SBATCH --qos=qosname
#SBATCH --account=accountname
#SBATCH --partition=partitionname
mpirun -np 1 --report-bindings --bind-to none -oversubscribe python Main.py

This setup works fine. However, I now want to introduce an additional task within the Python script that requires running another instance of mpirun. Since this task needs to run on different nodes, I thought I would submit a new sbatch job that runs mpirun, using the following Python command:

os.system(f"sbatch execute.sub")

The associated execute.sub submission file is designed as follows:

#!/bin/bash
#SBATCH --time=1:00:00
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=20
#SBATCH --qos=qosname
#SBATCH --account=accountname
#SBATCH --partition=partitionname
mpirun -np 60 -oversubscribe --bind-to none other_application

However, when I attempt this, I encounter an error message: "mpirun does not support recursive calls". I'm confused by this, because I was under the impression that I was simply submitting a standalone job, independent of any other operations.

Could anyone please help me understand what's going wrong and how to correct it? Thanks in advance.

  • You simply cannot invoke mpirun from within an MPI application. Commented Jul 13, 2023 at 2:02

1 Answer


The problem is that the Open MPI environment variables (prefixed OMPI_ and MPI_) set by the initial mpirun are propagated to the nested job through os.system and sbatch, so the inner mpirun believes it is being launched recursively from inside an MPI application.
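
If you want to confirm this, here is a quick check (my own illustration, not part of the original answer) you can run inside Main.py to list the variables that leak through:

import os

# Print the Open MPI / MPI variables that the outer mpirun injected into
# this process; os.system and sbatch copy these into the nested job.
for name in sorted(os.environ):
    if name.startswith(("OMPI_", "MPI_")):
        print(name, "=", os.environ[name])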

You can remove them in Bash with

unset "${!OMPI_@}" "${!MPI_@}"

(the indirect expansion ${!OMPI_@} yields the names of all variables whose names start with OMPI_, and likewise for MPI_). That line can be placed before the mpirun command in execute.sub, or inside the os.system call itself: os.system('unset "${!OMPI_@}" "${!MPI_@}"; sbatch execute.sub').

Alternatively, you can use the --export parameter of sbatch to keep only the variables you need:

os.system("sbatch --export=PATH,LD_LIBRARY_PATH execute.sub")

Note that the submitted job might fail if you omit variables that are important to it.

Another option is to manipulate the environment via os.environ in the submitting Python script before running the os.system command, and to restore it afterwards.
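
A minimal sketch of that approach (my own illustration, assuming the same OMPI_/MPI_ prefixes and the execute.sub file from the question):

import os

# Temporarily strip the inherited MPI variables before submitting,
# then restore them so the still-running MPI script is unaffected.
saved = {k: os.environ.pop(k) for k in list(os.environ)
         if k.startswith(("OMPI_", "MPI_"))}
try:
    os.system("sbatch execute.sub")
finally:
    os.environ.update(saved)  # put the variables back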


2 Comments

Thank you, that is working! I chose to use os.system('unset "${!OMPI_@}" "${!MPI_@}"; sbatch execute.sub') and this is perfect. Thanks for the thorough explanation.
FWIW, we had this kind of discussion inside the Open MPI community, and the consensus is that even if this seems to work now, you should not take it for granted: it might break in the future.
