I'm currently using a Python script that requires MPI to operate. This script runs on a SLURM system.
In order to run my Python script, I define the number of nodes to use and launch the following command within my sbatch submission file:
#!/bin/bash
#SBATCH --time=12:00:00
#SBATCH --nodes=12
#SBATCH --ntasks=12
#SBATCH --cpus-per-task=20
#SBATCH --qos=qosname
#SBATCH --account=accountname
#SBATCH --partition=partitionname
mpirun -np 1 --report-bindings --bind-to none -oversubscribe python Main.py
This setup works fine. However, I now want to add a task within the Python script that requires running another instance of mpirun. Since this task needs to run on different nodes, I tried submitting a new sbatch job that calls mpirun, using the following Python command:
os.system("sbatch execute.sub")
The associated execute.sub submission file is designed as follows:
#!/bin/bash
#SBATCH --time=1:00:00
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=20
#SBATCH --qos=qosname
#SBATCH --account=accountname
#SBATCH --partition=partitionname
mpirun -np 60 -oversubscribe --bind-to none other_application
However, when I attempt this, I encounter an error message: "mpirun does not support recursive calls". I'm confused by this, because I was under the impression that I was simply submitting a standalone job, independent of any other operations.
Could anyone please help me understand what's going wrong and how to correct it? Thanks in advance.
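One thing I have not ruled out, and this is an assumption rather than a confirmed diagnosis: sbatch passes the caller's environment to the new job by default, so variables set by the outer mpirun might reach execute.sub and make the inner mpirun believe it is nested. Below is a minimal sketch of submitting the job from Python with those variables filtered out. The prefixes `OMPI_` and `PMIX_` and the helper names `clean_env` and `submit` are hypothetical choices for illustration, not something taken from my actual setup.

```python
import os
import subprocess

# Assumption: these prefixes cover the variables the outer mpirun sets
# (Open MPI and PMIx); adjust for the MPI implementation actually in use.
MPI_ENV_PREFIXES = ("OMPI_", "PMIX_")


def clean_env(env):
    """Return a copy of env without variables that start with an MPI prefix."""
    return {k: v for k, v in env.items() if not k.startswith(MPI_ENV_PREFIXES)}


def submit(script):
    """Submit script with sbatch using the filtered environment.

    Returns whatever sbatch prints on stdout; raises CalledProcessError
    if sbatch exits with a non-zero status.
    """
    result = subprocess.run(
        ["sbatch", script],
        env=clean_env(os.environ),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

In Main.py this would replace the os.system call with something like `submit("execute.sub")`. The original environment is left untouched; only the copy handed to sbatch is filtered.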