Problem: Inspired by this thread, I'm trying to write a wrapper script that submits SLURM array jobs with bash variables. However, I'm running into issues with SLURM environment variables like $SLURM_ARRAY_TASK_ID as it acts as an empty variable.
I suspect it has something to do with how the test_wrapper.sh is parsing the yet undefined SLURM variable, but I can't seem to find a solution.
Below I provide a working example with a simple python script that should take an array ID as an input variable, but when it is called by the bash wrapper script, the python script crashes as it receives an empty variable.
test_wrapper.sh :
#!/bin/bash
for argument in "$@"
do
key=$(echo $argument | cut -f 1 -d'=')
value=$(echo $argument | cut -f 2 -d'=')
case "$key" in
"job_name") job_name="$value" ;;
"cpus") cpus="$value" ;;
"memory") memory="$value" ;;
"time") time="$value" ;;
"array") array="$value" ;;
*)
esac
done
sbatch <<EOT
#!/bin/bash
#SBATCH --account=foobar
#SBATCH --cpus-per-task=${cpus:-1}
#SBATCH --mem-per-cpu=${memory:-1}GB
#SBATCH --time=${time:-00:01:00}
#SBATCH --array=${array:-1-2}
#SBATCH --job-name=${job_name:-Default_Job_Name}
if [ -z "$SLURM_ARRAY_TASK_ID" ]
then
echo "The array ID \$SLURM_ARRAY_TASK_ID is empty"
else
echo "The array ID \$SLURM_ARRAY_TASK_ID is NOT empty"
fi
srun python foo.py -a $SLURM_ARRAY_TASK_ID
echo "Job finished with exit code $?"
EOT
where foo.py is:
import argparse
def main(args):
print('array number is : {}'.format(args.array_number))
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("-a", "--array_number",
help="the value passed from SLURM_ARRAY_TASK_ID"
)
args = parser.parse_args()
main(args)
$cat slurm-123456789_1.out yields :
The array ID 1 is empty
usage: foo.py [-h] [-a ARRAY_NUMBER]
foo.py: error: argument -a/--array_number: expected one argument
srun: error: nc10931: task 0: Exited with exit code 2
Job finished with exit code 0
I find it strange, that "The array ID 1 is empty" is correctly printing the $SLURM_ARRAY_TASK_ID (??)
test_wrapper.shto, after the shebang, runexec 2>trace.log; PS4=':$LINENO+'; set -xand then reviewingtrace.logwill at least let you see what's going on in the loop that iterates over arguments.sbatch <<EOTwith{ tee /dev/stderr | sbatch; } <<EOT, then your trace log would also contain the script as it's being passed tosbatch.Job finished with exit code 0because $? is substituted before sbatch receives the script. Perhaps you want to prevent parameter expansions by quoting<<"EOT"?test_wrapper.shscript (as that comment said). As comments don't preserve whitespace, extra content should be edited into the question.