
I have a computing cluster with four nodes, A, B, C, and D, running Slurm version 17.11.7, and I am struggling with Slurm array jobs. I have the following bash script:

#!/bin/bash -l
#SBATCH --job-name testjob
#SBATCH --output output_%A_%a.txt
#SBATCH --error error_%A_%a.txt
#SBATCH --nodes=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=50000

FOLDER=/home/user/slurm_array_jobs/
mkdir -p $FOLDER
cd ${FOLDER}

echo $SLURM_ARRAY_TASK_ID > ${SLURM_ARRAY_TASK_ID}
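In the --output and --error patterns, %A expands to the job ID of the array as a whole and %a to the index of the individual task; with a hypothetical job ID of 12345, task 7 would therefore produce output_12345_7.txt, error_12345_7.txt, and a file named 7 inside /home/user/slurm_array_jobs/ containing the value 7.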

The script generates the following files:

  • output_*.txt,
  • error_*.txt,
  • files named according to ${SLURM_ARRAY_TASK_ID}

I submit the script from node A of my computing cluster as follows:

sbatch --array=1-500 example_job.sh

The 500 tasks are distributed among nodes A-D, and each output file is stored on the node where the corresponding array task ran. In this case, roughly 125 "output_" files end up on each of A, B, C, and D.
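To check which node each task actually ran on, sacct can list it per task; the job ID 12345 below is a placeholder for the ID that sbatch prints:

sacct -j 12345 --format=JobID,NodeList,State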

Is there a way to store all output files on the node from which I submit the script, in this case node A? That is, I would like to store all 500 "output_" files on node A.

2 Answers


Slurm does not handle input/output file transfer; it assumes that the current working directory sits on a network filesystem, NFS being the simplest common choice. GlusterFS, BeeGFS, and Lustre are other popular options for Slurm clusters.
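A minimal sketch of such a setup, assuming node A exports the working directory over NFS; the hostname nodeA, the subnet 10.0.0.0/24, and the exact paths are placeholders, and the service details depend on your distribution:

# On node A (NFS server): add the export to /etc/exports
/home/user/slurm_array_jobs 10.0.0.0/24(rw,sync,no_subtree_check)

# Re-read the exports table
sudo exportfs -ra

# On nodes B, C and D: mount the export at the same path
sudo mkdir -p /home/user/slurm_array_jobs
sudo mount -t nfs nodeA:/home/user/slurm_array_jobs /home/user/slurm_array_jobs

With the directory mounted at the same path everywhere, every array task writes into the same place and all 500 output files are visible from node A.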



An alternative is to use an epilog script that copies the output files back to the node where the job was submitted and then deletes them from the compute node.

Add to slurm.conf:

Epilog=/etc/slurm-llnl/slurm.epilog
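slurm.conf has to be kept consistent across all nodes, and the daemons need to re-read it before the new epilog takes effect, e.g. with:

scontrol reconfigure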

The slurm.epilog script does the copying (make it executable with chmod +x):

#!/bin/bash

# The epilog runs on the compute node after the job finishes; SLURM_JOB_ID is set in its environment.
# Query the job record once and parse the fields we need from it.
jobInfo=$(scontrol show job "${SLURM_JOB_ID}")

# Submitting user ("UserId=name(uid)" -> "name")
userId=$(echo "$jobInfo" | grep -i UserId | cut -f2 -d '=' | grep -o '^[^(]*')
# Paths of the stdout/stderr files written on this compute node
stdOut=$(echo "$jobInfo" | grep -i StdOut | cut -f2 -d '=')
stdErr=$(echo "$jobInfo" | grep -i StdErr | cut -f2 -d '=')
# Node the job was submitted from, and the directory the batch script was submitted from
host=$(echo "$jobInfo" | grep -i AllocNode | cut -f3 -d '=' | cut -f1 -d ':')
hostDir=$(echo "$jobInfo" | grep -i Command | cut -f2 -d '=' | xargs dirname)
hostPath=$host:$hostDir/

# Copy the files back to the submit node as the submitting user, then remove the local copies
runuser -l "$userId" -c "scp $stdOut $stdErr $hostPath"
rm -f "$stdOut"
rm -f "$stdErr"
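The epilog runs on the compute node as root (SlurmdUser by default), which is why runuser switches to the submitting user before calling scp; for the copy to work non-interactively, that user needs passwordless (key-based) SSH access from the compute nodes back to the submit node.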

(Switching from PBS to Slurm without NFS or similar shared directories is a pain.)

