
Update: I was able to get a variable assignment from SLURM_JOB_ID with the line JOBID=`echo ${SLURM_JOB_ID}`. However, I still haven't gotten SLURM_ARRAY_JOB_ID to assign itself to JOBID.
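As an aside, bash's `${VAR:-fallback}` expansion handles both cases in one line: it uses SLURM_ARRAY_JOB_ID when it exists and falls back to SLURM_JOB_ID otherwise. A minimal sketch that fakes the environment so it can run outside of SLURM (the job id value here is made up):

```python
import os
import subprocess

# Simulate a non-array job: SLURM_JOB_ID is set, SLURM_ARRAY_JOB_ID is not.
env = dict(os.environ, SLURM_JOB_ID="123")
env.pop("SLURM_ARRAY_JOB_ID", None)

# ${SLURM_ARRAY_JOB_ID:-$SLURM_JOB_ID} expands to the array job id when it
# is defined, and to the plain job id otherwise.
script = 'JOBID=${SLURM_ARRAY_JOB_ID:-$SLURM_JOB_ID}; echo "$JOBID"'
result = subprocess.run(["bash", "-c", script], env=env,
                        capture_output=True, text=True)
print(result.stdout.strip())  # 123
```

Running the same script with SLURM_ARRAY_JOB_ID set would print that value instead.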


Because I need to support existing HPC workflows, I have to pass a bash script within a Python subprocess. It was working great with OpenPBS; now I need to convert it to SLURM. I have it largely working in SLURM hosted on Ubuntu 20.04, except that the job array is not being populated. Below is a code snippet greatly stripped down to what's relevant.

The specific question I have is: why are the lines JOBID=${SLURM_JOB_ID} and JOBID=${SLURM_ARRAY_JOB_ID} not getting their assignments? I've tried using a heredoc and various bashisms without success.

The code could certainly be cleaner; it's the result of multiple people working without a common standard.

These are relevant:

Accessing task id for array jobs

Handling bash system variables and slurm environmental variables in a wrapper script

        sbatch_arguments = "#SBATCH --array=1-{}".format(get_instance_count())

        proc = Popen('ssh ${USER}@server_hostname /apps/workflows/slurm_wrapper.sh sbatch', shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, close_fds=True)
        job_string = """#!/bin/bash -x
        #SBATCH --job-name=%(name)s
        #SBATCH -t %(walltime)s
        #SBATCH --cpus-per-task %(processors)s
        #SBATCH --mem=%(memory)s
        %(sbatch_args)s

        # Assign JOBID
        if [ %(num_jobs)s -eq 1 ]; then
            JOBID=${SLURM_JOB_ID}
        else
            JOBID=${SLURM_ARRAY_JOB_ID}
        fi

        exit ${returnCode}

        """ % ({"walltime": walltime
                ,"processors": total_cores
                ,"binary": self.binary_name
                ,"name": ''.join(x for x in self.binary_name if x.isalnum())
                ,"memory": memory
                ,"num_jobs": self.get_instance_count()
                ,"sbatch_args": sbatch_arguments
                })

        # Send job_string to sbatch
        stdout, stderr = proc.communicate(input=job_string)
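One detail that may matter here (offered as a possibility, since I can't test against this cluster): sbatch only recognizes `#SBATCH` lines that begin at the start of a line, and a triple-quoted Python string like the one above keeps the source indentation of every line. `textwrap.dedent` is a cheap way to make sure the script piped to sbatch has its directives flush left; the directive values below are placeholders:

```python
import textwrap

# dedent strips the common leading whitespace that Python source
# indentation adds, so every "#SBATCH" starts in column 1.
job_string = textwrap.dedent("""\
    #!/bin/bash -x
    #SBATCH --job-name=example
    #SBATCH --array=1-4
    echo "task ${SLURM_ARRAY_TASK_ID}"
""")

print(job_string)
```

After dedenting, no line in the script starts with whitespace, so the directives are at least in a form sbatch can parse.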
  • Debugging this some more, I've realized the SBATCH --array directive must not be recognized. It's being passed as an argument, which I can see is passed correctly. There might be some order of execution I'm not aware of. Commented Nov 7, 2022 at 23:20
  • Perhaps some further information: I can run a non-array job fine with this approach. The SLURM_ARRAY_* variables apparently are evaluated by bash too early, before sbatch has them defined. I've been trying to change that with a heredoc and some bashisms, still without success. I'm hoping I don't have to scrap this for a whole new approach. Commented Nov 16, 2022 at 0:54

1 Answer


Following up on this: I solved it by passing the SBATCH directives as arguments to the sbatch command.

    sbatch_args = """--job-name=%(name)s --time=%(walltime)s --partition=defq --cpus-per-task=%(processors)s --mem=%(memory)s""" % (
                    {"walltime": walltime
                    ,"processors": cores
                    ,"name": ''.join(x for x in self.binary_name if x.isalnum())
                    ,"memory": memory
                    })

    # Open a pipe to the sbatch command.
    # The SLURM variables SLURM_ARRAY_* do not exist until after sbatch is called.
    # Popen.communicate has bash interpret all variables at the same time the script is sent.
    # Because of that, the job array needs to be declared prior to the rest of the bash script.

    # It further seems that SBATCH directives are not evaluated when passed via a string with .communicate.
    # Due to this, all SBATCH directives are passed as arguments to slurm_wrapper.sh as the first command in the Popen pipe.

    proc = Popen('ssh ${USER}@hostname /apps/workflows/slurm_wrapper.sh sbatch %s' % sbatch_args,
                 shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE,
                 close_fds=True,
                 executable='/bin/bash')
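The same idea as a self-contained sketch (the wrapper path and flag values are placeholders taken from the snippets above; `shlex.quote` is an addition to keep the remote command intact if an argument ever contains shell metacharacters):

```python
import shlex

def build_remote_command(name, walltime, processors, memory, num_tasks,
                         wrapper="/apps/workflows/slurm_wrapper.sh"):
    """Pass every SBATCH directive as a command-line flag so sbatch sees
    them immediately, instead of embedding #SBATCH lines in the piped script."""
    flags = [
        "--job-name={}".format(name),
        "--time={}".format(walltime),
        "--partition=defq",
        "--cpus-per-task={}".format(processors),
        "--mem={}".format(memory),
        "--array=1-{}".format(num_tasks),
    ]
    # Quote each flag so the string survives the ssh/shell round trip.
    return "{} sbatch {}".format(wrapper, " ".join(shlex.quote(f) for f in flags))

cmd = build_remote_command("demo", "01:00:00", 4, "8G", 10)
print(cmd)
```

The resulting string can be handed to Popen exactly as in the answer above, with the array size baked into the `--array` flag rather than into the script body.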