
I want to automate three things on an HPC server: allocating resources, forwarding a port to the allocated node, and opening JupyterLab on that same node. Doing it by hand, I go through the following steps:

user@login1>salloc -A nike2025-22-391 -t 1:00:00 -p shared -N 1
user@login1>echo $SLURM_NODELIST #gives the name of the assigned node (e.g., nid001000)
user@login1>ssh -NfL 8889:localhost:8889 <nodename>
user@login1>ssh <nodename>
user@nodename>source Private/chandra/my_env/bin/activate
user@nodename>jupyter lab --no-browser --port=8889

I tried to automate this with the following script (written with the help of ChatGPT):

#!/bin/bash

# Step 1: Request a compute node
salloc -A naiss2025-22-391 -t 1:00:00 -p shared -N 1 &
sleep 10


# Step 2: Get the assigned node name
NODE=$SLURM_NODELIST

if [ -z "$NODE" ]; then
  echo "Failed to get node name."
  exit 1
fi

echo "Node allocated: $NODE"

# Step 3: Set up SSH tunnel from login → compute node
ssh -NfL 8889:localhost:8889 "$NODE"

# Step 4: SSH into node and launch JupyterLab
ssh "$NODE" << EOF
  source ~/Private/chandra/my_env/bin/activate
  jupyter lab --no-browser --port=8889 --ip=127.0.0.1
EOF

However, with this script the job allocation is relinquished as soon as it is granted:

salloc: Pending job allocation 9049076
salloc: job 9049076 queued and waiting for resources
salloc: job 9049076 has been allocated resources
salloc: Granted job allocation 9049076
salloc: Waiting for resource configuration
salloc: Nodes nid001031 are ready for job
salloc: Relinquishing job allocation 9049076
salloc: Job allocation 9049076 has been revoked.
Failed to get node name.

It would be great if anyone could provide a solution and explain why this is happening.
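One plausible reading of the log above rests on two documented Bash behaviours. First, salloc exports SLURM_NODELIST only into the environment of the command it spawns, so the script that called salloc never sees it, hence "Failed to get node name.". Second, when a non-interactive script starts a command with `&`, Bash redirects that command's stdin from /dev/null, so the default shell that salloc launches reads end-of-file immediately and exits, and salloc then relinquishes the allocation. The sketch below reproduces both effects with plain subshells standing in for salloc, so it runs without Slurm; mapping them onto salloc is an interpretation, not something the log proves.

```shell
#!/bin/bash
# Illustration only: plain subshells stand in for `salloc ... &`; no Slurm needed.

unset SLURM_NODELIST

# 1) A variable exported by a background child never reaches the parent script,
#    just as SLURM_NODELIST set by salloc for its own shell stays invisible here.
( export SLURM_NODELIST=nid001031 ) &
wait
echo "parent sees SLURM_NODELIST='${SLURM_NODELIST:-}'"

# 2) Without job control (any non-interactive script), a command started with &
#    gets /dev/null as stdin, so a shell waiting for input reads EOF at once and exits.
stdin_state=$(bash -c 'if read -r line; then echo "got input"; else echo "stdin at EOF"; fi' & wait)
echo "background shell: $stdin_state"
```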

  • I'm not sure if this is the issue, but your script runs salloc in the background (the line ends with '&'), whereas your manual steps do not. Commented Mar 21 at 13:10
  • If I remove &, the process stops after "salloc: Nodes nid001031 are ready for job". The later part of the script is not executed. Commented Mar 21 at 16:23
  • From the salloc documentation: "...it then runs the command specified by the user. Finally, when the user specified command is complete, salloc relinquishes the job allocation. The command may be any program the user wishes. Some typical commands are xterm, a shell script containing srun commands, and srun (see the EXAMPLES section). If no command is specified, then salloc runs the user's default shell." The remainder of your script needs to be a command passed to salloc. Commented Mar 21 at 17:39
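Following the last comment, here is a minimal sketch in which everything after salloc becomes the command that salloc runs, so the allocation lasts exactly as long as that command. It also accounts for the second comment: without `&` and without a command, salloc starts an interactive shell, and the rest of the script only runs after that shell exits. Assumptions: the command runs on the login node (as in the manual steps, where $SLURM_NODELIST is echoed on login1); the account, partition, port and virtualenv path are copied unchanged from the question's script; and the function name `launch_jupyter_job` is hypothetical. Instead of a separate background tunnel (`ssh -NfL`), a single `ssh -L` session both forwards the port and runs JupyterLab, so the tunnel closes when JupyterLab exits. This has not been tested on a real cluster.

```shell
#!/bin/bash
# Sketch, untested on a real cluster: salloc runs the remaining steps as its
# command, so the allocation lives exactly as long as that command does.
# Account, partition, port and venv path are copied from the question.
# launch_jupyter_job is a hypothetical name chosen for this example.

launch_jupyter_job() {
  salloc -A naiss2025-22-391 -t 1:00:00 -p shared -N 1 bash -c '
    NODE=$SLURM_NODELIST    # exported by salloc into the command it runs
    if [ -z "$NODE" ]; then
      echo "Failed to get node name." >&2
      exit 1
    fi
    echo "Node allocated: $NODE"
    # One ssh session forwards the port AND runs JupyterLab on the node;
    # the ~ is expanded by the remote shell on the compute node.
    ssh -L 8889:localhost:8889 "$NODE" \
      "source ~/Private/chandra/my_env/bin/activate && jupyter lab --no-browser --port=8889 --ip=127.0.0.1"
  '
}
```

Source the file on the login node and call `launch_jupyter_job`. JupyterLab stays in the foreground; when the JupyterLab server stops (or the 1:00:00 time limit is reached), the ssh session ends, the salloc command completes, and salloc relinquishes the allocation. Reaching port 8889 from a laptop would still need its own tunnel to the login node, which the question does not cover.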
