I want to automate three steps on an HPC cluster: allocating resources, setting up port forwarding to the allocated node, and starting JupyterLab on that same node. Doing this manually, I go through the following steps:
user@login1>salloc -A nike2025-22-391 -t 1:00:00 -p shared -N 1
user@login1>echo $SLURM_NODELIST #gives the name of the assigned node (e.g., nid001000)
user@login1>ssh -NfL 8889:localhost:8889 <nodename>
user@login1>ssh <nodename>
user@nodename>source Private/chandra/my_env/bin/activate
user@nodename>jupyter lab --no-browser --port=8889
I tried to write this script for the automation (with the help of ChatGPT):
#!/bin/bash
# Step 1: Request a compute node
salloc -A naiss2025-22-391 -t 1:00:00 -p shared -N 1 &
sleep 10
# Step 2: Get the assigned node name
NODE=$SLURM_NODELIST
if [ -z "$NODE" ]; then
    echo "Failed to get node name."
    exit 1
fi
echo "Node allocated: $NODE"
# Step 3: Set up SSH tunnel from login → compute node
ssh -NfL 8889:localhost:8889 "$NODE"
# Step 4: SSH into node and launch JupyterLab
ssh "$NODE" << EOF
source ~/Private/chandra/my_env/bin/activate
jupyter lab --no-browser --port=8889 --ip=127.0.0.1
EOF
However, with this script the job allocation is relinquished as soon as it is granted:
salloc: Pending job allocation 9049076
salloc: job 9049076 queued and waiting for resources
salloc: job 9049076 has been allocated resources
salloc: Granted job allocation 9049076
salloc: Waiting for resource configuration
salloc: Nodes nid001031 are ready for job
salloc: Relinquishing job allocation 9049076
salloc: Job allocation 9049076 has been revoked.
Failed to get node name.
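As far as I can tell, part of the problem is that salloc sets SLURM_NODELIST only in the environment of the shell or command it launches, not in the script that called it, so NODE is always empty in the calling script. The sketch below reproduces that behaviour without Slurm. The names fake_salloc and DEMO_NODELIST are made up for illustration and stand in for salloc and SLURM_NODELIST.

```shell
#!/bin/bash
# Hypothetical stand-in for salloc: the variable is exported only inside
# the child process (a subshell here), in the same way that salloc exports
# SLURM_NODELIST only to the shell or command it starts.
fake_salloc() {
    (
        export DEMO_NODELIST="nid001031"
        echo "child sees: ${DEMO_NODELIST}"
    )
}

fake_salloc &    # run in the background, like "salloc ... &"
wait             # wait for the background child to finish

# Back in the parent script, the variable was never set.
echo "parent sees: ${DEMO_NODELIST:-<empty>}"
```

The child prints the node name while the parent prints an empty value, which matches the "Failed to get node name." message above.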
It would be great if anyone could provide a solution and explain why this is happening.
If I remove the &, the process stops after "salloc: Nodes nid001031 are ready for job" and the rest of the script is not executed.
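One direction that might work is sketched here, untested on this cluster. It assumes that salloc's --no-shell option is available (salloc returns once the allocation is granted and the job stays alive), that salloc prints "Granted job allocation <id>" on stderr as in the log above, and that squeue can report the node for that job. The helper names parse_job_id and start_jupyter are my own and not part of Slurm.

```shell
#!/bin/bash
# Sketch only; parse_job_id and start_jupyter are hypothetical helper names.

# Extract the job ID from salloc's "Granted job allocation <id>" message.
parse_job_id() {
    sed -n 's/^salloc: Granted job allocation \([0-9][0-9]*\).*$/\1/p' | head -n 1
}

start_jupyter() {
    local port=8889 jobid node

    # --no-shell: salloc exits after the allocation is granted, and the job
    # keeps running. Status messages go to stderr, hence 2>&1.
    jobid=$(salloc -A naiss2025-22-391 -t 1:00:00 -p shared -N 1 --no-shell 2>&1 | parse_job_id)
    [ -n "$jobid" ] || { echo "Failed to get job ID." >&2; return 1; }

    # Ask Slurm which node belongs to the job (-h: no header, %N: node list).
    node=$(squeue -h -j "$jobid" -o %N)
    [ -n "$node" ] || { echo "Failed to get node name." >&2; scancel "$jobid"; return 1; }
    echo "Job $jobid is running on node $node"

    # Tunnel from the login node to the compute node, then start JupyterLab.
    ssh -NfL "${port}:localhost:${port}" "$node"
    ssh "$node" "source ~/Private/chandra/my_env/bin/activate && jupyter lab --no-browser --port=${port} --ip=127.0.0.1"

    # JupyterLab has exited; release the allocation explicitly.
    scancel "$jobid"
}

# Only attempt the allocation where Slurm is actually installed.
if command -v salloc >/dev/null 2>&1; then
    start_jupyter
else
    echo "salloc not found; run this on the cluster login node." >&2
fi
```

Because salloc no longer starts a shell, nothing exits early and revokes the allocation, and the node name comes from squeue rather than from an environment variable the calling script never receives. The job then has to be cancelled explicitly with scancel, or it runs until the time limit.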