I have written a for loop, parallelized it with &, and limited it to running 3 jobs at a time. My script is below. I reserve 32 cores and 256 GB of memory through #BSUB directives. The sample_pipe command I run inside the for loop requires 32 cores and 256 GB of memory.

Some of the jobs fail with memory errors. I suspect the cause is that I reserve only 32 cores and 256 GB in total but run 3 jobs at a time, each of which asks for that full amount.
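That suspicion can be checked with plain Bash arithmetic: three concurrent copies of a 32-core / 256 GB request need three times the reservation. The even split at the end is only an illustration (integer division, so 32/3 rounds down to 10 cores and 256/3 to 85 GB); whether sample_pipe actually runs well with that share is an assumption to verify.

```bash
#!/bin/bash
# Resources reserved for the whole LSF job (the #BSUB -n / -M values).
total_cores=32
total_mem_gb=256
maxjobs=3

# What the loop actually asks for when maxjobs copies run at once,
# each requesting 32 cores and 256 GB.
requested_cores=$(( maxjobs * 32 ))
requested_mem_gb=$(( maxjobs * 256 ))
echo "requested: ${requested_cores} cores, ${requested_mem_gb} GB (reserved: ${total_cores} cores, ${total_mem_gb} GB)"

# An even split of the reservation instead. Integer division rounds down,
# so the jobs together never exceed what was reserved.
cores_per_job=$(( total_cores / maxjobs ))
mem_per_job_gb=$(( total_mem_gb / maxjobs ))
echo "per job: ${cores_per_job} cores, ${mem_per_job_gb} GB"
```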

My question: how do I parallelize this so that all 3 jobs get the same amount of cores and memory?

I submitted it with the command `bsub < example.sh`.

```bash
#!/bin/bash
#BSUB -J cnt_job            # LSF job name
#BSUB -o cnt_job.%J.out     # Name of the job output file
#BSUB -e cnt_job.%J.error   # Name of the job error file
#BSUB -n 32                 # 32 cores
#BSUB -M 262144             # 256 GB
#BSUB -R "span[hosts=1] rusage [mem=262144]"

n=0
maxjobs=3
for sample in $main ; do
    for run in $nested ; do
        # Each background job asks for the full 32 cores and 256 GB.
        sample_pipe count --id="$run" \
            --localcores=32 \
            --localmem=256 &
        # Limit jobs: after every $maxjobs launches, wait until all have finished.
        if (( ++n % maxjobs == 0 )) ; then
            wait
            echo "$n wait"
        fi
    done
    cd ..
done
wait  # wait for any remaining background jobs before the batch job exits
```
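A minimal sketch of the same throttling with the reservation split evenly across the concurrent jobs, along the lines of the 32/3 suggestion in the comments. It assumes bash 4.3 or newer for `wait -n`, which returns as soon as any one background job exits, so a new job starts whenever a slot frees up instead of waiting for a whole batch of 3. `run_one` and the run IDs are hypothetical stand-ins for `sample_pipe` and `$nested`.

```bash
#!/bin/bash
# Requires bash >= 4.3 for `wait -n`.
set -u

maxjobs=3
cores_per_job=$(( 32 / maxjobs ))    # 10
mem_per_job_gb=$(( 256 / maxjobs ))  # 85
log=$(mktemp)

# Hypothetical stand-in for the real command, e.g.
#   sample_pipe count --id="$1" --localcores="$cores_per_job" --localmem="$mem_per_job_gb"
run_one() {
    sleep 1
    echo "$1 done (cores=$cores_per_job, mem=${mem_per_job_gb}G)" >> "$log"
}

runs="runA runB runC runD runE"   # hypothetical run IDs
running=0
for run in $runs; do
    if (( running >= maxjobs )); then
        wait -n                   # block until any one background job exits
        running=$(( running - 1 ))
    fi
    run_one "$run" &
    running=$(( running + 1 ))
done
wait                              # keep the batch script alive until the last jobs finish
cat "$log"
```

The counter is updated with plain assignments rather than `(( running++ ))`, because `(( 0 ))` returns a non-zero status and would abort a script running under `set -e`.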
  • for those not in the know: is this an actual, normal, Bash script, or some similar-looking thing that's interpreted by some other tool, using some other rules? In particular, for sample in main would just run one iteration of the loop, so the script looks a bit confusing. Commented Sep 22, 2021 at 14:22
  • It's an example, a similar-looking script. Commented Sep 22, 2021 at 14:25
  • You are explicitly launching three jobs and giving each 32 cores and 256G of ram. I don't quite see what you are expecting here. Perhaps you meant to give each of your jobs 32/3 =~ 10 CPUs? Commented Sep 22, 2021 at 14:33
  • Thanks for clarifying this. I have had memory issues on some jobs and thought each job was not receiving 32 cores and 256G. Commented Sep 22, 2021 at 15:09
  • You seem to be using some sort of queuing system that manages resources (that's what the #BSUB lines look like). Have you considered giving it one script per process, instead of looping over $main and letting it take care of the rest? That's basically the whole idea of such systems. Most have their own tools for looping, but you would have to talk to whoever is responsible for this cluster. Commented Sep 22, 2021 at 15:45
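One way to do what the last comment suggests under LSF is a job array, sketched below under stated assumptions. Each array element is scheduled as a separate job with its own `-n 32` / `-M` reservation, so each run gets the full 32 cores and 256 GB instead of sharing one allocation; to process more runs while keeping at most 3 running, the index list can carry a slot limit such as `cnt_job[1-10]%3`. The run IDs are hypothetical, the `-M` and `rusage[mem=...]` values are copied from the question, and whether 262144 means KB, MB or GB depends on the cluster's `LSF_UNIT_FOR_LIMITS` setting, so that is worth confirming with the cluster admins.

```bash
#!/bin/bash
#BSUB -J "cnt_job[1-3]"        # job array: 3 elements, each scheduled as its own job
#BSUB -o cnt_job.%J.%I.out     # %I expands to the array index
#BSUB -e cnt_job.%J.%I.error
#BSUB -n 32                    # reserved for EACH array element
#BSUB -M 262144                # units depend on LSF_UNIT_FOR_LIMITS
#BSUB -R "span[hosts=1] rusage[mem=262144]"

runs=(runA runB runC)          # hypothetical run IDs, one per array element
idx=${LSB_JOBINDEX:-1}         # LSF sets LSB_JOBINDEX; default to 1 when run by hand
run=${runs[$(( idx - 1 ))]}
echo "array element $idx -> $run"

# Each element owns its whole reservation, so the pipeline keeps 32 cores / 256 GB.
if command -v sample_pipe >/dev/null 2>&1; then
    sample_pipe count --id="$run" --localcores=32 --localmem=256
else
    echo "sample_pipe not on PATH; skipping (dry run)" >&2
fi
```

It is submitted the same way as before (`bsub < example.sh`); LSF then runs the three elements independently as resources become available.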
