With GNU Parallel it looks like this:
parallel -j4 CUDA_VISIBLE_DEVICES='$(({%} - 1))' {} ::: run*.sh
This one-liner combines GNU parallel, a tool for executing jobs in parallel, with a bit of shell arithmetic to assign a distinct GPU to each running job. Here's a detailed breakdown:
parallel:
parallel is a command-line driven tool that allows you to execute multiple jobs in parallel. It's particularly useful for running the same command with different arguments or for distributing tasks across multiple cores or nodes.
-j4:
This flag tells parallel to run at most 4 jobs simultaneously; a new job starts only when one of the running jobs finishes.
CUDA_VISIBLE_DEVICES: This environment variable controls which GPUs are visible to CUDA-enabled applications.
{%} is a special replacement string in parallel. It represents the job slot number, starting from 1.
$(( )) is arithmetic expansion in the shell, which evaluates the expression inside. The single quotes keep your interactive shell from expanding it too early: parallel substitutes {%} first, and the shell that runs each job then does the arithmetic.
{%} - 1 subtracts 1 from the job slot number, because CUDA device indices typically start at 0.
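A quick sketch of the slot-to-GPU mapping, using echo instead of a real CUDA program. Note the output order can vary, and a fast job may free its slot before a later one starts, but every printed GPU index stays in the range 0 to 3.

```shell
# Print each job's slot number and the GPU index it would be assigned.
# Single quotes stop the calling shell from doing the arithmetic itself;
# parallel substitutes {%}, then each job's shell evaluates $(( )).
parallel -j4 'echo "slot {%} -> GPU $(({%} - 1))"' ::: a b c d
```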
So, this sets each job to use a different GPU, based on its job slot number. For example:
- Job 1 uses GPU 0
- Job 2 uses GPU 1
- Job 3 uses GPU 2
- Job 4 uses GPU 3
- Job 2 finishes first, freeing GPU 1
- Job 5 uses GPU 1
- Job 3 finishes, freeing GPU 2
- Job 6 uses GPU 2
- Job 1 finishes, freeing GPU 0
- Job 7 uses GPU 0
{} is a placeholder that parallel replaces with the current argument; here, one of the run*.sh filenames.
::: run*.sh specifies the argument list: parallel runs the command once for each file matching run*.sh in the current directory. This means if you have files like run1.sh, run2.sh, etc., each of these scripts will be executed.
This is exactly what parallel or xargs are made for: launch a given number of jobs (4 in your case), no more, and keep that workload by starting a new job each time one completes, until all jobs are done (10 in your case). With xargs, see the -P option. You can also use a basic bash script that counts concurrent jobs with jobs -r | wc -l in a loop while launching the tasks.
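Both alternatives can be sketched as follows, again with hypothetical dummy run*.sh scripts created just for the demo. Note that neither route gives you a per-job slot variable like {%}, so there is no automatic GPU index; the jobs -r counting loop assumes bash-style job control:

```shell
# Create three dummy run*.sh scripts in a scratch directory.
dir=$(mktemp -d)
cd "$dir"
for i in 1 2 3; do
  printf '#!/bin/sh\necho "done run%s"\n' "$i" > "run$i.sh"
done

# xargs route: -P4 keeps up to 4 jobs alive, -n1 passes one script per invocation.
printf '%s\n' run*.sh | xargs -n1 -P4 sh

# Plain-shell route: count running background jobs, wait when at the limit.
for f in run*.sh; do
  while [ "$(jobs -r | wc -l)" -ge 4 ]; do sleep 1; done
  sh "$f" &
done
wait
```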