
Suppose there are 4 CUDA devices (0,1,2,3) on my computer and there are 10 tasks to run; each task is a script named run01.sh, run02.sh, ..., run10.sh.

The problem is that each task uses only 1 GPU. I want to write a bash script that runs those 10 tasks concurrently, making the best use of the 4 CUDA devices. How can I do that?

Update:

To address @RenaudPacalet's question

Please edit your question and explain how you specify which device to use for a given job.

There are 4 workers (4 CUDA devices); what I want is a bash solution that ensures:

  1. a worker handles only 1 task at a time;
  2. no worker stays idle while there are remaining tasks.
  • Your question is not really a programming question and it somehow asks for recommendations, which is out of scope here. Did you look at GNU parallel? Commented Dec 20, 2024 at 14:58
  • @RenaudPacalet I don't think parallel or xargs can deal with this, as there are only 4 CUDA devices for 10 tasks, and parallel/xargs cannot allocate limited resources to multiple tasks. I would like to know whether there is any useful bash technique that can solve this. Commented Dec 20, 2024 at 16:53
  • "there are only 4 CUDA devices for 10 tasks" That's exactly what parallel or xargs are made for: launch a given number of jobs (4 in your case), no more; keep that workload by starting a new job each time one completes, until all jobs are done (10 in your case). See the -P option. Commented Dec 20, 2024 at 17:07
  • GPUs are massively parallel devices, and they can execute tasks from multiple processes without any problem as long as there is enough memory for all of them (note that each task is internally parallel but may not always saturate the GPU). In fact, memory aside, overall execution can often be faster with more tasks running in parallel, because the GPU can overlap one task's latency stalls with work from another. And yes, xargs can be used for that. You can also use a basic bash script and count concurrent jobs with jobs -r | wc -l in a loop running the tasks. Commented Dec 20, 2024 at 17:21
  • @link89 Please edit your question and explain how you specify which device to use for a given job. Commented Dec 21, 2024 at 6:26
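The jobs -r | wc -l counting approach mentioned in the comments can be sketched like this (a minimal illustration written for this thread, not from the original page; note that it only caps concurrency at 4 and does not tell each task which GPU to use):

```shell
#!/bin/bash
# Keep at most 4 tasks running; start a new one whenever a slot frees up.
max_jobs=4

for exe in run*.sh; do
    # jobs -r lists currently running background jobs;
    # wait while we are at the limit.
    while [ "$(jobs -r | wc -l)" -ge "$max_jobs" ]; do
        sleep 0.1
    done
    bash "$exe" &
done
wait
```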

2 Answers


To make a job use specific CUDA GPUs you can set the environment variable CUDA_VISIBLE_DEVICES.
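For example, assuming the run scripts launch ordinary CUDA programs, pinning one of them to a specific device looks like this:

```shell
# Expose only physical GPU 2 to the task; inside the script,
# CUDA sees a single device with index 0.
CUDA_VISIBLE_DEVICES=2 ./run01.sh

# A comma-separated list exposes several devices:
CUDA_VISIBLE_DEVICES=0,1 ./run02.sh
```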

Now, if you need to limit the number of jobs that can run at the same time on a single GPU, a possible solution is to use a fifo queue holding the list of available GPUs. The idea is that each job "takes" a GPU from the fifo queue and, once it completes, writes the GPU it was using back to the fifo so that a new job can use it.

#!/bin/bash

mkfifo free_gpus
exec 3<>free_gpus

# write the GPUs you want to use (0,1,2,3) into the fifo.
# for 2 jobs per GPU, write each GPU twice.
printf '%s\n' 0 1 2 3 >&3 &

for exe in run*.sh
do
    read gpu # allocate a GPU (blocks until there's one available)
    { CUDA_VISIBLE_DEVICES="$gpu" "$exe"; echo "$gpu" >&3; } &
done <&3
wait
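To sanity-check the queue mechanism without any GPUs, you can run the same pattern with dummy tasks (a throwaway sketch, not part of the original answer; the sleep stands in for real work):

```shell
#!/bin/bash
# Dry run of the fifo pattern: 10 dummy tasks share 4 "GPU" tokens.
mkfifo free_gpus
exec 3<>free_gpus
printf '%s\n' 0 1 2 3 >&3

for task in $(seq 1 10); do
    read -r gpu                 # take a token; blocks if none is free
    {
        echo "task $task running on GPU $gpu"
        sleep 0.2               # simulated work
        echo "$gpu" >&3         # give the token back
    } &
done <&3
wait
exec 3>&-
rm free_gpus
```

At any instant no more than 4 of the 10 tasks are running, and each holds a distinct token from 0 to 3.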

2 Comments

Cool solution. I didn't get it until I read ChatGPT's explanation.
ChatGPT breaks it down nicely indeed

With GNU Parallel it looks like this:

parallel -j4 CUDA_VISIBLE_DEVICES='$(({%} - 1))' {} ::: run*.sh

This uses GNU parallel, a tool for executing jobs in parallel, combined with a bit of shell arithmetic to manage GPU assignment. Here's a detailed breakdown:

parallel: parallel is a command-line driven tool that allows you to execute multiple jobs in parallel. It's particularly useful for running the same command with different arguments or for distributing tasks across multiple cores or nodes.

-j4: This flag tells parallel to run at most 4 jobs simultaneously. In other words, it will keep up to 4 processes running at the same time.

CUDA_VISIBLE_DEVICES: This environment variable controls which GPUs are visible to CUDA-enabled applications.

{%} is a special replacement string in parallel. It represents the job slot number, starting from 1.

$(( )) is arithmetic expansion in bash, which evaluates the expression inside.

{%} - 1 subtracts 1 from the job slot number, because CUDA device indices typically start at 0.

So, this sets each job to use a different GPU, based on its job slot number. For example:

  • Job 1 uses GPU 0
  • Job 2 uses GPU 1
  • Job 3 uses GPU 2
  • Job 4 uses GPU 3
  • Job 2 finishes first freeing GPU 1
  • Job 5 uses GPU 1
  • Job 3 finishes freeing GPU 2
  • Job 6 uses GPU 2
  • Job 1 finishes freeing GPU 0
  • Job 7 uses GPU 0

{} is a placeholder for the argument(s) that parallel will substitute with actual filenames or data.

::: run*.sh specifies that parallel should run the command for each file matching run*.sh in the current directory. This means if you have files like run01.sh, run02.sh, etc., each of these scripts will be executed.
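You can watch the slot arithmetic in action by replacing the real command with echo (requires GNU parallel; the sleep forces all 4 slots to fill before any job finishes, so each slot is used exactly once, and output order may vary):

```shell
# Each of the 4 job slots maps to one GPU index.
parallel -j4 'sleep 0.5; echo slot {%} uses GPU $(({%} - 1))' ::: a b c d
```

Each line reads like "slot 1 uses GPU 0", "slot 2 uses GPU 1", and so on.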

1 Comment

That looks awesome! Can you explain how it works in your answer?
