I just wrote a Python script that uses multi-threading. It is invoked like this:

    python myScript.py -cpu_n 5 -i input_file

To run it on my hundreds of input files, I generate a file (commands.list) with one command per input file:

    python myScript.py -cpu_n 5 -i input_file1
    python myScript.py -cpu_n 5 -i input_file2
    python myScript.py -cpu_n 5 -i input_file3
    ...

I'm trying to schedule them with GNU 'parallel', using 10 CPUs on each of three different machines:

    parallel -S 10/$server1 -S 10/$server2 -S 10/$server3 < commands.list

My question is: what is the maximum number of CPUs that will be used on each server by this parallel command? Will it be 5*10=50, or just 10 CPUs?

  • Hard to say without any code. Commented Jan 26, 2017 at 9:39
  • Firstly, be careful to differentiate CPUs from cores: a single CPU may have 8 or more cores. Secondly, you need to be careful about jobs and cores. A single job is not pinned to one core; it may wander around and get executed on different cores at different times (think "CPU/core affinity"). One job could, theoretically, run on each of the 10 CPUs at some stage in its life. Commented Jan 26, 2017 at 10:08
  • If the commands.list is so simple, you might simply have GNU Parallel generate it: parallel python myScript.py -cpu_n 5 -i ::: input_file*; If the command is very complex, use a bash function: gnu.org/software/parallel/… Commented Jan 26, 2017 at 19:48

1 Answer

From man parallel:

   --jobs N
   -j N
   --max-procs N
   -P N     Number of jobslots on each machine. Run up to N
            jobs in parallel.  0 means as many as possible.
            Default is 100% which will run one job per CPU
            core on each machine.


   -S
   [@hostgroups/][ncpu/]sshlogin[,[@hostgroups/][ncpu/]sshlogin[,...]]
   :
            GNU parallel will determine the number of CPU
            cores on the remote computers and run the number
            of jobs as specified by -j.  If the number ncpu
            is given GNU parallel will use this number for
            number of CPU cores on the host. Normally ncpu
            will not be needed.

So your command will run up to 10 jobs on each server in parallel.

Whether each of your commands will actually use 5 CPU cores is unclear. If each command does use 5 cores, then up to 50 cores per server will be in use. In that case I recommend that you do not use the ncpu/server syntax, and instead use:

    parallel -j 20% -S $server1,$server2,$server3 < commands.list

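This way you can mix servers that have different numbers of cores, and on each server GNU Parallel will run a number of jobs equal to 20% (1/5th) of that server's core count, so that 5-core jobs roughly fill the machine without oversubscribing it.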
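To make the arithmetic concrete, here is a rough Python sketch comparing the two invocations. The core counts in `servers` are made up for illustration, `job_slots` and `busy_cores` are hypothetical helper names, and the rounding in `job_slots` (round down, at least one job) is my assumption rather than something taken from the man page. It also assumes every job really keeps 5 cores busy, which, as noted above, is not guaranteed.

```python
def job_slots(cores: int, percent: int) -> int:
    """Job slots for `-j <percent>%` on a host with `cores` cores.

    Assumption: round down, but always allow at least one job.
    """
    return max(1, cores * percent // 100)


def busy_cores(slots: int, threads_per_job: int) -> int:
    """Worst-case cores in use if every job keeps all its threads busy."""
    return slots * threads_per_job


# Hypothetical core counts, for illustration only.
servers = {"server1": 10, "server2": 20, "server3": 40}
threads_per_job = 5  # from `-cpu_n 5` in the question

for name, cores in servers.items():
    # `-S 10/host` with the default `-j 100%`: 10 job slots per host.
    fixed = busy_cores(10, threads_per_job)
    # `-j 20%`: job slots scale with the host's real core count.
    scaled = busy_cores(job_slots(cores, 20), threads_per_job)
    print(f"{name}: {cores} cores, 10/host -> {fixed} busy, -j 20% -> {scaled} busy")
```

With the fixed `10/host` form the demand is 50 cores on every server regardless of its size, while with `-j 20%` the demand tracks the number of cores each server actually has.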

1 Comment

I do agree that the percentage should be included. Interestingly, I just tried this on a smaller scale: a Python script using 2 CPUs and a parallel command with 2 CPUs on each of three servers. The result was that each server only provided 2 CPUs, not 2*2 CPUs.
