I am using a Python script to perform some calculations on an image and save the resulting array into a .png file. I have to process 3000 to 4000 images, so I run the script over all of them with a shell script on Ubuntu. It gets the job done, but is there any way to make it faster? My machine has 4 cores; how can I use all of them? The script I am using is below:

#!/bin/bash
cd "$1"                # directory containing the images, passed as the first argument
for i in *.png         # loop over every PNG in that directory
do
    python ../tempcalc12.py "$i"
done
cd ..

tempcalc12.py is my Python script.

This question might be trivial, but I am really new to programming.

Thank you

3 Answers

xargs has a --max-procs= (or -P) option which runs the jobs in parallel.
The following command processes the images with at most 4 processes at a time:

ls *.png | xargs -n 1 -P 4 python ../tempcalc12.py
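
If your file names might contain spaces, a null-delimited pipeline is safer. A sketch assuming GNU find and xargs (the defaults on Ubuntu):

find . -maxdepth 1 -name '*.png' -print0 | xargs -0 -n 1 -P 4 python ../tempcalc12.py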

You can just add an & to the python line to have everything executed in parallel:

python ../tempcalc12.py "$i" &

This is a bad idea though, as launching thousands of processes at once will just slow everything down. What you can do instead is limit the number of concurrent jobs, like this:

MAX_THREADS=4
for i in *.png; do
    python ../tempcalc12.py "$i" &
    # wait here until the number of background jobs drops below MAX_THREADS
    while [ "$(jobs | wc -l)" -ge "$MAX_THREADS" ]; do
        sleep 0.1
    done
done

After starting each job in the background, the loop checks the number of running jobs every 100 ms and only moves on to the next image once that number drops below MAX_THREADS.
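
If your bash is 4.3 or newer, you can avoid the sleep polling with wait -n, which blocks until any one background job exits. A minimal sketch under that assumption (MAX_JOBS is just an illustrative name):

#!/bin/bash
MAX_JOBS=4
for i in *.png; do
    python ../tempcalc12.py "$i" &
    # once MAX_JOBS jobs are running, block until one of them finishes
    while [ "$(jobs -r | wc -l)" -ge "$MAX_JOBS" ]; do
        wait -n
    done
done
wait   # let the last batch of jobs finish before the script exits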

This is a nice hack if you just want a quick working solution, but you might also want to investigate what GNU Parallel can do.

If you have GNU Parallel you can do:

parallel python ../tempcalc12.py ::: *.png

It will do The Right Thing by spawning one job per CPU core, even if the names of your PNGs contain spaces, ', or ". It also makes sure the output from different jobs is not mixed together, so if you use the output you are guaranteed not to get half a line from two different jobs.

GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.
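
If you want to cap the number of simultaneous jobs explicitly rather than using one per core, and keep the output in the same order as the input, something like this should work (-j sets the job count, -k keeps the output order):

parallel -j 4 -k python ../tempcalc12.py ::: *.png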

If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU:

[Diagram: simple scheduling, 8 jobs assigned up front to each of the 4 CPUs]

GNU Parallel instead spawns a new process when one finishes - keeping the CPUs active and thus saving time:

[Diagram: GNU Parallel scheduling, a new job starts as soon as one finishes]

Installation

If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:

(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash

For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README

Learn more

See more examples: http://www.gnu.org/software/parallel/man.html

Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html

Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel
