
I am trying to run a single R file multiple times at the same time with different arguments. GNU Parallel is not an option for me because I am running this on a server where I'm not authorized to install Parallel. So I chose the bash command &.

command1 & command2 & command3 ..... command30

However, when I run this, it does not behave as expected. Each command redirects its output to a new file, and I notice that some of the files are empty; actually, most of them are. So I'm guessing that by only writing the above, some of the processes are being killed.

However, when I give

command1 &
command2 &
command3 &
command4 &

wait 5

command5 &
command6 &
command7 &
command8 &

wait 

.
.
.

It works fine, but the problem is that it takes almost 5 times as long as running only command1, because it waits until the previous commands are finished. Time is a very important factor here, and I want all the commands to finish in (almost) the same time it would take for just one command.

  • Why does it "crash" like this when I don't use wait?
  • Is there any way I can improve the timing so that all the commands finish in (almost) the time taken by only one?
  • Is there any way I can implement this without using wait?

(Technically, I know & doesn't make the processes run in parallel; it just creates a separate process for every command.)

Thanks in advance!

My original code:

Rscript svmRScript_v2.r 0.05 1 > output/out0.05-1.txt &
Rscript svmRScript_v2.r 0.05 2 > output/out0.05-2.txt &
Rscript svmRScript_v2.r 0.05 5 > output/out0.05-5.txt &
Rscript svmRScript_v2.r 0.05 10 > output/out0.05-10.txt &

wait 5

Rscript svmRScript_v2.r 0.05 50 > output/out0.05-50.txt &
Rscript svmRScript_v2.r 0.05 100 > output/out0.05-100.txt &
Rscript svmRScript_v2.r 0.05 500 > output/out0.05-500.txt &
Rscript svmRScript_v2.r 0.01 1 > output/out0.01-1.txt &

wait 5

Rscript svmRScript_v2.r 0.01 2 > output/out0.01-2.txt &
Rscript svmRScript_v2.r 0.01 5 > output/out0.01-5.txt &
Rscript svmRScript_v2.r 0.01 10 > output/out0.01-10.txt &
Rscript svmRScript_v2.r 0.01 50 > output/out0.01-50.txt &

wait 5

Rscript svmRScript_v2.r 0.01 100 > output/out0.01-100.txt &
Rscript svmRScript_v2.r 0.01 500 > output/out0.01-500.txt &
Rscript svmRScript_v2.r 0.005 1 > output/out0.005-1.txt &
Rscript svmRScript_v2.r 0.005 2 > output/out0.005-2.txt
  • & is not a command. Technically, it's a command separator; but you could also view it as a postfix operator. Commented Aug 18, 2015 at 10:42
  • The argument to wait should be a process ID. It is extremely unlikely that any of your processes would obtain the PID 5. Commented Aug 18, 2015 at 10:43
  • @shellter yes I did, but xargs processes commands one after the other; that's what happened in my case (but see the -P sketch after these comments). Commented Aug 18, 2015 at 10:49
  • 1
    Are you aware that if you are authorized to write your own perl scripts, then you are authorized to run GNU Parallel? oletange.blogspot.dk/2013/04/why-not-install-gnu-parallel.html Commented Aug 18, 2015 at 19:34
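
On the xargs point above: plain xargs is indeed sequential, but its -P flag (a GNU/BSD extension, not guaranteed by POSIX) runs several invocations in parallel. A minimal sketch, reusing the question's script with one illustrative cost value:

# Run up to 4 Rscript jobs at a time; xargs passes one argument per invocation.
printf '%s\n' 1 2 5 10 50 100 500 |
xargs -n 1 -P 4 sh -c 'Rscript svmRScript_v2.r 0.05 "$1" > "output/out0.05-$1.txt"' sh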

1 Answer


The files have not been written out yet when you examine them. With output redirection, the shell creates the output file when you start the job, but output buffering will typically leave the file empty or at least incomplete until the process has finished. Waiting forces you to delay until the job is actually done.
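
You can watch this happen yourself; here is a minimal sketch reusing one of the question's own commands:

Rscript svmRScript_v2.r 0.05 1 > output/out0.05-1.txt &  # the shell creates the file immediately
ls -l output/out0.05-1.txt   # often 0 bytes: the output is still buffered inside the R process
wait                         # block until the background job has finished
ls -l output/out0.05-1.txt   # now the file has its final contents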

Unless the individual jobs are mostly waiting on some external resource, it is quite unreasonable to expect parallel processing to execute two jobs in the same time that you can execute one. More work takes more time.

If you start too many background processes, you will actually complete the jobs more slowly, because task switching eats up a growing fraction of the available processing power. Experiment with just a few at a time, especially if they are heavily CPU-bound. Batches of five jobs are probably a reasonable starting point, but the choke point will obviously depend entirely on what these scripts are actually doing.
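
For example, a small throttle loop keeps a fixed number of jobs running without hand-placed wait lines. This is only a sketch: it assumes bash 4.3 or newer (for wait -n), the names cost and gamma are my guesses at what the two script arguments mean, and the value grid is copied from the question.

max_jobs=4   # tune to the number of cores you are allowed to use
for cost in 0.05 0.01 0.005; do
    for gamma in 1 2 5 10 50 100 500; do
        # Already at the limit? Block until any one background job finishes.
        while [ "$(jobs -pr | wc -l)" -ge "$max_jobs" ]; do
            wait -n
        done
        Rscript svmRScript_v2.r "$cost" "$gamma" > "output/out$cost-$gamma.txt" &
    done
done
wait   # one big wait for the stragglers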


4 Comments

  • Is there any way I could implement this without using wait? I mean, I am using wait 5 just randomly, which is not very logical.
  • One big wait (with no arguments) at the end will properly wait until all the background processes have finished. But as indicated above, running a few processes at a time is probably more efficient than launching them all at once, only to have them battle each other for limited system resources.
  • I tried putting wait at the end and it works, but it takes almost 3x the time one command took. The thing is, I am preparing for a command that will take 2 hours or more to execute. That's why I need the parallel processes to complete in (almost) the same time as one.
  • If your CPU is not at 100%, you still have something to fix. If it is, you are getting your full money's worth, and need to buy a bigger computer to make it go faster.
