I have a worker.php file as below
<?php
$data = $argv[1];
//then some time consuming $data processing
and I run this as a poor man's job queue using GNU parallel:
while read LINE; do echo $LINE; done < very_big_file_10GB.txt | parallel -u php worker.php
which kind of works by forking 4 PHP processes when I am on a 4-CPU machine.
But it still feels pretty synchronous to me because read LINE is still reading one line at a time.
Since it is a 10 GB file, I am wondering if I can somehow use parallel to read the same file in parallel by splitting it into n parts (where n = the number of my CPUs), which would ideally make my import n times faster.
Look at parallel --pipe and, even better, parallel --pipepart. Also, consider changing worker.php to look at and loop over multiple parameters, so that a single invocation can process many parameters/lines - the reason is that you currently pay the price of starting an entire new PHP interpreter for every single line of your file! If you make your worker.php able to process lots of lines, you can use parallel -X and get much better performance. Rough sketches of both approaches are below.
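For the --pipepart route, a minimal sketch (assuming one record per line) of a worker.php that reads its lines from STDIN instead of taking a single argument could look like this:

<?php
// Read whole lines from STDIN so that parallel --pipe / --pipepart
// can stream a chunk of the file into this one PHP process.
while (($data = fgets(STDIN)) !== false) {
    $data = rtrim($data, "\n");
    // time consuming $data processing goes here
}

invoked along these lines (the 100M block size is only an example; tune it to your file and machine):

parallel --pipepart -a very_big_file_10GB.txt --block 100M php worker.php

For the -X route, a sketch of a worker.php that loops over all of its command-line arguments, each argument being one line of the file:

<?php
// Process every argument instead of only $argv[1];
// parallel -X packs as many lines as fit onto one command line.
for ($i = 1; $i < $argc; $i++) {
    $data = $argv[$i];
    // time consuming $data processing goes here
}

called with something like:

parallel -X php worker.php :::: very_big_file_10GB.txt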