I have a worker.php file as below
<?php
$data = $argv[1];
//then some time consuming $data processing
and I run this as a poor man's job queue using GNU parallel:
while read LINE; do echo $LINE; done < very_big_file_10GB.txt | parallel -u php worker.php
which kind of works by forking 4 PHP processes when I am on a 4-CPU machine.
But it still feels pretty synchronous to me because read LINE is still reading one line at a time.
Since it is a 10 GB file, I am wondering if I can somehow use parallel to read the same file in parallel by splitting it into n parts (where n = the number of my CPUs), which would ideally make my import n times faster.
Look at parallel --pipe and, even better, parallel --pipepart. Also, consider changing worker.php to look at and loop over multiple parameters, so that a single invocation can process many parameters/lines - the reason is that you currently pay the price of starting an entire new PHP interpreter for every single line of your file! If you make your worker.php able to process lots of lines, you can use parallel -X and get much better performance. Rough sketches of both approaches are below.
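For the --pipepart route, a minimal sketch (assuming one record per line) of a worker.php that reads its lines from STDIN instead of taking a single argument could look like this:

<?php
// Read whole lines from STDIN so that parallel --pipe / --pipepart
// can stream a chunk of the file into this one PHP process.
while (($data = fgets(STDIN)) !== false) {
    $data = rtrim($data, "\n");
    // time consuming $data processing goes here
}

invoked along these lines (the 100M block size is only an example; tune it to your file and machine):

parallel --pipepart -a very_big_file_10GB.txt --block 100M php worker.php

For the -X route, a sketch of a worker.php that loops over all of its command-line arguments, each argument being one line of the file:

<?php
// Process every argument instead of only $argv[1];
// parallel -X packs as many lines as fit onto one command line.
for ($i = 1; $i < $argc; $i++) {
    $data = $argv[$i];
    // time consuming $data processing goes here
}

called with something like:

parallel -X php worker.php :::: very_big_file_10GB.txt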