PHP exec very slow processing simple 3-pipe grep

Question

I've read the related questions here and still can't work out how to speed up my simple exec() call, which basically looks like this:

 zcat access_log.201312011745.gz | grep 'id=6' | grep 'id2=10' | head -n10

I've added ini_set('memory_limit', 256); to the top of the PHP script, but it still takes about a minute to run, compared with near-instant completion when I run the same command in Penguinet. What can I do to speed it up?

How big is your file? Note that when you zcat and then pipe, a lot of memory is used to hold the file. – fedorqui, Commented Dec 4, 2013 at 15:58
@fedorqui The file is 11 megabytes. How would you recommend searching it? – 1252748, Commented Dec 4, 2013 at 19:08
PHP's memory limit does NOT apply to external programs you run via exec(). Maybe it simply takes a long time to find 10 lines that have id2=10 buried among the lines containing id=6, out of ALL the lines in that log file. – Marc B, Commented Dec 4, 2013 at 19:31
@MarcB Then why does the same search take so little time from the command line? How can I replicate that speed? – 1252748, Commented Dec 4, 2013 at 19:34
How about unzipping the file beforehand and then just using "grep 'id=6' file.notzipped | grep..."? That takes "zcat" out of the equation altogether and may make the problem easier to solve. – Mark Setchell, Commented Dec 4, 2013 at 20:06
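A minimal bash sketch of the "unzip beforehand" suggestion, assuming the decompressed copy can live next to the .gz file. The helper names decompress_once and search_plain are hypothetical. The plain copy is only regenerated when it is missing or older than the archive, so repeated searches skip decompression entirely.

```shell
#!/usr/bin/env bash
# Sketch: decompress the log once, then grep the plain-text copy.
# decompress_once and search_plain are hypothetical helper names.

# Decompress "$1" (must end in .gz) next to itself, but only when the plain
# copy is missing or older than the archive. Prints the plain file's path.
decompress_once() {
  local gz=$1
  case $gz in
    *.gz) ;;
    *) return 1 ;;   # refuse: stripping ".gz" would overwrite the input
  esac
  local plain=${gz%.gz}
  if [ ! -f "$plain" ] || [ "$gz" -nt "$plain" ]; then
    # gzip -dc behaves like zcat on Linux. Write to a temp file first so an
    # interrupted run never leaves a half-written plain copy behind.
    if ! gzip -dc "$gz" > "$plain.tmp"; then
      rm -f "$plain.tmp"
      return 1
    fi
    mv "$plain.tmp" "$plain"
  fi
  printf '%s\n' "$plain"
}

# The question's filter chain, run against the uncompressed copy.
search_plain() {
  local plain
  plain=$(decompress_once "$1") || return 1
  grep 'id=6' "$plain" | grep 'id2=10' | head -n10
}

# Example:
#   search_plain access_log.201312011745.gz
```

Whether this helps depends on how often the same archive is searched; for a single search it still pays the decompression cost once.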

Mark Setchell · Accepted Answer · 2013-12-04 20:17:40Z


I would try some of the following:

Change your exec() to run something trivial, such as

echo Hello

and see whether it still takes as long. If it does, the problem lies in process creation and the exec() call itself, not in your command.

If that runs quickly, try changing the exec to something like:

zcat access_log.201312011745.gz > /dev/null

to see whether it is "zcat" that is slowing you down.
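To get baseline numbers for these checks outside PHP, a rough bash-only helper like the one below times each candidate command on its own, so the slow stage stands out. time_cmd is a hypothetical name; it relies on bash's time keyword and the TIMEFORMAT variable, where %R is elapsed wall-clock seconds. The timings can then be compared against how long the same command takes under exec().

```shell
#!/usr/bin/env bash
# Sketch (bash-only): time one command at a time so the slow stage stands out.
# time_cmd is a hypothetical helper name. It relies on bash's `time` keyword,
# which formats its report using TIMEFORMAT (%R = elapsed seconds, 3 decimals).
time_cmd() {
  local label=$1 cmd=$2
  local TIMEFORMAT="$label: %R s"   # avoid "%" in labels; it is a format char
  # Discard the command's own output. The timing line goes to stderr,
  # which the outer 2>&1 redirects to stdout.
  { time bash -c "$cmd" > /dev/null 2>&1 ; } 2>&1
}

# Example runs, from cheapest to the full pipeline:
#   time_cmd "echo only"     'echo Hello'
#   time_cmd "zcat only"     'zcat access_log.201312011745.gz'
#   time_cmd "full pipeline" "zcat access_log.201312011745.gz | grep 'id=6' | grep 'id2=10' | head -n10"
```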

Think about replacing the greps with a "sed" that quits (using "q") as soon as it finds what you are looking for, rather than continuing all the way to the end of the file. Judging by your "head", you are only interested in the first few occurrences of your strings, not all of them. You seem to be looking for lines that contain "id=6" and also "id2=10", so a "sed" like the one below may be faster: it prints the line and stops the moment it finds a line with "id=6" followed by "id2=10".

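zcat access_log.201312011745.gz | sed -n '/id=6.*id2=10/{p;q;}'

The "-n" tells sed not to print every line automatically. The address /id=6.*id2=10/ selects lines containing "id=6" followed by any characters and then "id2=10". For the first such line, "p" prints it and "q" makes sed quit immediately, without reading through to the end of the file. The braces group "p" and "q" so that both apply only to a matching line; written as '/id=6.*id2=10/p;q', the "q" would run on the very first line of input, match or not. Note that I am assuming "id=6" comes before "id2=10" on the line. If that is not true, the "sed" will need additional work.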
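The sed approach prints at most one line and relies on "id=6" appearing before "id2=10". If the fields can appear in either order, or if you want the first ten matches as with head -n10 in the question, one option is to swap in awk (a different tool from the sed the answer suggests). The sketch below uses only POSIX awk features; first_matches is a hypothetical helper name. It keeps grep's plain substring semantics, so, like the original greps, "id=6" also matches text such as "id=60".

```shell
#!/usr/bin/env bash
# Sketch: replace the two greps and head with a single awk (POSIX features
# only). Matches "id=6" and "id2=10" in either order and exits after the
# first N matching lines. first_matches is a hypothetical helper name.
first_matches() {
  local n=$1 file=$2
  gzip -dc "$file" | awk -v max="$n" '
    # Same substring tests as the two greps in the question.
    /id=6/ && /id2=10/ {
      print
      if (++count >= max) exit   # stop reading once N lines are printed
    }'
}

# Example: first ten matches, as head -n10 gave in the question
#   first_matches 10 access_log.201312011745.gz
```

Once awk exits, gzip stops at its next write to the closed pipe, so the rest of the archive is not decompressed.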

edited Dec 4, 2013 at 20:17

answered Dec 4, 2013 at 16:03

Mark Setchell



5 Comments

1252748 Over a year ago

Great, thank you. Can I just replace grep with sed word for word?

1252748 Over a year ago

Also, I don't understand exactly how to write zcat access_log.... > /dev/null in order to test.

Mark Setchell Over a year ago

I have edited my original post to clarify what I was trying to say.

1252748 Over a year ago

Thanks! But where in this command do I specify the file name? Also, why do I want to tell it "not to print"? I want to print the results. If I want to print ten results, how do I specify that? Thanks again.

Mark Setchell Over a year ago

I have edited the command to show how to use it. The "p" after the pattern says "print only if the pattern is matched".
