I've read here and cannot really understand how to speed up my simple exec() which basically looks like this:

 zcat access_log.201312011745.gz | grep 'id=6' | grep 'id2=10' | head -n10 

I've added ini_set('memory_limit', 256); to the top of the PHP script, but it still takes about 1 minute to run (compared with near-instant completion in Penguinet). What can I do to improve it?

  • How big is your file? Note that when you zcat and then pipe the output, a lot of memory is used to hold the file. Commented Dec 4, 2013 at 15:58
  • @fedorqui file is 11 megabytes. How would you recommend searching it? Commented Dec 4, 2013 at 19:08
  • php's memory limit does NOT apply to external programs you're running via exec(). maybe it does take long to find 10 lines that have id2=10 buried within all the output of the lines that contain id=6 amongst ALL of the lines in that log file. Commented Dec 4, 2013 at 19:31
  • @MarcB Why does it take so little time to do the same search from the command line then? How can I replicate this speed? Commented Dec 4, 2013 at 19:34
  • How about unzipping the file beforehand, and then just using "grep 'id=6' file.notzipped | grep..." That will take "zcat" out of the equation altogether and may make it easier to solve. Commented Dec 4, 2013 at 20:06

1 Answer

I would try some of the following:

Change your exec to just run something simple, like

echo Hello

and see if it still takes so long - if it does, the problem is in the process creation and exec()ing area.

If that runs quickly, try changing the exec to something like:

zcat access_log.201312011745.gz > /dev/null

to see if it is the "zcat" that is slowing you down.

Think about replacing the greps with a "sed" that quits (using "q") as soon as it finds what you are looking for, rather than continuing all the way to the end of the file. Judging by your "head", you are only interested in the first few occurrences of your strings, not all of them. For example, you seem to be looking for lines that contain both "id=6" and "id2=10", so a "sed" like the one below may be faster, because it prints the line and stops the moment it finds "id=6" followed by "id2=10":

zcat access_log.201312011745.gz | sed -n '/id=6.*id2=10/{p;q;}'

The "-n" says "don't print, in general" and then it looks for "id=6" followed by any characters then "id2=10". If it finds that, it prints the line and the "q" makes it quit immediately without looking through to end of file. The braces group "p" and "q" so that both run only on a matching line; without them, "q" would run on the very first line of input and "sed" would stop there. Note that I am assuming "id=6" comes before "id2=10" on the line. If that is not true, the "sed" will need additional work.
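If the two strings can appear in either order, or you want the first ten matches (as with the original "head -n10") rather than just the first, one option is "awk" instead of "sed". The following is only a minimal sketch under stated assumptions: the sample file name and log lines are made up so that the example is self-contained, and "gzip -dc" is used in place of "zcat" (the two behave the same for a .gz file).

```shell
#!/bin/sh
# Build a small, hypothetical sample log so the sketch can run on its own.
tmpdir=$(mktemp -d)
log="$tmpdir/sample_access_log.gz"   # hypothetical file name

i=1
while [ "$i" -le 200 ]; do
    # Alternate the field order so both orders are exercised.
    if [ $((i % 2)) -eq 0 ]; then
        echo "GET /page?id=6&id2=10&n=$i"
    else
        echo "GET /page?id2=10&id=6&n=$i"
    fi
    # A line that should not match.
    echo "GET /page?id=7&id2=11&n=$i"
    i=$((i + 1))
done | gzip -c > "$log"

# Print lines that contain both strings, in either order,
# and stop reading as soon as the tenth match has been printed.
matches=$(gzip -dc "$log" | awk '/id=6/ && /id2=10/ { print; if (++n == 10) exit }')
echo "$matches"

rm -rf "$tmpdir"
```

Against the real log, only the "gzip -dc ... | awk ..." pipeline is needed, with the log's own file name in place of the sample one. Like the original greps, the patterns are plain substring matches, so "id=6" would also match "id=60".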

5 Comments

great. thank you. can i just word for word replace grep with sed?
also, i don't understand how to write this exactly zcat access_log.... > /dev/null in order to test.
I have edited my original post to clarify what I was trying to say.
Thanks! But where in this command do I specify the file name? Also, why do I want to tell it "not to print". I want to print the results. If I want to print ten results, how do I specify that? Thanks again
I have edited the command to show how to use it. The "p" at the end says "print only if the pattern is matched".
