
I am writing a custom Apache log parser for my company, and I noticed a performance issue that I can't explain. I have a text file, log.txt, that is 1.2 GB in size.

The command sort log.txt is up to 3 seconds slower than the command cat log.txt | sort.

Does anybody know why this is happening?

  • Is one call taking longer than the other because the log file has grown in the interim? Commented Jun 3, 2011 at 10:43
  • No. I work on a local copy of the file. Commented Jun 3, 2011 at 10:50
  • Is it because on the second try (the cat | sort), log.txt was read from the cache rather than from disk, so you saved on disk access rather than on the actual sorting? Please run your sort through strace -C and see how much time goes toward disk access. Or just put the file on a ramdisk and run your experiments there. Commented Jun 3, 2011 at 13:09
  • If it's 1.2 GB, it might be worth trying LC_ALL=C sort log.txt; LC_COLLATE=C might also help. Commented Aug 9, 2011 at 19:25
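The locale suggestion in the last comment can be tried on a tiny sample first. This is a minimal sketch, assuming bash and a hypothetical scratch file case_sample.txt: with LC_ALL=C, sort compares raw bytes instead of applying the locale's collation rules, which is usually cheaper, but the order can change, so it is worth checking that byte order is acceptable for the log before relying on it.

```bash
#!/usr/bin/env bash
# With LC_ALL=C, sort compares raw bytes instead of applying the locale's
# collation rules. The trade-off is that the order can change: in byte
# order every uppercase ASCII letter sorts before every lowercase one.
# case_sample.txt is a hypothetical scratch file.
printf 'b\nB\na\nA\n' > case_sample.txt

echo "C locale:"
LC_ALL=C sort case_sample.txt    # prints A, B, a, b (one per line)

echo "current locale:"
sort case_sample.txt             # order depends on the active locale

# To time it on the real file from the question:
#   time LC_ALL=C sort log.txt > /dev/null
```

If both orders are acceptable for the parser, the C-locale run is the one to compare against the original timings.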

2 Answers


cat file | sort is a Useless Use of Cat.

The purpose of cat is to concatenate (or "catenate") files. If there is only one file, concatenating it with nothing at all is a waste of time and costs you an extra process.
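To make the point concrete, here is a small sketch (bash, with hypothetical scratch files sample.txt and out_*.txt) showing that the three usual ways of handing a single file to sort produce the same output. They differ only in who opens the file and how many processes run.

```bash
#!/usr/bin/env bash
# The three ways of handing one file to sort produce the same output;
# they differ only in who opens the file and how many processes run.
# sample.txt and the out_*.txt names are hypothetical scratch files.
printf 'banana\napple\ncherry\n' > sample.txt

sort sample.txt       > out_arg.txt    # sort opens the file itself
sort < sample.txt     > out_redir.txt  # the shell opens it; still one process
cat sample.txt | sort > out_cat.txt    # extra cat process plus a pipe

cmp -s out_arg.txt out_redir.txt && cmp -s out_arg.txt out_cat.txt &&
    echo "all three outputs are identical"
```

If you want to avoid the extra process but prefer the input to appear on the left, sort < file is the usual alternative to cat file | sort.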

sort file shouldn't take longer than cat file | sort. Are you sure your timings are right?

Please post the output of:

time sort file

and

time cat file | sort

You need to run the commands a few times and get the average.
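Repeating the runs by hand is tedious, so here is a minimal timing-harness sketch. It assumes bash (it relies on the time keyword, TIMEFORMAT and arrays), and bench is a hypothetical helper name. It also discards the first run, since that run is the one that warms the page cache and would otherwise skew the average.

```bash
#!/usr/bin/env bash
# Minimal timing harness (bash-specific: relies on the `time` keyword,
# TIMEFORMAT and arrays). bench is a hypothetical helper name.
LC_NUMERIC=C    # keep '.' as the decimal point in time output for awk
TIMEFORMAT=%R   # make `time` print only elapsed wall-clock seconds

# bench RUNS 'COMMAND'
# Runs COMMAND RUNS times, discards the first run (it warms the page
# cache), and prints the mean elapsed time of the remaining runs.
bench() {
    local runs=$1 cmd=$2
    local i t
    local -a times=()
    if (( runs < 2 )); then
        echo "bench: need at least 2 runs" >&2
        return 1
    fi
    for (( i = 0; i < runs; i++ )); do
        # `time` reports on the shell's stderr, so the brace group's 2>&1
        # captures it; the command's own stdout and stderr are discarded.
        t=$( { time eval "$cmd" > /dev/null 2>&1; } 2>&1 )
        times+=("$t")
    done
    # Average every run except the first one.
    printf '%s\n' "${times[@]:1}" |
        awk '{ sum += $1 } END { printf "%.3f\n", sum / NR }'
}

# Usage (log.txt is the file from the question):
#   bench 5 'sort log.txt'
#   bench 5 'cat log.txt | sort'
```

Running both commands through the same helper keeps the measurement overhead identical for each, so only the difference between the two averages is meaningful.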


2 Comments

You are absolutely right! Thanks! I was only checking a single run of the time command. When I ran each command 5 times and took the average, the version without cat turned out to be approximately 2 seconds faster! Thanks a lot for the link, too!
@user744734: before taking the average, you should discard outliers; in this case, at least the run that warms the cache.

Instead of worrying about the performance of sort, you should change your logging:

  • Eliminate unnecessarily verbose output to your log.
  • Periodically roll the log (based on either date or size).
  • ...fix the errors that are being written to the log. ;)

Also, are you sure cat is reading the entire file? It may have a read buffer, etc.
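That question can be checked directly rather than guessed at. A minimal sketch, assuming bash and a hypothetical helper named check_cat_reads_all: compare the file size that wc reports when reading the file itself with the byte count that arrives at the far end of a cat pipe. Matching numbers mean cat forwarded the whole file.

```bash
#!/usr/bin/env bash
# Compare the file size seen by wc reading the file directly with the
# byte count that arrives at the far end of a cat pipe. If the numbers
# match, cat forwarded the entire file. check_cat_reads_all is a
# hypothetical helper name.
check_cat_reads_all() {
    local f=$1 direct piped
    if [ ! -r "$f" ]; then
        echo "cannot read $f" >&2
        return 1
    fi
    # $(( )) strips the padding that some wc implementations print
    direct=$(( $(wc -c < "$f") ))
    piped=$(( $(cat -- "$f" | wc -c) ))
    if (( direct == piped )); then
        echo "cat passed all $direct bytes"
    else
        echo "mismatch: file has $direct bytes, pipe carried $piped" >&2
        return 1
    fi
}

# Usage: check_cat_reads_all log.txt
```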
