
I tried this (to observe the behaviour of unix sort):

yes | sort & top

What I see is the overall system memory usage growing, as you would expect, but the memory of the sort process itself does not appear to be growing:

Mem:   1689540k total,  1455384k used,   234156k free,   147248k buffers
Swap:  1718268k total,      804k used,  1717464k free,   956216k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
32248 mgregory  20   0 29844  25m  692 R 95.0  1.6   0:32.98 sort               
32247 mgregory  20   0  4036  504  444 S  4.0  0.0   0:01.52 yes             

The number 1455384 ("used") is growing rapidly.

The number 29844 (sort's VIRT) is not growing.

What is happening there?

Sort does not use all of system memory by default; it writes temporary files and then merges them. You need to tell it to use more memory if you want it to be faster. Commented Jan 7, 2014 at 23:56
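A minimal sketch of "telling it to use more memory", assuming GNU coreutils `sort` (the `-S`/`--buffer-size` option and its `%` suffix are GNU features). The input size, the temp directory and the two buffer sizes are illustrative choices, not values taken from the question:

```shell
#!/bin/sh
# Assumes GNU coreutils (sort -S with K and % suffixes, shuf).
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Illustrative input: 200000 numbers in random order, one per line.
seq 200000 | shuf > "$tmp/input.txt"

# Tiny buffer: sort has to write sorted runs to temp files in $tmp
# and merge them afterwards.
sort -n -S 64K -T "$tmp" -o "$tmp/small.txt" "$tmp/input.txt"

# Generous buffer (up to half of physical RAM): the whole input fits,
# so sort can do the work in memory in a single pass.
sort -n -S 50% -T "$tmp" -o "$tmp/large.txt" "$tmp/input.txt"

# The result is identical; only speed and temp-file I/O differ.
cmp -s "$tmp/small.txt" "$tmp/large.txt" && echo "outputs match"
```

The buffer size only changes how the work is done, never the result, so a larger `-S` is a pure performance knob.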

2 Answers


Sort doesn't necessarily need to have all the data in memory.

  1. Sort is able to do a merge sort if the input is too big to fit in memory. If I recall correctly, this is described in the man/info pages, e.g.:

    --batch-size=NMERGE
          merge at most NMERGE inputs at once; for more use temp files
    -S, --buffer-size=SIZE
          use SIZE for main memory buffer
    
  2. The 1455384k number is likely growing if

    • sort mmaps in more pages than are actually 'reserved' (i.e. locked into the process address space)

    • buffers and page cache are counted as "used" (as files are read, dentries, inodes and data blocks are cached). Check this by running (as root)

      sync; echo 3 > /proc/sys/vm/drop_caches
      

    and seeing how much memory becomes available again.
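If you cannot run the `drop_caches` command as root, a rough non-root check is to read the cache figures directly. This is a Linux-only sketch that assumes the standard `/proc/meminfo` field names (`MemTotal`, `MemFree`, `Buffers`, `Cached`, all in kB); the helper name `meminfo_kb` is hypothetical:

```shell
#!/bin/sh
# Linux-only: reads /proc/meminfo, where values are reported in kB.
# meminfo_kb is a hypothetical helper that prints one field's value.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' /proc/meminfo
}

total=$(meminfo_kb MemTotal)
free=$(meminfo_kb MemFree)
buffers=$(meminfo_kb Buffers)
cached=$(meminfo_kb Cached)

# "used" in the sense of the top header: everything that is not free.
used=$((total - free))
# The reclaimable part of "used": buffers plus page cache.
cache=$((buffers + cached))

echo "used (incl. buffers/cache): ${used} kB"
echo "buffers + cached:           ${cache} kB"
echo "used minus buffers/cache:   $((used - cache)) kB"
```

If the "buffers + cached" line grows along with "used" while `yes | sort` runs, the growth is cache rather than memory held by sort.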




Unix sort uses an external R-way merge sort. It divides the input into smaller portions of similar size (each small enough to fit in memory), sorts each portion, and then merges all the sorted portions together at the end.

Except while a portion is being sorted, these portions are stored in temporary files on disk (usually in /tmp, or wherever TMPDIR or the -T option points) rather than in memory. Therefore the memory usage of the sort command itself stays bounded by its buffer size instead of growing with the input.
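The same idea can be sketched with standard tools, assuming GNU coreutils (`split -l`, `sort -m`, `--batch-size`). This is an illustration of the technique, not how sort is implemented internally; the chunk size of 10000 lines and the merge fan-in of 4 are arbitrary illustrative values:

```shell
#!/bin/sh
# External merge sort built from standard tools (assumes GNU coreutils).
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Illustrative input: 100000 numbers in random order.
seq 100000 | shuf > "$work/input.txt"

# 1. Split the input into runs small enough to "fit in memory".
split -l 10000 "$work/input.txt" "$work/run."

# 2. Sort each run on its own and write it back to disk.
for run in "$work"/run.*; do
    sort -n -o "$run" "$run"
done

# 3. R-way merge of the sorted runs. --batch-size=4 caps the fan-in at
#    4 inputs, so with 10 runs sort merges in several passes, using
#    temp files in $work for the intermediate results.
sort -n -m --batch-size=4 -T "$work" "$work"/run.* > "$work/sorted.txt"

sort -n -c "$work/sorted.txt" && echo "sorted"
```

Only one run (step 2) or the heads of R runs (step 3) need to be in memory at any time, which is why the process's memory stays flat while the temp files grow.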

But why is the overall system memory usage growing? Simply because "unused memory is wasted memory". The Linux kernel keeps large amounts of file metadata and file data that were requested (including the temporary files sort is writing) in its cache, until something that looks more important pushes that data out.
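The numbers in the question's top header support this. A small worked check, using only the figures copied from that header (all in kB): "used" equals total minus free, so it includes buffers and cache, and subtracting those leaves roughly 352 MB actually held by processes:

```shell
#!/bin/sh
# Figures copied from the top header in the question (all in kB).
total=1689540
used=1455384
free=234156
buffers=147248
cached=956216

# In this top output, "used" is simply total minus free, so it
# includes buffers and page cache.
[ $((total - free)) -eq "$used" ] && echo "used = total - free"

# Memory held by processes once buffers and cache are excluded.
app_used=$((used - buffers - cached))
echo "used without buffers/cache: ${app_used} kB"
```

This prints `used = total - free` followed by `used without buffers/cache: 351920 kB`; the remaining 1103464 kB of "used" is buffers and page cache.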

