Here is a simple awk program that does the job:
awk 'BEGIN {b=0;c=0} {a[$0]+=1; if(a[$0]>c) {b=$0;c=a[$0]}} END {print b,c} '
In the "standard" approach I would count the occurrences of each number in an array, then sort the counts and pick the highest. But sorting can be expensive (in terms of both memory and processor cycles). So instead I just check whether the count for the current value is bigger than the highest count stored so far, and if it is, I replace the stored value and count. So the complexity of my code is (almost) linear :)
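To make the comparison concrete, here is a small sketch of both approaches side by side. The function names mode_single_pass and mode_sorted are just mine for illustration, and the sort-based pipeline is only one possible way to do the "standard" version, not the only one.

```shell
# Single pass: keep a running count per line and remember the best one so far.
mode_single_pass() {
  awk 'BEGIN { best = 0; best_count = 0 }
       { count[$0] += 1
         if (count[$0] > best_count) { best = $0; best_count = count[$0] } }
       END { print best, best_count }' "$@"
}

# "Standard" approach: group equal lines, count them, sort by count, take the top.
mode_sorted() {
  sort "$@" | uniq -c | sort -rn | head -n 1 | awk '{ print $2, $1 }'
}

printf '%s\n' 7 3 7 9 3 7 | mode_single_pass
printf '%s\n' 7 3 7 9 3 7 | mode_sorted
```

Both print the value followed by its count. When several values share the highest count the two may disagree: the single-pass version reports the first value that reached that count, while the sorted pipeline reports whichever one sorts first.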
My machine:
Processor Intel(R) Xeon(R) W-2145 CPU @ 3.70GHz 3.70 GHz
Installed RAM 64,0 GB (63,7 GB usable)
Time to run with 100 samples:
# /usr/bin/time -p awk 'BEGIN {b=0;c=0} {a[$0]+=1; if(a[$0]>c) {b=$0;c=a[$0]}} END {print b,c} ' f
208 2
real 0.03
user 0.00
sys 0.00
Time to run with 10000 samples:
# /usr/bin/time -p awk 'BEGIN {b=0;c=0} {a[$0]+=1; if(a[$0]>c) {b=$0;c=a[$0]}} END {print b,c} ' f1
284 23
real 0.04
user 0.01
sys 0.01
Time with 1M samples:
# /usr/bin/time -p awk 'BEGIN {b=0;c=0} {a[$0]+=1; if(a[$0]>c) {b=$0;c=a[$0]}} END {print b,c} ' 1M_random_numbers.txt
142 1130
real 0.89
user 0.85
sys 0.01
I learned an interesting way to count lines/tokens in awk.
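The same array-as-counter idiom also works per token instead of per line. This is only a sketch: count_tokens and the array name seen are made up for the example, and it assumes tokens are separated by whitespace (awk's default field splitting).

```shell
# Count every whitespace-separated token; the token itself is the array key.
count_tokens() {
  awk '{ for (i = 1; i <= NF; i++) seen[$i] += 1 }
       END { for (t in seen) print t, seen[t] }' "$@" | sort
}

printf 'a b a\nc a b\n' | count_tokens
```

The trailing sort is there only because awk's "for (t in seen)" visits keys in no guaranteed order.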