I put all the numbers in a hash in case they are needed for later processing, though as implemented the hash is only used for counting. As for optimization, the fastest approach here is to keep a running maximum of the highest frequency seen so far. I used an array to hold ties, in case several numbers share that frequency. This way you don't have to sort all the numbers by frequency at the end: you get the answer in O(n) time instead of O(n log n), so asymptotically faster.
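For contrast, here is roughly what the sort-based approach looks like with standard Unix tools (a sketch assuming GNU-style sort and uniq; the sample numbers are made up):

```shell
# Count-by-sorting: uniq -c needs sorted input (O(n log n)),
# then a second sort ranks the counts in descending order.
printf '%s\n' 5 3 5 2 3 5 \
  | sort -n \
  | uniq -c \
  | sort -rn \
  | head -n 1
# For this sample input, the top line is "3 5": the number 5 appears 3 times.
```

Both sorts are what the running-total approach avoids, at the cost of a little extra bookkeeping in the loop.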
I have included the timing, but it is worth reading the "Shortcomings of empirical metrics" section of the Analysis of Algorithms article on Wikipedia: wall-clock measurements depend on the machine, the load, and the input, while Big O analysis holds across all of them. For exactly these reasons I trust the Big O argument over the numbers below.
My solution is in Perl. In C or assembler the run time would surely be shorter, but the development time would be longer, and the code would be less direct, less concise, and harder to follow. Fumbling around with a strict rather than dynamic type system also tends to add development time and frustration.
Here is the code...
#!/usr/bin/perl -w
use strict;

my $appearanceCount = -1;
my @appearanceNumber;
my %count;
while (<>) {
    chomp;
    $count{$_}++;
    # keeping a running maximum is faster than sorting all hash values at the end
    if ($appearanceCount < $count{$_}) {
        undef @appearanceNumber;
        push(@appearanceNumber, $_);
        $appearanceCount = $count{$_};
    } elsif ($appearanceCount == $count{$_}) {
        push(@appearanceNumber, $_);
    }
    if (eof) {    # end of the current input file
        print "$ARGV: ";
        print "Numbers with biggest count, sorted by first appearance <$appearanceCount> @appearanceNumber\n";
        #DEBUG print "$count{$_}: $_\n" for (sort { $count{$b} <=> $count{$a} } keys(%count)); # print all counts in descending order
        # reset variables for the next file
        undef %count;
        undef @appearanceNumber;
        $appearanceCount = -1;
    }
}
Here is the output...
$ time perl biggest.pl biggest1.txt biggest2.txt biggest3.txt
biggest1.txt: Numbers with biggest count, sorted by first appearance <2> 208 188 641 546 374 694
biggest2.txt: Numbers with biggest count, sorted by first appearance <23> 284
biggest3.txt: Numbers with biggest count, sorted by first appearance <1130> 142
real 0m0.213s
user 0m0.209s
sys 0m0.004s
That is the timing for all three files processed sequentially; here is the biggest file on its own...
$ time perl biggest.pl biggest3.txt
biggest3.txt: Numbers with biggest count, sorted by first appearance <1130> 142
real 0m0.203s
user 0m0.200s
sys 0m0.003s
That timing is on a roughly 6-year-old laptop with a million tabs open, YouTube videos playing, that hasn't been rebooted in 19 days. Not exactly a top-of-the-line server. But whatever the hardware, the algorithm itself can't be beaten asymptotically: every input has to be read at least once, so O(n) is optimal. Hooray for Big O analysis!