Romeo Ninov

Here is a simple awk program that does the work:

awk 'BEGIN {b=0;c=0} {a[$0]+=1; if(a[$0]>c) {b=$0;c=a[$0]}} END {print b,c} '

The "standard" approach would be to count the occurrences of each number in an array, then sort to find the most frequent one. But sorting can be expensive (in combined memory and processor cycles). So instead I just check whether the current line's count is bigger than the stored maximum and, if so, replace it. As a result the complexity of my code is (almost) linear :)
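As a quick sanity check, here is the same one-liner run on a tiny hand-made sample (the input values are illustrative, not from my benchmark files):

```shell
# 5 occurs three times, more than any other value,
# so the running-maximum logic should report "5 3"
printf '%s\n' 3 5 2 5 7 5 2 | \
  awk 'BEGIN {b=0;c=0} {a[$0]+=1; if(a[$0]>c) {b=$0;c=a[$0]}} END {print b,c}'
# prints: 5 3
```

Note that on a tie the first value to reach the maximum count wins, since later values only replace it on a strictly greater count.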

My machine:

Processor   Intel(R) Xeon(R) W-2145 CPU @ 3.70GHz   3.70 GHz
Installed RAM   64,0 GB (63,7 GB usable)

Time to execute for 100 samples:

# /usr/bin/time -p awk 'BEGIN {b=0;c=0} {a[$0]+=1; if(a[$0]>c) {b=$0;c=a[$0]}} END {print b,c} ' f
208 2
real 0.03
user 0.00
sys 0.00

Time to execute for 10000 samples:

# /usr/bin/time -p awk 'BEGIN {b=0;c=0} {a[$0]+=1; if(a[$0]>c) {b=$0;c=a[$0]}} END {print b,c} ' f1
284 23
real 0.04
user 0.01
sys 0.01

Time with 1M samples:

# /usr/bin/time -p awk 'BEGIN {b=0;c=0} {a[$0]+=1; if(a[$0]>c) {b=$0;c=a[$0]}} END {print b,c} ' 1M_random_numbers.txt
142 1130
real 0.89
user 0.85
sys 0.01

I learned an interesting way to count lines/tokens in awk.
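The same associative-array counting idiom also gives the full frequency table; sorting it externally is the "standard" approach mentioned above (this pipeline is a common alternative for comparison, not part of the original answer):

```shell
# Build a count-per-value table in awk, then sort by count, highest first
printf '%s\n' 208 142 208 284 | \
  awk '{a[$0]++} END {for (k in a) print a[k], k}' | sort -rn | head -n1
# prints: 2 208
```

This does more work than the running-maximum version, since it materializes and sorts every distinct value, but it is handy when you want the whole ranking rather than just the top entry.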
