I have a log file:

     2001:778:0:1::21 - - [16/Sep/2011:12:30:46 +0300] "GET / HTTP/1.1" 200 44
        2001:778:0:1::21 - - [16/Sep/2011:12:30:46 +0300] "GET /favicon.ico HTTP/1.1" 2$
2001:778:0:1::21 - - [16/Sep/2011:12:30:46 +0300] "GET / HTTP/1.1" 200 44
        2001:778:0:1::21 - - [16/Sep/2011:12:32:15 +0300] "GET / HTTP/1.1" 200 66643
        88.222.10.7 - - [17/Sep/2011:23:39:25 +0300] "GET / HTTP/1.1" 200 66643
        88.222.10.1 - - [17/Sep/2011:23:39:25 +0300] "GET /favicon.ico HTTP/1.1" 200 14$
     88.222.10.1 - - [17/Sep/2011:23:39:25 +0300] "GET /favicon.ico HTTP/1.1" 200 14$
     88.222.10.1 - - [17/Sep/2011:23:39:25 +0300] "GET /favicon.ico HTTP/1.1" 200 14$
        88.222.10.7 - - [18/Sep/2011:13:45:39 +0300] "GET / HTTP/1.1" 304 -

I need to count the duplicate IP addresses:

 awk -F "- -" '{dups[$1]++} END{for (num in dups) {print num,dups[num]}}' myFile

So now I have:

2001:778:0:1::21 4
88.222.10.7 2
88.222.10.1 3

I want to sort the output by count, so my result should be:

  2001:778:0:1::21 4
    88.222.10.1 3
    88.222.10.7 2

But I don't know how to sort arrays in awk. Is it possible to do that?

  • Piping to sort is an option. (Commented Apr 21, 2015 at 10:04)
  • If you have gawk, use asort. (Commented Apr 21, 2015 at 10:07)

1 Answer

This is most straightforward with GNU awk 4.0+, which has a mechanism for sorted array traversal:

awk '{dups[$1]++} END{ PROCINFO["sorted_in"] = "@val_num_desc"; for(num in dups) {print num,dups[num]}}' filename

That is:

{ dups[$1]++ }
END {
  PROCINFO["sorted_in"] = "@val_num_desc";  # <-- here: Array traversal in
                                            #     numerically descending order
                                            #     of values
  for(num in dups) {
    print num,dups[num]
  }
}

If GNU awk is not available, pipe through sort:

awk '{dups[$1]++} END{ for(num in dups) {print num,dups[num]}}' filename | sort -t ' ' -rgk 2

Note that I removed the custom field separator because it doesn't seem necessary, and it is even harmful when the amount of leading whitespace varies, since that whitespace becomes part of the array key. If you want to keep it for some reason, you'll have to give sort the -b option in addition to -t ' ' -rgk 2 to ignore the leading whitespace in awk's output.
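To make the effect of dropping the separator concrete, here is a minimal end-to-end sketch of the pipe-through-sort approach. The helper name `count_ips` and the sample lines (abbreviated from the question's log, with uneven indentation) are assumptions for illustration only. It also swaps in `sort -k2,2nr -k1,1` as a variant of `-rgk 2`: `-n` is specified by POSIX, whereas `-g` is a GNU extension, and the second key just makes the order of ties predictable.

```shell
#!/bin/sh
# Hypothetical helper: count occurrences of the first field (the client IP)
# and print "IP count" pairs, highest count first.
count_ips() {
  # With the default field splitting, awk skips leading blanks,
  # so $1 is the bare IP no matter how the line is indented.
  awk '{ dups[$1]++ } END { for (ip in dups) print ip, dups[ip] }' "$@" |
    LC_ALL=C sort -k2,2nr -k1,1   # count descending, then IP as tie-breaker
}

# Sample data (assumed, shortened from the question's log).
sample='     2001:778:0:1::21 - - [16/Sep/2011:12:30:46 +0300] "GET / HTTP/1.1" 200 44
        2001:778:0:1::21 - - [16/Sep/2011:12:32:15 +0300] "GET / HTTP/1.1" 200 66643
2001:778:0:1::21 - - [16/Sep/2011:12:30:46 +0300] "GET / HTTP/1.1" 200 44
        88.222.10.7 - - [17/Sep/2011:23:39:25 +0300] "GET / HTTP/1.1" 200 66643
     88.222.10.1 - - [17/Sep/2011:23:39:25 +0300] "GET / HTTP/1.1" 200 66643
        88.222.10.1 - - [18/Sep/2011:13:45:39 +0300] "GET / HTTP/1.1" 304 -'

result=$(printf '%s\n' "$sample" | count_ips)
printf '%s\n' "$result"
```

For this sample the three indentation variants of 2001:778:0:1::21 collapse into a single entry with count 3, followed by 88.222.10.1 with 2 and 88.222.10.7 with 1. The function also accepts file names, so `count_ips myFile` should work the same way on a real log.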
