I need to sort this file in descending order, avoiding duplicates:

Bob 5 404
Mike 3 404
Bob 19 404
Bob 78 404
Mike 93 404
Joe 7 404

So my result should be

Bob 102
Mike 96
Joe 7

What I have now is this

awk '{if($3 == 404) arr[$1]+=$2}END{for(i in arr)print i, arr[i]}' file

I know that there is sort -d, but how do I use it with awk?

UPDATE

awk 'BEGIN{FS=" "}{if($9 == 404) arr[$1]+=1}END{for(i in arr) print arr[i] | sort -k2nr }' input > output

I get this result

sh: 0:  not found

And my output file is now empty.

  • are the unique keys relatively 'finite'? Commented Apr 23, 2015 at 16:59
  • if you use gawk you have access to the asort() function. Commented Apr 23, 2015 at 17:05
  • That's not just sorting. You are aggregating the records with duplicate keys, not avoiding them. Commented Apr 23, 2015 at 17:16
  • The BEGIN{FS=" "} is not necessary. You need to replace the +=1 with +=$2. Pipe the output to sort -k2nr to sort in reverse numeric order. Commented Apr 23, 2015 at 17:22
  • @JonathanLeffler: Good advice, but to ensure that sorting occurs only by the 2nd column (though it doesn't make a difference in this case), it should be -k2,2nr. Commented Apr 23, 2015 at 18:06

2 Answers

Reuben L.'s answer contains the right pointers, but doesn't spell out the full solutions:


The POSIX-compliant solution spelled out:

You need to pipe the output from awk to the sort utility, outside of awk:

awk '{ if($3 == 404) arr[$1]+=$2 } END{ for (i in arr) print i, arr[i] }' input |
  sort -rn -k2,2 > output

Note the specifics of the sort command:

  • -r performs reverse sorting
  • -n performs numeric sorting
  • -k2,2 sorts by the 2nd whitespace-separated field only
    • by contrast, only specifying -k2 would sort starting from the 2nd field through the remainder of the line - doesn't make a difference here, since the 2nd field is the last field, but it's an important distinction in general.
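
To see why -n matters, here is a small sketch that sorts the expected totals from the question with and without it. The variable names are made up for this illustration, and LC_ALL=C is set only to make the text comparison predictable across systems.

```shell
totals='Bob 102
Mike 96
Joe 7'

# Without -n the 2nd field is compared as text, so "96" sorts above "102".
lexical=$(printf '%s\n' "$totals" | LC_ALL=C sort -r -k2,2)

# With -n the 2nd field is compared as a number.
numeric=$(printf '%s\n' "$totals" | LC_ALL=C sort -rn -k2,2)

printf 'lexical:\n%s\n\nnumeric:\n%s\n' "$lexical" "$numeric"
```

Only the numeric variant puts Bob's 102 first.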

Note that there's really no benefit to using the nonstandard -V option to get numeric sorting, as -n will do just fine; -V's true purpose is to perform version-number sorting.

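Note that you could run the sort command from inside your awk script, as in for (i in arr) print i, arr[i] | "sort -rn -k2,2", but there's little benefit to doing so. The double quotes around the sort command are required: unquoted, awk parses sort -k2nr as the arithmetic expression sort - k2nr on two uninitialized variables, which evaluates to 0, so awk tries to run a command named 0. That is what produces the sh: 0: not found error in the question's update.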
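
For completeness, the in-awk variant could look roughly like this. It is a sketch, not a recommendation: the function name sum_404_sorted is made up for this example, and the sample data is the input from the question, fed in through a here-document. The explicit close() is not strictly needed at the end of a script, but it makes awk wait for sort before exiting.

```shell
# Sum column 2 per name for rows whose 3rd column is 404,
# then let awk itself pipe the totals through sort.
sum_404_sorted() {
  awk '
    $3 == 404 { arr[$1] += $2 }
    END {
      cmd = "sort -rn -k2,2"            # the command must be a string
      for (i in arr) print i, arr[i] | cmd
      close(cmd)                        # wait for sort to finish
    }
  ' "$@"
}

sum_404_sorted <<'EOF'
Bob 5 404
Mike 3 404
Bob 19 404
Bob 78 404
Mike 93 404
Joe 7 404
EOF
```

With no file arguments the function reads standard input; with arguments, as in sum_404_sorted input > output, it reads the named files.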


The GNU awk asort() solution spelled out:

gawk '
  { if ($3 == 404) arr[$1]+=$2 } # build array
  END{
    for (k in arr) { amap[arr[k]] = k }   # create value-to-key(!) map
    asort(arr, asorted, "@val_num_desc")  # sort values numerically, in descending order
    # print in sort order
    for (i=1; i<=length(asorted); ++i) print amap[asorted[i]], asorted[i]
  }
' input > output

As you can see, this complicates the solution, because 2 extra arrays must be created:

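  • for (k in arr) { amap[arr[k]] = k } creates the "inverse" of the original array in amap: it uses the values of the original array as keys and the corresponding keys as the values. Note that this only works if the totals are distinct: if two names have the same total, the later one overwrites the earlier one in amap.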
  • asort(arr, asorted, "@val_num_desc") then sorts the original array by its values in descending, numerical order ("@val_num_desc") and stores the result in new array asorted.
    • Note that the original keys are lost in the process: asorted keys are now numerical indices reflecting the sort order.
  • for (i=1; i<=length(asorted); ++i) print amap[asorted[i]], asorted[i] then enumerates asorted by sequential numerical index, which yields the desired sort order; amap[asorted[i]] returns the matching key (e.g., Bob) from the original array for the value at hand.
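
If gawk 4.0 or later is available, a different technique than asort() avoids the extra arrays altogether: setting PROCINFO["sorted_in"] controls the order in which for (k in arr) visits the elements. The sketch below assumes such a gawk is installed; the function name sum_404_gawk is made up for this example. Since no value-to-key map is involved, it does not depend on the totals being distinct.

```shell
# Sum column 2 per name for rows whose 3rd column is 404 and print the
# totals in descending numeric order, using only gawk (no external sort).
sum_404_gawk() {
  gawk '
    $3 == 404 { arr[$1] += $2 }
    END {
      # Traverse the array by value, numerically, largest first.
      PROCINFO["sorted_in"] = "@val_num_desc"
      for (k in arr) print k, arr[k]
    }
  ' "$@"
}

# usage: sum_404_gawk input > output
```

The relative order of names that share the same total is best not relied upon.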

Two possible solutions:

  1. Use gawk and the built-in asort() and asorti() functions

  2. Pipe the output of your awk command to sort -k2 -Vr. This will sort descending by the second column.

Note: the -V flag is non-standard and is available in GNU sort. Credit to Jonathan Leffler.

3 Comments

It would need to be -k2n for numeric sorting; otherwise, 9 will appear before 89.
Oh; hmm...I suppose so. It's not a standard option -- you should point out that it will only work with GNU sort (it won't work with BSD/Mac OS X sort, for example).
@JonathanLeffler: Curiously, the OSX sort utility is GNU sort - it's just too old to support -V (as of OSX 10.10, it's version 5.93(!), whereas the version on Ubuntu 14.04, for instance, is 8.21). By contrast, the true BSD version of sort does implement -V, as of at least 2.3 (e.g., on FreeBSD 10).
