101

I have a command (cmd1) that greps through a log file to filter out a set of numbers. The numbers are in random order, so I use sort -gr to get a reverse sorted list of numbers. There may be duplicates within this sorted list. I need to find the count for each unique number in that list.

For example, if the output of cmd1 is:

100 
100 
100 
99 
99 
26 
25 
24 
24

I need another command that I can pipe the above output to, so that I get:

100     3
99      2
26      1
25      1
24      2

7 Answers

119

How about:

$ echo "100 100 100 99 99 26 25 24 24" \
    | tr " " "\n" \
    | sort \
    | uniq -c \
    | sort -k2nr \
    | awk '{printf("%s\t%s\n",$2,$1)}END{print}'

The result is :

100 3
99  2
26  1
25  1
24  2

5 Comments

I ran this and it produced an extra print statement of $1,$2 at the end: 100 3 99 2 26 1 25 1 24 2 2 24
The following adds a new line between the results and removes the extra line at the end: echo "100 100 100 99 99 26 25 24 24" | tr " " "\n" | sort | uniq -c | sort -k2nr | awk '{printf("%s\t%s\n",$2,$1)}END{print}' | head -n -1 so you get: 100 3 99 2 26 1 25 1 24 2
Note about syntax, you can end a line with a pipe instead of using a backslash.
Note about note regarding syntax: that's absolutely true and "cleaner", however in shell scripts (and SQL fwiw), I've evolved a personal preference (after trying both) preferring "pipe first", as (for me) it's visually easier to see "hey, this is a pipe." (Likewise in SQL blocks, I put my semi-colon line terminators at the beginning of the subsequent line.)
There's currently an extraneous END{print} that prints an extra copy of the last line of input. To print a blank line, it should have been END{print ""}, or just get rid of it altogether, e.g., cmd1 | sort | uniq -c | sort -k2nr | awk '{printf("%s\t%s\n",$2,$1)}' or, for the awk part, even just awk 'OFS="\t" { print $2, $1 }'
71

uniq -c works for GNU uniq 8.23 at least, and does exactly what you want (assuming sorted input).
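
A minimal sketch of why the sorted-input assumption matters: uniq -c counts runs of adjacent identical lines, so unsorted input produces one count per run rather than one per value. The helper name count_runs is hypothetical and only used for illustration.

```shell
#!/bin/bash
# count_runs is a hypothetical helper: it prints "value count" for each run of
# adjacent identical lines on stdin (uniq -c prints the count first, so swap).
count_runs() {
    uniq -c | awk '{ print $2, $1 }'
}

# Sorted input: every duplicate is adjacent, so each value is reported once.
printf '%s\n' 100 100 99 | count_runs

# Unsorted input: 100 forms two separate runs and is reported twice.
printf '%s\n' 100 99 100 | count_runs
```

Piping through sort (or sort -nr for the order asked for in the question) before uniq -c makes all duplicates adjacent.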

2 Comments

If the input is not sorted, just add a sort command: sort file_name | uniq -c
Awesome. Works on Mac OS X as well! Tested on Mojave 10.14.6.
11

if order is not important

# echo "100 100 100 99 99 26 25 24 24" | awk '{for(i=1;i<=NF;i++)a[$i]++}END{for(o in a) printf "%s %s ",o,a[o]}'
26 1 100 3 99 2 24 2 25 1
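
An annotated sketch of the same awk approach, changed to print one pair per line. The wrapper name count_fields is hypothetical; the trailing sort -nr is only there to show how the order can be restored if it does matter.

```shell
#!/bin/bash
# count_fields is a hypothetical wrapper around the awk one-liner above,
# changed to print one "value count" pair per line.
count_fields() {
    awk '
        # Runs for every input line: NF is the number of fields on the line,
        # and a[] is an associative array keyed by the field text.
        { for (i = 1; i <= NF; i++) a[$i]++ }
        # Runs once after all input: the iteration order of "for (o in a)"
        # is unspecified, which is why the output order looks arbitrary.
        END { for (o in a) print o, a[o] }
    '
}

echo "100 100 100 99 99 26 25 24 24" | count_fields | sort -nr
```

Because the counting happens inside awk, the input needs no sorting beforehand; only the much smaller list of unique values is sorted at the end.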

1 Comment

+1 for doing this with 3 fewer pipes. It would be awesome if you could elaborate on how this works, because it confused me. ;-) Thanks.
11

Numerically sort the numbers in reverse, then count the duplicates, then swap the left and the right words. Align into columns.

printf '%d\n' 100 99 26 25 100 24 100 24 99 \
   | sort -nr | uniq -c | awk '{printf "%-8s%s\n", $2, $1}'
100     3
99      2
26      1
25      1
24      2


2

In Bash, we can use an associative array to count instances of each input value. Assuming we have the command $cmd1, e.g.

#!/bin/bash

cmd1='printf %d\n 100 99 26 25 100 24 100 24 99'

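Then we can count values in the associative array variable a (declared with declare -A) using the ++ arithmetic operator on the relevant array entries:

declare -A a            # make a an associative array (Bash 4+)
while read -r i         # -r keeps backslashes in the input literal
do
    ((++a["$i"]))
done < <($cmd1)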

We can print the resulting values:

for i in "${!a[@]}"
do
    echo "$i ${a[$i]}"
done

If the order of output is important, we might need an external sort of the keys:

for i in $(printf '%s\n' "${!a[@]}" | sort -nr)
do
    echo "$i ${a[$i]}"
done
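
The pieces above can be combined into a single function that reads from standard input, so it can sit at the end of a pipe as the question asks. This is a minimal sketch, assuming Bash 4 or later (associative arrays are not available in Bash 3.2); the name count_stdin is hypothetical.

```shell
#!/bin/bash
# count_stdin is a hypothetical name; requires Bash 4+ for associative arrays.
count_stdin() {
    local -A counts=()      # explicitly associative, local to the function
    local value
    while read -r value
    do
        [[ -n $value ]] || continue                     # skip blank lines
        counts[$value]=$(( ${counts[$value]:-0} + 1 ))  # unset entries count as 0
    done
    # Print "value count" pairs, then sort numerically in reverse by value.
    for value in "${!counts[@]}"
    do
        printf '%s %s\n' "$value" "${counts[$value]}"
    done | sort -nr
}

printf '%d\n' 100 99 26 25 100 24 100 24 99 | count_stdin
```

Sorting the finished "value count" lines avoids word-splitting the keys in a for loop, which matters if the values could ever contain whitespace.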


0

In case you have input stored in my_file you can do:

sort -nr my_file | uniq -c | awk ' { t = $1; $1 = $2; $2 = t; print; } '

Otherwise just pipe the input to be processed to the same cmd.

Explanation:

  • sort -nr sorts the input numerically (-n) in reverse order (-r)
  • uniq -c counts duplicates and prints each count next to its value
  • awk '{ t = $1; $1 = $2; $2 = t; print; }' swaps the two columns
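
The three stages can be wrapped so the same command works with a file or with piped input. A small sketch; the function name count_desc is hypothetical.

```shell
#!/bin/bash
# count_desc is a hypothetical wrapper: with file arguments it reads those
# files, with no arguments sort falls back to reading stdin.
count_desc() {
    sort -nr -- "$@" | uniq -c | awk '{ t = $1; $1 = $2; $2 = t; print }'
}

# Piped input; for a file, call it as: count_desc my_file
printf '%s\n' 24 100 99 100 | count_desc
```

Assigning to $1 and $2 makes awk rebuild the line with a single space between the fields, which is why the leading padding from uniq -c disappears.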


0

Ruby has built-in tools to do this efficiently from the command line.

Example, given this file:

$ cat file
100 
100 
100 
99 
99 
26 
25 
24 
24
1
  1. Count each;
  2. Sort by a) decreasing occurrence b) decreasing value;
  3. Put in lined up columns.

This Ruby does that:

ruby  -e '
cnt=Hash.new(0)
$<.each{|x| cnt[x.to_i]+=1}
w1,w2=cnt.max_by{|e| e.to_s.length}.map{|e| e.to_s.length+2}
cnt.sort_by{|k,v| [-v,-k]}.each{|k,v| 
            puts "#{k.to_s.rjust(w1," ")}\t#{v.to_s.rjust(w2," ")}"
}' file

Prints:

  100     3
   99     2
   24     2
   26     1
   25     1
    1     1

The input file does not need to be sorted.

