counting duplicates in a sorted sequence using command line tools

Question

I have a command (cmd1) that greps through a log file to filter out a set of numbers. The numbers are in random order, so I use sort -gr to get a reverse sorted list of numbers. There may be duplicates within this sorted list. I need to find the count for each unique number in that list.

For example, if the output of cmd1 is:

100
100
100
99
99
26
25
24
24

I need another command that I can pipe the above output to, so that I get:

100 3
99 2
26 1
25 1
24 2

related: serverfault.com/questions/37020/…

– David Cary, Commented Jun 24, 2012 at 17:24
related: stackoverflow.com/a/16980265/32453

– rogerdpack, Commented Oct 29, 2013 at 16:29

diguage · Accepted Answer · 2019-01-20 11:02:43Z

119

How about:

$ echo "100 100 100 99 99 26 25 24 24" \
    | tr " " "\n" \
    | sort \
    | uniq -c \
    | sort -k2nr \
    | awk '{printf("%s\t%s\n",$2,$1)}END{print}'

The result is:

edited Jan 20, 2019 at 11:02

diguage

answered Jul 7, 2009 at 13:54

Stephen Paul Lesniewski

5 Comments

Mittenchops Over a year ago

I ran this and it produced an extra print statement of $1,$2 at the end: 100 3 99 2 26 1 25 1 24 2 2 24

Woody Over a year ago

The following adds a new line between the results and removes the extra line at the end:

echo "100 100 100 99 99 26 25 24 24" | tr " " "\n" | sort | uniq -c | sort -k2nr | awk '{printf("%s\t%s\n",$2,$1)}END{print}' | head -n -1

so you get: 100 3 99 2 26 1 25 1 24 2

wjandrea Over a year ago

Note about syntax, you can end a line with a pipe instead of using a backslash.

michael Over a year ago

Note about note regarding syntax: that's absolutely true and "cleaner", however in shell scripts (and SQL fwiw), I've evolved a personal preference (after trying both) preferring "pipe first", as (for me) it's visually easier to see "hey, this is a pipe." (Likewise in SQL blocks, I put my semi-colon line terminators at the beginning of the subsequent line.)

michael Over a year ago

There's currently an extraneous END{print} that prints an extra copy of the last line of input. To print a blank line, it should have been END{print ""}, or just get rid of it altogether, e.g., cmd1 | sort | uniq -c | sort -k2nr | awk '{printf("%s\t%s\n",$2,$1)}' or, for the awk part, even just awk -v OFS='\t' '{ print $2, $1 }'
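
A minimal sketch of the fix described in the comments on this answer: the same pipeline with the END{print} block dropped, so the last uniq -c line is no longer echoed a second time. The function name count_values is a hypothetical wrapper added only so the pipeline can be reused; it is not part of the original answer.

```shell
# count_values: hypothetical helper name wrapping the corrected pipeline.
# Reads space-separated numbers on stdin and prints "value<TAB>count",
# highest value first.
count_values() {
    tr ' ' '\n' |
        sort |
        uniq -c |
        sort -k2nr |
        awk '{ printf("%s\t%s\n", $2, $1) }'   # no END block, so no stray last line
}

echo "100 100 100 99 99 26 25 24 24" | count_values
# prints (tab-separated):
# 100	3
# 99	2
# 26	1
# 25	1
# 24	2
```

Because the awk program has no END block, there is also no need for the trailing head -n -1 workaround.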

Ibrahim · Accepted Answer · 2016-02-29 08:47:19Z

71

uniq -c works for GNU uniq 8.23 at least, and does exactly what you want (assuming sorted input).
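
To illustrate the "assuming sorted input" caveat, here is a small sketch with made-up sample values. uniq -c only merges adjacent equal lines, so the same value can be counted more than once when the input is not sorted. Note also that uniq -c prints the count before the value, so the columns still need swapping to match the layout asked for in the question.

```shell
# Sample values are made up for illustration; the two 24s are deliberately not adjacent.
# awk 'END { print NR }' counts the lines that uniq -c emits.
unsorted_lines=$(printf '%s\n' 24 100 24 | uniq -c | awk 'END { print NR }')
sorted_lines=$(printf '%s\n' 24 100 24 | sort -nr | uniq -c | awk 'END { print NR }')

echo "without sort: $unsorted_lines lines, with sort: $sorted_lines lines"
# prints: without sort: 3 lines, with sort: 2 lines
```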

answered Feb 29, 2016 at 8:47

Ibrahim

2 Comments

Mikhail Geyer Over a year ago

If the input is not sorted, just add a sort command: sort file_name | uniq -c

bappak Over a year ago

Awesome. Works on Mac OS X as well! Tested on Mojave 10.14.6.

ghostdog74 · Accepted Answer · 2009-07-07 13:44:03Z

11

If order is not important:

$ echo "100 100 100 99 99 26 25 24 24" | awk '{for(i=1;i<=NF;i++)a[$i]++}END{for(o in a) printf "%s %s ",o,a[o]}'
26 1 100 3 99 2 24 2 25 1

answered Jul 7, 2009 at 13:44

ghostdog74

1 Comment

SaxDaddy Over a year ago

+1 for doing this with 3 less pipes. It would be awesome if you could elaborate on how this works b/c it confused me. ;-) Thanks.
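
As a rough elaboration of how the one-liner in this answer works, here is the same awk logic spread over several lines with comments. Two things differ from the original and are assumptions of this sketch: each pair is printed on its own line rather than on one line, and a trailing sort -nr is added to restore descending order. The function name count_fields is hypothetical.

```shell
# count_fields: hypothetical wrapper name, used only for this illustration.
count_fields() {
    awk '
    {
        # NF is the number of whitespace-separated fields on the current line.
        # a is an associative array keyed by the field text; an unseen key
        # starts out empty (0 in arithmetic), so ++ keeps a running count.
        for (i = 1; i <= NF; i++)
            a[$i]++
    }
    END {
        # Runs once after all input is read: print every key with its count.
        # The "for (o in a)" loop visits keys in an unspecified order.
        for (o in a)
            printf "%s %s\n", o, a[o]
    }' |
        sort -nr   # added for this sketch: restores descending numeric order
}

echo "100 100 100 99 99 26 25 24 24" | count_fields
# prints:
# 100 3
# 99 2
# 26 1
# 25 1
# 24 2
```

Since all counting happens inside a single awk process, the input does not need to be sorted first; the sort at the end only orders the much shorter list of distinct values.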

ericcurtin · Accepted Answer · 2019-11-19 11:59:50Z

11

Numerically sort the numbers in reverse, then count the duplicates, then swap the left and the right words. Align into columns.

printf '%d\n' 100 99 26 25 100 24 100 24 99 \
   | sort -nr | uniq -c | awk '{printf "%-8s%s\n", $2, $1}'

edited Nov 19, 2019 at 11:59

answered Oct 17, 2017 at 10:25

ericcurtin

Toby Speight · Accepted Answer · 2017-10-17 14:23:22Z

2

In Bash, we can use an associative array to count instances of each input value. Assuming we have the command $cmd1, e.g.

#!/bin/bash

cmd1='printf %d\n 100 99 26 25 100 24 100 24 99'

Then we can count values in the array variable a using the ++ mathematical operator on the relevant array entries:

declare -A a    # make a an associative array (value -> count); needs Bash 4+
while read -r i
do
    ((++a["$i"]))
done < <($cmd1)

We can print the resulting values:

for i in "${!a[@]}"
do
    echo "$i ${a[$i]}"
done

If the order of output is important, we might need an external sort of the keys:

for i in $(printf '%s\n' "${!a[@]}" | sort -nr)
do
    echo "$i ${a[$i]}"
done

answered Oct 17, 2017 at 14:23

Toby Speight

rkachach · Accepted Answer · 2022-08-10 11:04:55Z

0

If the input is stored in my_file, you can do:

sort -nr my_file | uniq -c | awk ' { t = $1; $1 = $2; $2 = t; print; } '

Otherwise, pipe the input into the same command, leaving out my_file.

Explanation:

sort -nr sorts the input numerically (-n) in reverse order (-r).
uniq -c counts adjacent duplicates and prints each count before its value.
awk '{ t = $1; $1 = $2; $2 = t; print; }' swaps the two columns.
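
A small sketch of the swap step on its own, using hand-written input that imitates the padded lines uniq -c produces (the exact padding is an assumption and varies between implementations). Assigning to a field makes awk rebuild the record using its output field separator, a single space by default, so the leading padding disappears as a side effect. The function name swap_columns is hypothetical.

```shell
# swap_columns: hypothetical helper name for the awk step of the pipeline.
swap_columns() {
    awk '{ t = $1; $1 = $2; $2 = t; print }'
}

# Input lines imitate uniq -c output: padded count, then value.
printf '%s\n' '      3 100' '      2 99' | swap_columns
# prints:
# 100 3
# 99 2
```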

edited Aug 10, 2022 at 11:04

answered Aug 10, 2022 at 10:58

rkachach

dawg · Accepted Answer · 2023-07-12 17:36:39Z

0

Ruby has built-in tools that can do this efficiently from the command line.

Example, given this file:

Count each;
Sort by a) decreasing occurrence b) decreasing value;
Put in lined up columns.

This Ruby does that:

ruby  -e '
cnt=Hash.new(0)
$<.each{|x| cnt[x.to_i]+=1}
w1,w2=cnt.max_by{|e| e.to_s.length}.map{|e| e.to_s.length+2}
cnt.sort_by{|k,v| [-v,-k]}.each{|k,v| 
            puts "#{k.to_s.rjust(w1," ")}\t#{v.to_s.rjust(w2," ")}"
}' file

Prints:

The input file does not need to be sorted.

answered Jul 12, 2023 at 17:36

dawg
