Simpler way to count the number of duplicated rows in a text file

Question

I have a text file that looks like this:

abc
bcd
abc
efg
bcd
abc

And the expected output is this:

3 abc
2 bcd
1 efg

I know there is an existing solution for this:

sort -k2 < inFile |
awk '!z[$1]++{a[$1]=$0;} END {for (i in a) print z[i], a[i]}' |
sort -rn -k1 > outFile

The code sorts the input, collapses duplicate lines while counting them, sorts again by count, and prints the expected output. However, is there a simpler, more "basic" way to express the !z[$1]++{a[$1]=$0;} part?

Why does the expected output have 2 abc when there are 3 occurrences of abc?
– ptierno, Commented Apr 20, 2015 at 20:51

John1024 · Accepted Answer · 2015-04-20 21:01:25Z

3

More basic:

$ sort inFile | uniq -c
      3 abc
      2 bcd
      1 efg
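
If the counts must come out highest first, as in the question's expected output, one possible sketch is to add a numeric reverse sort after uniq -c. uniq -c lists groups in the sort order of the lines themselves, so the sample output above is ordered by count only by coincidence. The temporary file and the sed step that strips the left padding are additions for illustration, not part of the original answer.

```shell
# Recreate the sample input from the question in a temporary file.
inFile=$(mktemp)
printf '%s\n' abc bcd abc efg bcd abc > "$inFile"

# Group identical lines, count them, then order by count, highest first.
# sed removes the leading spaces that uniq -c puts before each count.
counts=$(sort "$inFile" | uniq -c | sort -rn | sed 's/^ *//')
printf '%s\n' "$counts"

rm -f "$inFile"
```

With the sample data this should print 3 abc, 2 bcd, 1 efg, one per line.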

More basic awk

When one is used to awk's idioms, the expression !z[$1]++{a[$1]=$0;} is clear and concise. For those used to programming in other languages, other forms might be more familiar, such as:

awk '{if (z[$1]++ == 0) a[$1]=$0;} END {for (i in a) print z[i], a[i]}'

Or,

awk '{if (z[$1] == 0) a[$1]=$0; z[$1]+=1} END {for (i in a) print z[i], a[i]}'
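
As a rough check that the three forms are interchangeable, the sketch below runs each one on the sample data from the question and compares the results. awk does not guarantee the order of for (i in a), so each output is piped through sort before comparing. The variable names are only for illustration.

```shell
# Sample input from the question, one value per line.
input=$(printf '%s\n' abc bcd abc efg bcd abc)

# Original idiom: the block runs only the first time a key is seen.
idiom=$(printf '%s\n' "$input" |
  awk '!z[$1]++{a[$1]=$0;} END {for (i in a) print z[i], a[i]}' | sort)

# Same test written as an explicit if with post-increment.
explicit=$(printf '%s\n' "$input" |
  awk '{if (z[$1]++ == 0) a[$1]=$0;} END {for (i in a) print z[i], a[i]}' | sort)

# Test and increment split into two separate statements.
stepwise=$(printf '%s\n' "$input" |
  awk '{if (z[$1] == 0) a[$1]=$0; z[$1]+=1} END {for (i in a) print z[i], a[i]}' | sort)

if [ "$idiom" = "$explicit" ] && [ "$explicit" = "$stepwise" ]; then
  echo "all three forms agree"
fi
```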

edited Apr 20, 2015 at 21:01

answered Apr 20, 2015 at 20:53

John1024


3 Comments

ptierno Over a year ago

not his expected output, although this is exactly what i was thinking.

John1024 Over a year ago

@ptierno Yes. Since the OP's awk statement produces a count of 3 for abc, I assumed that the 2 was a typo.

PTN Over a year ago

Ah yes I made a typo there. It should have been 3 abc.

tommy.carstensen · Answer · 2015-04-20 21:26:15Z

0

If your input file contains billions of lines and you want to avoid sort, then you can just do:

awk '{a[$0]++} END{for(x in a) print a[x],x}' file.txt
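
Note that awk prints the for (x in a) entries in no guaranteed order. If the result should be ordered by count, one option (a sketch, not part of the original answer) is to sort awk's output afterwards. That sort only sees one line per distinct value, so it is usually far cheaper than sorting the whole input. The temporary file below stands in for file.txt.

```shell
# Recreate the sample input from the question in a temporary file.
file=$(mktemp)
printf '%s\n' abc bcd abc efg bcd abc > "$file"

# Count every distinct line in one pass, then order the summary by count,
# highest first. Only the per-value totals reach sort, not the raw lines.
result=$(awk '{a[$0]++} END{for(x in a) print a[x],x}' "$file" | sort -rn)
printf '%s\n' "$result"

rm -f "$file"
```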

answered Apr 20, 2015 at 21:26

tommy.carstensen
