
I have a text file that looks like this:

abc
bcd
abc
efg
bcd
abc

And the expected output is this:

3 abc 
2 bcd
1 efg

I know there is an existing solution for this:

sort -k2 < inFile |
awk '!z[$1]++{a[$1]=$0;} END {for (i in a) print z[i], a[i]}' |
sort -rn -k1 > outFile 

The code sorts the input, counts and de-duplicates the lines with awk, then sorts again by count, which prints the expected output. However, is there a simpler, more "basic" way to express the !z[$1]++{a[$1]=$0;} part?
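For reference, here is the pipeline above with each stage commented and run on the sample data. This is a sketch that assumes a POSIX shell with mktemp; inFile is a scratch copy of the sample input, and count_by_key is a hypothetical wrapper name used only for illustration.

```shell
#!/bin/sh
# Sketch of the question's pipeline with each stage commented.
# Assumptions: a POSIX shell with mktemp(1); inFile is a scratch copy of the
# sample data; count_by_key is a hypothetical name used only for illustration.
cd "$(mktemp -d)" || exit 1
printf '%s\n' abc bcd abc efg bcd abc > inFile

count_by_key() {
    # 1. Sort the input. awk's counting does not depend on input order, so
    #    this only affects which full line a[$1] ends up keeping.
    # 2. z[$1] counts every occurrence of a key; a[$1] keeps the first line
    #    seen for that key. The END block prints "<count> <line>" per key.
    # 3. Sort numerically on the count, highest first.
    sort -k2 < "$1" |
    awk '!z[$1]++ { a[$1] = $0 } END { for (i in a) print z[i], a[i] }' |
    sort -rn -k1
}

count_by_key inFile
# 3 abc
# 2 bcd
# 1 efg
```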

  • Why does the expected output have 2 abc when there are 3 occurrences of abc? Commented Apr 20, 2015 at 20:51

2 Answers


More basic:

$ sort inFile | uniq -c
      3 abc
      2 bcd
      1 efg
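The output above happens to be in count order only because the sample's alphabetical order matches its frequency order. To rank by count in general, as the question's final `sort -rn -k1` does, one more sort is needed. This is a sketch assuming a POSIX shell; rank_lines is a hypothetical helper name, and the trailing sed only strips the padding that uniq -c puts before each count (its width differs between implementations).

```shell
#!/bin/sh
# Sketch: rank lines by frequency using only sort, uniq, and sed.
# Assumptions: inFile is a scratch copy of the sample data; rank_lines is a
# hypothetical helper name used only for illustration.
cd "$(mktemp -d)" || exit 1
printf '%s\n' abc bcd abc efg bcd abc > inFile

rank_lines() {
    sort "$1" |         # bring identical lines next to each other
    uniq -c |           # collapse each run to "<count> <line>"
    sort -rn |          # most frequent first
    sed 's/^ *//'       # drop the leading padding before the count
}

rank_lines inFile
# 3 abc
# 2 bcd
# 1 efg
```

With input such as `a b b`, plain `sort | uniq -c` would list `a` first, while `rank_lines` puts `2 b` before `1 a`.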

More basic awk

When one is used to awk's idioms, the expression !z[$1]++{a[$1]=$0;} is clear and concise. For those used to programming in other languages, other forms might be more familiar, such as:

awk '{if (z[$1]++ == 0) a[$1]=$0;} END {for (i in a) print z[i], a[i]}'

Or,

awk '{if (z[$1] == 0) a[$1]=$0; z[$1]+=1} END {for (i in a) print z[i], a[i]}'
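To see why `!z[$1]++` works as a "first time seen" test, it can be run on its own with no action attached. This is a sketch assuming a POSIX awk; first_seen is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: the pattern !z[$1]++ on its own, with no action attached.
# first_seen is a hypothetical helper name used only for illustration.
first_seen() {
    # z[$1]++ returns the count *before* incrementing it: 0 (the value of an
    # uninitialized element) the first time a key appears, so !z[$1]++ is
    # true exactly once per key. A pattern with no action prints the line,
    # so only the first occurrence of each key is printed.
    awk '!z[$1]++'
}

printf '%s\n' abc bcd abc efg bcd abc | first_seen
# abc
# bcd
# efg
```

The `if (z[$1]++ == 0)` form spells out the same test: the comparison sees the old value, and the increment happens on every line regardless.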

3 Comments

Not his expected output, although this is exactly what I was thinking.
@ptierno Yes. Since the OP's awk statement produces a count of 3 for abc, I assumed that the 2 was a typo.
Ah yes, I made a typo there. It should have been 3 abc.

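If your input file contains billions of lines and you want to avoid sorting it, you can count in a single awk pass; memory use then grows with the number of distinct lines rather than the total number of lines: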

awk '{a[$0]++} END{for(x in a) print a[x],x}' file.txt
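One caveat: awk's `for (x in a)` visits keys in an unspecified order, so the lines above come out in arbitrary order. A sketch that still avoids sorting the full input pipes only the per-line totals (one per distinct line) into `sort -rn`. It assumes a POSIX shell; file.txt is recreated from the sample data, and count_unsorted_input is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count in one awk pass, then sort only the distinct lines by count.
# Assumptions: file.txt is a scratch copy of the sample data;
# count_unsorted_input is a hypothetical helper name for illustration.
cd "$(mktemp -d)" || exit 1
printf '%s\n' abc bcd abc efg bcd abc > file.txt

count_unsorted_input() {
    # a[x] holds the count for line x. "for (x in a)" has no defined order,
    # so the final sort makes the output deterministic (highest count first).
    awk '{ a[$0]++ } END { for (x in a) print a[x], x }' "$1" |
    sort -rn
}

count_unsorted_input file.txt
# 3 abc
# 2 bcd
# 1 efg
```

When there are far fewer distinct lines than total lines, this final sort is cheap compared with sorting the whole file.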
