
If I have a text file with the following content:

red apple
green apple
green apple
orange
orange
orange

Is there a Linux command or script that I can use to get the following result?

1 red apple
2 green apple
3 orange



Send it through sort (to put identical lines next to each other), then through uniq -c to count them, i.e.:

sort filename | uniq -c

and to get that list sorted by frequency, most frequent first:

sort filename | uniq -c | sort -nr
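To make the mechanics of both pipelines concrete, here is a minimal Python sketch. The function names are hypothetical, and the input is assumed to be a list of lines with trailing newlines already stripped. Note that Python sorts by code point while sort follows the current locale, so the order of lines can differ, and the tie order in by_frequency is an assumption rather than a guaranteed match for sort -nr.

```python
from collections import Counter
from itertools import groupby


def uniq_counts(lines):
    """Rough analogue of `sort | uniq -c`: (count, line) pairs in sorted line order."""
    # groupby, like uniq, only merges *adjacent* equal lines,
    # so sorting first is what brings identical lines together.
    return [(sum(1 for _ in run), line) for line, run in groupby(sorted(lines))]


def by_frequency(lines):
    """Rough analogue of `sort | uniq -c | sort -nr`: most frequent lines first."""
    # Counter does the counting without sorting; ties are broken here by
    # descending line text, which may not match sort's locale-aware order.
    return sorted(((n, line) for line, n in Counter(lines).items()), reverse=True)


if __name__ == "__main__":
    fruits = ["red apple", "green apple", "green apple",
              "orange", "orange", "orange"]
    for count, line in by_frequency(fruits):
        # Right-align the count, similar in spirit to uniq -c's padded column.
        print(f"{count:7d} {line}")
```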

1 Comment

I used this to count installed RPM packages and licenses: $ rpm -qa --qf "%{license}\n" | sort | uniq -c | sort -nr > ~/license_counts. More info here. Thanks.

Almost the same as borribles' answer, but if you add the -d flag to uniq, it only shows lines that occur more than once.

sort filename | uniq -cd | sort -nr
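The same "duplicates only" filter, sketched in Python with collections.Counter for illustration. duplicate_counts is a hypothetical name, and ties are ordered by descending line text, which may differ from what sort -nr produces in your locale.

```python
from collections import Counter


def duplicate_counts(lines):
    """Sketch of `sort | uniq -cd | sort -nr`: keep only lines seen more than once."""
    counts = Counter(lines)
    # uniq -d drops lines that appear exactly once; mirror that with n > 1.
    dupes = [(n, line) for line, n in counts.items() if n > 1]
    # Highest count first, like sort -nr.
    return sorted(dupes, reverse=True)


if __name__ == "__main__":
    fruits = ["red apple", "green apple", "green apple",
              "orange", "orange", "orange"]
    for count, line in duplicate_counts(fruits):
        print(f"{count:7d} {line}")
```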

2 Comments

Thumbs up for the little -d note.
Good tip, although OP specifically seems to want "1 red apple" lines in the output as well.

uniq -c file

uniq -c only counts repeated lines that are adjacent, so in case the file is not sorted already:

sort file | uniq -c
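To see why the sort matters, here is a small Python sketch that counts only adjacent runs, the way uniq does. uniq_c is a hypothetical helper used for illustration, not a real API.

```python
from itertools import groupby


def uniq_c(lines):
    """Sketch of plain `uniq -c`: counts runs of *adjacent* equal lines only."""
    return [(sum(1 for _ in run), line) for line, run in groupby(lines)]


unsorted = ["green apple", "orange", "green apple"]
print(uniq_c(unsorted))          # [(1, 'green apple'), (1, 'orange'), (1, 'green apple')]
print(uniq_c(sorted(unsorted)))  # [(2, 'green apple'), (1, 'orange')]
```

Without the sort, the two "green apple" lines are counted separately because "orange" sits between them.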


cat <filename> | sort | uniq -c



Can you live with an alphabetically ordered list?

echo "red apple
green apple
green apple
orange
orange
orange" | sort -u

green apple
orange
red apple

or

sort -u FILE

-u stands for unique. Because sort finds the duplicates by sorting, the result comes back in sorted order, not in the file's original order.
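For comparison, a Python sketch (sort_unique is a hypothetical name). Here a set removes the duplicates without any sorting, and sorted() only reproduces the sorted output of sort -u. Python orders by code point, whereas sort follows the current locale, so the two can disagree on case and punctuation.

```python
def sort_unique(lines):
    """Rough analogue of `sort -u`: distinct lines, returned in sorted order."""
    # A set drops duplicates regardless of position; sorting is only for output order.
    return sorted(set(lines))


if __name__ == "__main__":
    fruits = ["red apple", "green apple", "green apple",
              "orange", "orange", "orange"]
    print("\n".join(sort_unique(fruits)))
```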

A solution which preserves the order:

echo "red apple
green apple
green apple
orange
orange
orange" | { old=""; while IFS= read -r line; do if [[ $line != "$old" ]]; then printf '%s\n' "$line"; old=$line; fi; done; }
red apple
green apple
orange

and, with a file:

old=""
while IFS= read -r line
do
  # Quote "$old" so it is compared literally, not as a glob pattern
  if [[ $line != "$old" ]]
  then
    printf '%s\n' "$line"
    old=$line
  fi
done < file
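The same adjacent-only de-duplication can be sketched in Python with itertools.groupby. This is an illustration, not part of the original answer, and squeeze_adjacent is a made-up name.

```python
from itertools import groupby


def squeeze_adjacent(lines):
    """Drop a line only if it equals the line immediately before it."""
    # groupby yields one key per run of equal adjacent items, in input order.
    return [line for line, _run in groupby(lines)]


if __name__ == "__main__":
    fruits = ["red apple", "green apple", "green apple",
              "orange", "orange", "orange"]
    print("\n".join(squeeze_adjacent(fruits)))
```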

The last two only remove duplicates that immediately follow each other, which is enough for your example. With input like this, however:

echo "red apple
green apple
lila banana
green apple
" ...

Both "green apple" lines are printed, separated by the banana line.
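If you instead want to drop every repeat while keeping the first-seen order, that is a different technique from the adjacent-only loop. A minimal Python sketch, assuming the whole input fits in memory (unique_in_order is a hypothetical name):

```python
def unique_in_order(lines):
    """Keep the first occurrence of each line, in original order; drop later repeats anywhere."""
    # Dict keys are unique and dicts preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(lines))


if __name__ == "__main__":
    sample = ["red apple", "green apple", "lila banana", "green apple"]
    print("\n".join(unique_in_order(sample)))
```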



Try this

cat myfile.txt | sort | uniq

1 Comment

Without the -c or -d flags, uniq doesn't distinguish duplicate lines from non-duplicates, or am I missing something?

To just get a count:

$> egrep -o '\w+' fruits.txt | sort | uniq -c

      3 apple
      2 green
      1 oragen
      2 orange
      1 red

To get a sorted count:

$> egrep -o '\w+' fruits.txt | sort | uniq -c | sort -nk1
      1 oragen
      1 red
      2 green
      2 orange
      3 apple
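A rough Python equivalent of the word-counting pipeline, for illustration only. word_counts is a hypothetical name, and Python's \w and grep's \w may not match exactly the same characters on non-ASCII text.

```python
import re
from collections import Counter


def word_counts(text):
    """Count word tokens and sort by count ascending, roughly like the grep pipeline above."""
    words = re.findall(r"\w+", text)
    # Sorting (count, word) tuples gives least frequent first; ties by word.
    return sorted((n, w) for w, n in Counter(words).items())


if __name__ == "__main__":
    text = "red apple\ngreen apple\ngreen apple\norange\norange\norange\n"
    for count, word in word_counts(text):
        print(f"{count:7d} {word}")
```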

EDIT

Aha, the question is about whole lines, NOT words; my bad. Here's the command to use for full lines:

$> cat fruits.txt | sort | uniq -c | sort -nk1
      1 oragen
      1 red apple
      2 green apple
      2 orange



Here is a simple Python script using collections.Counter. The benefit is that it does not require sorting the file; memory use grows with the number of distinct lines rather than with the total number of lines:

import collections
import fileinput
import json

print(json.dumps(collections.Counter(map(str.strip, fileinput.input())), indent=2))

Output:

$ cat filename | python3 script.py
{
  "red apple": 1,
  "green apple": 2,
  "orange": 3
}

or you can use a simple one-liner:

$ cat filename | python3 -c 'print(__import__("json").dumps(__import__("collections").Counter(map(str.strip, __import__("fileinput").input())), indent=2))'

1 Comment

As an aside, you want to avoid the useless cats: fileinput already reads files named on the command line, e.g. python3 script.py filename.
