Linux command or script counting duplicated lines in a text file? [duplicate]

Question

If I have a text file with the following content

red apple
green apple
green apple
orange
orange
orange

Is there a Linux command or script that I can use to get the following result?

1 red apple
2 green apple
3 orange

borrible · Accepted Answer · 2011-06-22 22:55:23Z

264

Send it through sort (to put identical lines next to each other), then uniq -c to give counts, i.e.:

sort filename | uniq -c

and to get that list sorted by frequency (highest first), you can use:

sort filename | uniq -c | sort -nr
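As a worked example, here is a minimal sketch that runs the pipeline above on the sample data from the question. The temporary file and the variable names are only illustrative, and the sed step is an addition that strips the padding uniq -c prints before each count, since the padding width differs between implementations.

```shell
# Sample data from the question, written to a temporary file (illustrative name).
fruits=$(mktemp)
printf '%s\n' 'red apple' 'green apple' 'green apple' \
  'orange' 'orange' 'orange' > "$fruits"

# Count each distinct line, most frequent first.
# sed removes the leading whitespace that uniq -c puts before the count.
counts=$(sort "$fruits" | uniq -c | sort -nr | sed 's/^[[:space:]]*//')
printf '%s\n' "$counts"
# prints:
# 3 orange
# 2 green apple
# 1 red apple

rm -f "$fruits"
```

Replacing the final sort -nr with sort -n lists the counts in ascending order, which matches the layout asked for in the question.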

answered Jun 22, 2011 at 22:55

borrible


1 Comment

Antonio Medeiros Over a year ago

I used this to count installed RPM packages and licenses: $ rpm -qa --qf "%{license}\n" | sort | uniq -c | sort -nr > ~/license_counts. More info here. Thanks.

krampstudio · Accepted Answer · 2014-04-15 07:42:20Z

64

Almost the same as borrible's, but if you add the -d option to uniq, it shows only the lines that occur more than once.

sort filename | uniq -cd | sort -nr
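To illustrate the effect of -d, here is a small sketch using the sample data from the question, fed in through printf instead of a file. It assumes a uniq that accepts -c and -d together, as GNU coreutils does; the variable name and the sed step that trims the count padding are illustrative additions.

```shell
# Only lines that occur more than once survive uniq -d,
# so the single "red apple" line is dropped from the report.
dupes=$(printf '%s\n' 'red apple' 'green apple' 'green apple' \
          'orange' 'orange' 'orange' |
        sort | uniq -cd | sort -nr | sed 's/^[[:space:]]*//')
printf '%s\n' "$dupes"
# prints:
# 3 orange
# 2 green apple
```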

edited Apr 15, 2014 at 7:42

krampstudio

answered Apr 15, 2014 at 7:14

Jaberino

2 Comments

sepehr Over a year ago

Thumbs up for the little -d note.

Quantum7 Over a year ago

Good tip, although OP specifically seems to want "1 red apple" lines in the output as well.

mhyfritz · Accepted Answer · 2011-06-22 22:53:26Z

8

uniq -c file

and in case the file is not sorted already:

sort file | uniq -c
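The reason sorting matters is that uniq compares only neighbouring lines. The following sketch uses a made-up three-line input (not the file from the question) to show the difference; the variable names and the sed step that trims the count padding are illustrative.

```shell
# Hypothetical input in which the duplicate lines are not adjacent.
input=$(printf '%s\n' 'orange' 'red apple' 'orange')

# Without sorting, the two "orange" lines are counted separately.
unsorted=$(printf '%s\n' "$input" | uniq -c | sed 's/^[[:space:]]*//')

# After sorting, they are adjacent and counted together.
sorted=$(printf '%s\n' "$input" | sort | uniq -c | sed 's/^[[:space:]]*//')

printf 'unsorted:\n%s\nsorted:\n%s\n' "$unsorted" "$sorted"
# prints:
# unsorted:
# 1 orange
# 1 red apple
# 1 orange
# sorted:
# 2 orange
# 1 red apple
```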

answered Jun 22, 2011 at 22:53

mhyfritz

pajton · Accepted Answer · 2011-06-22 22:54:42Z

6

cat <filename> | sort | uniq -c

answered Jun 22, 2011 at 22:54

pajton

1 Comment

tripleee Over a year ago

Except the cat is useless

user unknown · Accepted Answer · 2011-06-22 23:04:23Z

Can you live with an alphabetically ordered list?

echo "red apple
green apple
green apple
orange
orange
orange" | sort -u

green apple
orange
red apple

or

sort -u FILE

-u stands for unique, and uniqueness is only reached via sorting.

A solution which preserves the order:

echo "red apple
green apple
green apple
orange
orange
orange" | { old=""; while IFS= read -r line; do if [[ $line != "$old" ]]; then printf '%s\n' "$line"; old=$line; fi; done; }
red apple
green apple
orange

and, with a file

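old=""
# Read the file line by line, keeping whitespace and backslashes intact.
while IFS= read -r line
do
  # Quote "$old" so it is compared literally, not as a glob pattern.
  if [[ $line != "$old" ]]
  then
    printf '%s\n' "$line"
    old=$line
  fi
done < file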

The last two remove only duplicates that immediately follow one another, which fits your example.

echo "red apple
green apple
lila banana
green apple
" ...

will print green apple twice, with the banana line in between, because the two duplicates are not adjacent.
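If non-adjacent duplicates should be removed as well while keeping the original order, one common alternative (a different technique from the loop above, not part of this answer) is an awk filter that remembers every line it has already printed. A minimal sketch, where seen and deduped are arbitrary names:

```shell
# seen[$0]++ is 0 (false) the first time a line appears, so !seen[$0]++
# is true only for first occurrences, and awk prints just those.
deduped=$(printf '%s\n' 'red apple' 'green apple' 'lila banana' 'green apple' |
          awk '!seen[$0]++')
printf '%s\n' "$deduped"
# prints:
# red apple
# green apple
# lila banana
```

The trade-off is that awk keeps every distinct line in memory, whereas the loop only remembers the previous line.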

Rahul · Accepted Answer · 2011-06-22 22:55:04Z

0

Try this

cat myfile.txt | sort | uniq

answered Jun 22, 2011 at 22:55

Rahul

1 Comment

drevicko Over a year ago

without the -c or -d flags, uniq doesn't distinguish duplicate lines from non-duplicates, or am I missing something?

Chris Eberle · Accepted Answer · 2011-06-22 22:55:51Z

0

To just get a count:

$> grep -oE '\w+' fruits.txt | sort | uniq -c

      3 apple
      2 green
      1 oragen
      2 orange
      1 red

To get a sorted count:

$> grep -oE '\w+' fruits.txt | sort | uniq -c | sort -nk1
      1 oragen
      1 red
      2 green
      2 orange
      3 apple

EDIT

Aha, the question was about whole lines, NOT words; my bad. Here's the command to use for full lines:

$> sort fruits.txt | uniq -c | sort -nk1
      1 oragen
      1 red apple
      2 green apple
      2 orange

answered Jun 22, 2011 at 22:55

Chris Eberle

orestisf · Accepted Answer · 2020-07-21 08:59:31Z

-1

Here is a simple Python script using the Counter type. The benefit is that it does not require sorting the file; it reads the input once and keeps only the distinct lines and their counts in memory:

import collections
import fileinput
import json

print(json.dumps(collections.Counter(map(str.strip, fileinput.input())), indent=2))

Output:

$ cat filename | python3 script.py
{
  "red apple": 1,
  "green apple": 2,
  "orange": 3
}

or you can use a simple one-liner:

$ cat filename | python3 -c 'print(__import__("json").dumps(__import__("collections").Counter(map(str.strip, __import__("fileinput").input())), indent=2))'
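The same single-pass idea can also be sketched in awk, for readers who prefer to stay in the shell. This is an alternative technique rather than part of the script above; the array names count and order are arbitrary, and like the Counter version it holds every distinct line in memory.

```shell
counts=$(printf '%s\n' 'red apple' 'green apple' 'green apple' \
           'orange' 'orange' 'orange' |
         awk '!($0 in count) { order[++n] = $0 }   # remember first-seen order
              { count[$0]++ }                      # tally every line
              END { for (i = 1; i <= n; i++) print count[order[i]], order[i] }')
printf '%s\n' "$counts"
# prints:
# 1 red apple
# 2 green apple
# 3 orange
```

Keeping a separate order array is a design choice: awk does not guarantee any particular order when iterating over an array with for (key in array), so the first-seen order is recorded explicitly.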

answered Jul 21, 2020 at 8:59

orestisf

1 Comment

tripleee Over a year ago

As an aside, you want to avoid the useless cats
