I have a file that contains:

apple
apple
banana
orange
apple
orange

I want a script that finds the duplicates (apple and orange) and tells the user something like: "apple and orange are repeated". I tried

nawk '!x[$1]++' FS="," filename

to find the repeated items. How can I print them out in bash on Unix?

3 Answers

In order to print the duplicate lines, you can say:

$ sort filename | uniq -d
apple
orange

If you want to print the count as well, supply the -c option to uniq:

$ sort filename | uniq -dc
      3 apple
      2 orange
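To get the sentence the question asks for, the output of uniq -d can be joined into one line. This is only a minimal sketch, assuming one item per line; the function name report_duplicates, the temporary sample file, and the exact message wording are illustrative and not part of the original answer:

```shell
#!/usr/bin/env bash
# report_duplicates FILE  (hypothetical helper name)
# Prints "<a> and <b> are repeated" for the lines that occur more than once.
report_duplicates() {
  local dups
  # sort groups identical lines; uniq -d keeps one copy of each repeated line
  dups=$(sort -- "$1" | uniq -d)
  if [ -z "$dups" ]; then
    echo "no duplicates found"
    return 0
  fi
  # join the duplicate lines with " and ", then append the message
  printf '%s\n' "$dups" |
    awk 'NR > 1 { printf " and " } { printf "%s", $0 } END { print " are repeated" }'
}

# Example run on the sample data from the question
sample=$(mktemp)
printf '%s\n' apple apple banana orange apple orange > "$sample"
report_duplicates "$sample"
rm -f "$sample"
```

With the sample data from the question this prints: apple and orange are repeated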

1 Comment

Note that according to uniq, the word "ウェイター" is a duplicate of "ウエイター" (ェ=エ).
+1 for devnull's answer. However, if the items in the file are separated by spaces instead of newlines, the following would work:

tr '[:blank:]' '\n' < filename | sort | uniq -d
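If the input can contain runs of blanks or leading blanks, the tr step emits empty lines, and those would themselves be reported as duplicates. A possible variant, sketched under the assumption that items are separated by any mix of blanks and newlines (dup_words is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# dup_words FILE  (hypothetical helper name)
# Lists every whitespace-separated item that appears more than once.
dup_words() {
  # turn each blank into a newline, drop the empty lines this can create,
  # then sort and keep one copy of each repeated item
  tr '[:blank:]' '\n' < "$1" | grep -v '^$' | sort | uniq -d
}
```

Call it as: dup_words filename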

Update:

The question has been changed significantly. When I wrote this answer, the input file looked like this:

apple apple banana orange apple orange
banana orange apple
...

The solution below still works on the new input, but it is a little more complicated than this special case requires.


The following awk script will do the job:

awk '{i=1;while(i <= NF){a[$(i++)]++}}END{for(i in a){if(a[i]>1){print i,a[i]}}}' your.file

Output:

apple 3
orange 2

It is more understandable in a form like this:

#!/usr/bin/awk -f

{
  i=1;
  # iterate through every field
  while(i <= NF) {
    a[$(i++)]++; # count occurrences of every field
  }
}

# after all input lines have been read ...
END {
  for(i in a) {
    # ... print those fields which occurred more than 1 time
    if(a[i] > 1) {
      print i,a[i];
    }
  }
}

Save this as script.awk, make the file executable, and run it with the input file name as an argument:

chmod +x script.awk
./script.awk your.file  
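For the one-item-per-line input in the updated question, a shorter awk sketch is possible. It is not the script from this answer but a simplification under the assumption that each line holds exactly one item; the names dup_lines and seen are illustrative. It needs no sort and prints each repeated line once, at its second occurrence:

```shell
#!/usr/bin/env bash
# dup_lines FILE  (hypothetical helper name)
dup_lines() {
  # seen[$0]++ returns the number of earlier occurrences of the line;
  # the condition is true only on the second occurrence, so each
  # duplicated line is printed exactly once
  awk 'seen[$0]++ == 1' "$1"
}
```

Unlike the field-counting script, this variant does not report how often each item occurs.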

4 Comments

+1. On attempting to format the question, it became evident that the input file had items placed on different lines. I agree that it was hard to guess that.
@devnull :) I guessed something like this. However, we now have two solutions for two slightly different use cases, which is not so bad.
What if there are 2 fields? And how does it know which file to search?
@user2613272 Scroll the code block above to the right. You need to pass the file name as an argument. It should work with two fields, shouldn't it?
