Remove duplicate based on condition awk/bash

Question

I would like to remove duplicates from a dataset which has 3 columns:

A       0   3238
B       0   3367
C       0   3130
D       1   3130

I need to remove lines which contain duplicate values in the third column, but preferentially keep those with the value '1' in the second column. I know how to remove duplicates using awk, but I can't work out how to add in the conditional statement.

Thanks

Kent · Accepted Answer · 2013-08-26 14:07:19Z

3

Give this line a try:

awk '{if($3 in a)a[$3]=$2==1?$0:a[$3];else a[$3]=$0}END{for(i in a)print a[i]}' file
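
For readability, the same logic can be spelled out with an explicit if in place of the nested ternary, which also avoids relying on an unparenthesized ?: inside an assignment. This is only a sketch: the function name dedupe_prefer_flag and the array name best are made up for illustration, the sample rows are the ones from the question, and the trailing sort is there because the order of awk's for (key in array) loop is unspecified.

```shell
# Keep one line per distinct value of column 3, preferring a line whose
# column 2 equals 1. Reads from stdin. (Hypothetical helper name.)
dedupe_prefer_flag() {
  awk '
    {
      if ($3 in best) {
        # Key already seen: replace the stored line only if this one is flagged.
        if ($2 == 1) best[$3] = $0
      } else {
        # First time this key appears: remember the line.
        best[$3] = $0
      }
    }
    END { for (key in best) print best[key] }
  '
}

# Sample rows from the question; sort makes the output order deterministic.
printf 'A 0 3238\nB 0 3367\nC 0 3130\nD 1 3130\n' | dedupe_prefer_flag | sort
```

If several flagged lines share a key, the last one read wins, as in the one-liner above.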


3 Comments

Qben Over a year ago

+1 for a neat way to solve it. I did not at first realize that $2==1?$0:a[$3] is evaluated before the =, which was a bit confusing. I guess a[$3]=($2==1?$0:a[$3]) would work as well.

Kent Over a year ago

@Qben Yes, it does, and with brackets it would be easier to read.

Ed Morton Over a year ago

The syntax without brackets is non-portable, e.g. it would fail syntactically on MacOS awks (or so I hear...).

Ed Morton · Accepted Answer · 2013-08-26 15:27:14Z

3

$ sort -k2nr file | awk '!seen[$3]++'
D       1   3130
A       0   3238
B       0   3367


2 Comments

iamauser Over a year ago

Interesting bit of awk. Can you please explain the !seen[$3]++ part?

Ed Morton Over a year ago

It's the common awk idiom to only output the first value in a series of potential duplicates. Every time a value is used as an index in the array, the array's entry for that value is post-incremented, so the first time a value is seen its array entry is zero and the ! operator makes the overall result true. After that first time, though, the array entry is non-zero, so the ! makes the result false. It's like uniq but doesn't require the values to be sorted and lets you operate on fields rather than the whole input line/record.
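
To make the post-increment explicit, here is a small sketch that runs the compact idiom next to a longhand equivalent. The input lines and the variable names (input, compact, verbose, count) are invented for illustration, and the key is taken from field 1 rather than field 3 to keep the data short.

```shell
# Made-up input in which the key in field 1 repeats.
input='x first
y second
x third
y fourth
z fifth'

# Compact idiom: a line is printed only the first time its key is seen.
compact=$(printf '%s\n' "$input" | awk '!seen[$1]++')

# Longhand equivalent, spelling out what the post-increment does.
verbose=$(printf '%s\n' "$input" | awk '{
  count = seen[$1] + 0   # value before the increment; 0 for a new key
  seen[$1] = count + 1   # the ++ happens after the old value is read
  if (count == 0) print  # !0 is true, so only first occurrences print
}')

printf '%s\n' "$compact"
```

Both variables should hold the same three lines, one per distinct key, in input order.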
