Delete repetitions of same values in different columns in bash script linux

Question

I have a file like this:

A2M TIAM1
AARSD1 NLRP12
ABCA12 ABCA1
ABCA12 NR1H2
ABCA1 ABCA12
ABCA13 APOA2
ABCA13 CLK1
NLRP12 AARSD1
ABCA13 HAGH
ABCC10 ATP2B2

I want to get rid of rows that repeat an earlier row with the columns swapped (col2 col1 instead of col1 col2). For example:

ABCA1 ABCA12

...and:

NLRP12 AARSD1

...in this case.

What is the best way to do this in a Bash script?

Which do you want? 1. If one row is col1 col2 and another row is col2 col1, delete the duplicate. 2. If a row is col1 col2 and both col1 and col2 have already appeared in other rows (not necessarily in the same row), delete that row.
– ZNZNZ, Commented Mar 29, 2018 at 3:08

nbari · Accepted Answer · 2018-03-29 10:45:10Z


This can be done with awk:

awk '!seen[$1]++ && !seen[$2]++' your-file

This prints a row only when neither its col1 value nor its col2 value has been seen before. Based on your input, this will be the output:

A2M TIAM1
AARSD1 NLRP12
ABCA12 ABCA1
ABCA13 APOA2
ABCC10 ATP2B2
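
As a sketch of what the one-liner does, it can be written out step by step. The function name first_seen_rows is a hypothetical helper introduced here for illustration; the awk logic is intended to behave the same as the command above, including the short-circuit of &&.

```shell
# first_seen_rows is a hypothetical helper name, not part of the original answer.
first_seen_rows() {
    awk '
    {
        # seen[$1]++ yields the previous count and then increments it,
        # so !seen[$1]++ is true only the first time a value is met.
        new1 = !seen[$1]++
        # The one-liner uses &&, which short-circuits: the second column
        # is only checked (and recorded) when the first column was new.
        new2 = 0
        if (new1) new2 = !seen[$2]++
        if (new1 && new2) print
    }' "$@"
}

# Run it on the sample input from the question.
first_seen=$(first_seen_rows <<'EOF'
A2M TIAM1
AARSD1 NLRP12
ABCA12 ABCA1
ABCA12 NR1H2
ABCA1 ABCA12
ABCA13 APOA2
ABCA13 CLK1
NLRP12 AARSD1
ABCA13 HAGH
ABCC10 ATP2B2
EOF
)
printf '%s\n' "$first_seen"
```

Because a row is dropped as soon as either value has appeared anywhere before, rows such as ABCA13 CLK1 are removed even though their reverse never occurs.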

To keep only the first occurrence of each pair, regardless of column order, try this:

awk '!seen[$1 $2]++ && !seen[$2 $1]++' your-file

This will be the output:

A2M TIAM1
AARSD1 NLRP12
ABCA12 ABCA1
ABCA12 NR1H2
ABCA13 APOA2
ABCA13 CLK1
ABCA13 HAGH
ABCC10 ATP2B2
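
One possible caveat with the pair-based command: $1 $2 concatenates the two fields with no separator, so different pairs could in principle produce the same key (for example AB C and A BC), and a row whose two columns are identical would never be printed because both keys are the same. A minimal sketch of a variant that avoids this, assuming a single order-independent key per pair is acceptable, is shown below. The function name dedup_pairs is hypothetical; SUBSEP is awk's built-in subscript separator.

```shell
# dedup_pairs is a hypothetical helper name, not part of the original answer.
dedup_pairs() {
    awk '
    {
        # Appending "" forces string (not numeric) comparison of the fields.
        a = $1 ""; b = $2 ""
        # Put the two values in a fixed order so that "X Y" and "Y X"
        # map to the same key; SUBSEP keeps the two parts separated.
        key = (a < b) ? (a SUBSEP b) : (b SUBSEP a)
    }
    # Print a row only the first time its key is encountered.
    !seen[key]++' "$@"
}

# Run it on the sample input from the question.
unique_pairs=$(dedup_pairs <<'EOF'
A2M TIAM1
AARSD1 NLRP12
ABCA12 ABCA1
ABCA12 NR1H2
ABCA1 ABCA12
ABCA13 APOA2
ABCA13 CLK1
NLRP12 AARSD1
ABCA13 HAGH
ABCC10 ATP2B2
EOF
)
printf '%s\n' "$unique_pairs"
```

For the sample input this should give the same eight rows as the command above. It can also be called on a file, for example dedup_pairs your-file.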

edited Mar 29, 2018 at 10:45

answered Mar 28, 2018 at 22:40

nbari


3 Comments

Jaume Sastre Tomàs Over a year ago

But this is not what I want. The output is missing, for example, ABCA13 CLK1, whose reverse (CLK1 ABCA13) does not appear in the list.

Jaume Sastre Tomàs Over a year ago

I need to output the pairs whose reverse does not appear in the list and, for pairs whose reverse does appear, only the first entry. For example, for the pair AARSD1 NLRP12 I want only this entry, because the pair NLRP12 AARSD1 is also in the list. Pairs without a reverse should be kept as well; for example, ABCA13 CLK1 has to be in the output.

nbari Over a year ago

@JaumeSastreTomàs I updated the answer; I hope it works.
