I have a file like this:

A2M TIAM1
AARSD1 NLRP12
ABCA12 ABCA1
ABCA12 NR1H2
ABCA1 ABCA12
ABCA13 APOA2
ABCA13 CLK1
NLRP12 AARSD1
ABCA13 HAGH
ABCC10 ATP2B2

I want to get rid of rows that repeat an earlier pair with the columns swapped (col2 col1). For example:

ABCA1 ABCA12

...and:

NLRP12 AARSD1

...in this case.

What is the best way to do it in a Bash script?

  • Which do you want? 1. If one row is col1 col2 and another row is col2 col1, delete the duplicate. 2. If a row is col1 col2 and both col1 and col2 have already appeared in other rows (not necessarily in the same row), delete that row. Commented Mar 29, 2018 at 3:08

1 Answer

One way is to use awk:

awk '!seen[$1]++ && !seen[$2]++' your-file

This prints a line only if neither of its two values has been recorded from an earlier line. Based on your input, the output will be:

A2M TIAM1
AARSD1 NLRP12
ABCA12 ABCA1
ABCA13 APOA2
ABCC10 ATP2B2
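
Why a row such as ABCA12 NR1H2 is missing from that output is easier to see when the one-liner is written out step by step. The following is only a sketch of the same logic; the function name first_unique_fields and the variable names are made up for illustration.

```shell
# Expanded form of: awk '!seen[$1]++ && !seen[$2]++'
# first_unique_fields is a hypothetical helper name; it reads the named files or stdin.
first_unique_fields() {
    awk '{
        first_is_new = !seen[$1]++    # the first field is always checked and recorded
        if (!first_is_new) next       # && short-circuits here, so $2 is not recorded
        second_is_new = !seen[$2]++   # only reached when the first field was new
        if (second_is_new) print
    }' "$@"
}
```

Called as first_unique_fields your-file, it drops every row whose first value was already recorded, even if the pair as a whole is new.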

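To keep only the first occurrence of each pair, regardless of column order, try this. The comma in the subscript joins the two fields with awk's SUBSEP, so rows such as "AB C" and "A BC" cannot collapse into the same key:

awk '!seen[$1,$2]++ && !seen[$2,$1]++' your-file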

This will be the output:

A2M TIAM1
AARSD1 NLRP12
ABCA12 ABCA1
ABCA12 NR1H2
ABCA13 APOA2
ABCA13 CLK1
ABCA13 HAGH
ABCC10 ATP2B2
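
One caveat: the pair-based one-liner above never prints a row whose two columns are equal, because both subscripts then name the same key. If that matters, a possible variant builds a single order-independent key per row. This is a sketch, not a tested drop-in; the function name dedupe_pairs is hypothetical, and skipping lines with fewer than two fields is an assumption about the input.

```shell
# dedupe_pairs is a hypothetical helper name; it reads the named files or stdin.
dedupe_pairs() {
    awk '
        NF < 2 { next }               # assumption: ignore blank or one-column lines
        {
            # Force string comparison, then put the smaller field first,
            # so that A B and B A produce the same key.
            a = $1 ""; b = $2 ""
            key = (a < b) ? (a SUBSEP b) : (b SUBSEP a)
        }
        !seen[key]++                  # print only the first row for each key
    ' "$@"
}
```

Usage would be dedupe_pairs your-file > deduplicated-file, or as the last stage of a pipeline.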

3 Comments

But this is not what I want. This output is missing, for example, ABCA13 CLK1, which does not have the reversed entry CLK1 ABCA13.
I need to output the pairs whose reverse does not appear in the list. For pairs whose reverse does appear, I want only the first entry. For example, for the pair AARSD1 NLRP12, I want only this entry, because the pair NLRP12 AARSD1 is also in the list. If a pair has no reverse, I want it as well; for example, ABCA13 CLK1 has to be in the output.
@JaumeSastreTomàs I updated the answer; I hope it works.
