My file is in the following format:
>id1
sequence1
>id2
sequence2
>id1
sequence3
The output I want is:
>id1
sequence1
>id2
sequence2
That is, if an id is repeated, I need to remove both the repeated id and its sequence as a pair.
I tried the following code, but it doesn't work.
awk '{
    if (NR % 2 == 1)
    {
        fastaheader = $0; x[fasta_header] = x[fasta_header] + 1;
    }
    else
    {
        seq = $0; { if (x[fasta_header] <= 1) { print fasta_header; print seq; } }
    }
}' filename.txt
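One thing that stands out in the attempt: the header is stored in `fastaheader`, but every later reference uses `fasta_header`, which is never assigned and so is always empty. A minimal sketch with one consistent variable name might look like the following. It assumes every record is exactly two lines (a header followed by a single-line sequence) and that only the first record for each id should be kept; the function name `dedup_fasta` and the array name `seen` are made up for illustration.

```shell
# Hypothetical helper: keep only the first record for each FASTA header.
# Assumes strict two-line records (header line, then one sequence line).
dedup_fasta() {
    awk '
        NR % 2 == 1 {           # odd-numbered lines are headers
            header = $0
            seen[header]++      # count how often this header has appeared
            next
        }
        seen[header] == 1 {     # sequence line: print only for the first occurrence
            print header
            print $0
        }
    ' "$@"
}

# Usage with the sample data from the question, fed on standard input.
printf '>id1\nsequence1\n>id2\nsequence2\n>id1\nsequence3\n' | dedup_fasta
```

With a file, the call would be `dedup_fasta filename.txt`. If sequences can wrap over several lines, the odd/even line test no longer holds and the records would need to be split on the `>` character instead.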
id1 with sequence1A and id1 with sequence1B, and you only want the sequence1A entry to be shown. Or is it the combination of id1 plus the sequence data that must be duplicated in its entirety (so you'd want both id1 with sequence1A and id1 with sequence1B to appear in the output)? Your question says "Remove ID and sequence if the ID is repeated"; your comments say "Remove ID and sequence if the combination of ID and sequence are repeated".