Replace string in column by string in next column (one file)

Question

I would like to replace the "." in the middle of column 2 with the string in column 3.

Input file (tab-delimited):

0   AAAAAAAAGTTT.TATAGTAATATA   T   x   HPNK_05032012_new.fna
1   AAAAAAACGACG.ATTTTACAATAC   C   x   HPNK_05032012_new.fna
2   AAAAAAAGCAGG.CATTATCGCTGG   G   x   HPNK_05032012_new.fna
3   AAAAAAAGGAAC.GTGGAACGTTGG   A   x   HPNK_05032012_new.fna
5   AAAAAACACAAC.ATTGAGCAACTT   A   x   HPNK_05032012_new.fna
6   AAAAAACACCCA.CTGTGAAAGAAA   T   x   HPNK_05032012_new.fna
9   AAAAAACGCCAA.GTCAGCTACAAA   C   x   HPNK_05032012_new.fna

Desired output:

0   AAAAAAAAGTTTTTATAGTAATATA   T   x   HPNK_05032012_new.fna
1   AAAAAAACGACGCATTTTACAATAC   C   x   HPNK_05032012_new.fna
2   AAAAAAAGCAGGGCATTATCGCTGG   G   x   HPNK_05032012_new.fna
3   AAAAAAAGGAACAGTGGAACGTTGG   A   x   HPNK_05032012_new.fna
5   AAAAAACACAACAATTGAGCAACTT   A   x   HPNK_05032012_new.fna
6   AAAAAACACCCATCTGTGAAAGAAA   T   x   HPNK_05032012_new.fna
9   AAAAAACGCCAACGTCAGCTACAAA   C   x   HPNK_05032012_new.fna

Hi mpapec, your one-liner is just deleting the ".": 0 AAAAAAAAGTTTTATAGTAATATA T x HPNK_05032012_new.fna (biotech, Feb 18, 2014 at 11:21)
@mpapec: in the desired output he REPLACES the . with the content of the next column. (Olivier Dulac, Feb 18, 2014 at 13:04)

fedorqui · Accepted Answer · 2014-02-18 10:19:30Z

3

Use:

$ awk '{sub(/\./, $3, $2)}1' file
0 AAAAAAAAGTTTTTATAGTAATATA T x HPNK_05032012_new.fna
1 AAAAAAACGACGCATTTTACAATAC C x HPNK_05032012_new.fna
2 AAAAAAAGCAGGGCATTATCGCTGG G x HPNK_05032012_new.fna
3 AAAAAAAGGAACAGTGGAACGTTGG A x HPNK_05032012_new.fna
5 AAAAAACACAACAATTGAGCAACTT A x HPNK_05032012_new.fna
6 AAAAAACACCCATCTGTGAAAGAAA T x HPNK_05032012_new.fna
9 AAAAAACGCCAACGTCAGCTACAAA C x HPNK_05032012_new.fna

It replaces the first . in the 2nd field with the 3rd field by using the sub() function. The trailing 1 is a pattern that is always true and has no action, so awk performs its default action: {print $0}.
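To make the shorthand concrete, here is a minimal sketch (assuming a POSIX shell with mktemp; the temporary file holding the question's first sample line exists only for the demo) that runs the short form and an equivalent long form with the default action written out. It writes the regex as the constant /\./ rather than as the string "\.": POSIX leaves a backslash before . in a string constant undefined, and GNU awk, for example, drops it (with a warning), leaving the regex "." that matches any character, so sub() would replace the first A instead of the dot.

```shell
# Hypothetical sample: the first line of the question's input, tab-delimited.
tmp=$(mktemp)
printf '0\tAAAAAAAAGTTT.TATAGTAATATA\tT\tx\tHPNK_05032012_new.fna\n' > "$tmp"

# Short form: "1" is an always-true pattern with no action, so awk prints $0.
short=$(awk '{ sub(/\./, $3, $2) } 1' "$tmp")

# Same program with the default action written out explicitly.
long=$(awk '{ sub(/\./, $3, $2); print $0 }' "$tmp")

rm -f "$tmp"
printf '%s\n' "$short"
```

Both forms print the same line; because $2 was modified, awk rebuilds the record with single spaces.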

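Since your question shows spaces between columns, my output shows just one space: modifying $2 makes awk rebuild the line with the output field separator (OFS), which defaults to a single space. If your input uses tabs, set tab as both the input and the output field separator: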

awk 'BEGIN{FS=OFS="\t"} {sub(/\./, $3, $2)}1' file
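A minimal check of the tab-separated variant, assuming a POSIX shell with mktemp and building two of the question's lines with real tabs via printf (the temporary file is only for the demo). Because OFS is also a tab, the rebuilt lines keep their tab separators.

```shell
# Two sample lines from the question, written with real tab characters.
tmp=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\n' \
  0 AAAAAAAAGTTT.TATAGTAATATA T x HPNK_05032012_new.fna \
  1 AAAAAAACGACG.ATTTTACAATAC C x HPNK_05032012_new.fna > "$tmp"

# FS splits on tabs; OFS puts tabs back when awk rebuilds the modified line.
out=$(awk 'BEGIN{FS=OFS="\t"} {sub(/\./, $3, $2)}1' "$tmp")

rm -f "$tmp"
printf '%s\n' "$out"
```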

answered Feb 18, 2014 at 10:19

fedorqui



12 Comments

MLSC Over a year ago

In the second column, the . is not removed for me.

fedorqui Over a year ago

@MortezaLSC, does your sample contain tabs between columns? If not, the second field won't be the second column.

MLSC Over a year ago

Hmm... I copied what popnard wrote. Yes, it worked great, thank you.

biotech Over a year ago

My input file is tab-separated, but fedorqui's second command is not working. Why?

fedorqui Over a year ago

@popnard, to make sure it is tab-separated, try awk 'BEGIN{FS=OFS="\t"} {print $2}' file and check that awk takes the second column as the second field.
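A sketch of that check that also prints NF (the number of fields), assuming a POSIX shell with mktemp. The second sample line is deliberately space-separated to show what a failure looks like: with FS set to a tab, that line is one single field and $2 is empty.

```shell
tmp=$(mktemp)
# Line 1 uses real tabs; line 2 uses spaces only (the failing case).
printf '0\tAAAAAAAAGTTT.TATAGTAATATA\tT\tx\tHPNK_05032012_new.fna\n' > "$tmp"
printf '1   AAAAAAACGACG.ATTTTACAATAC   C   x   HPNK_05032012_new.fna\n' >> "$tmp"

# Report how many tab-separated fields each line has and what $2 holds.
report=$(awk 'BEGIN{FS="\t"} {printf "line %d: NF=%d, $2=[%s]\n", NR, NF, $2}' "$tmp")

rm -f "$tmp"
printf '%s\n' "$report"
```

A correctly tab-delimited line of the question's file should report NF=5 with the sequence in $2.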


mpapec · Accepted Answer · 2014-02-18 15:04:38Z

2

perl -lane '$F[1] =~ s/[.]/$F[2]/; print "@F"' file

or, shorter:

perl -ape 's/[.]/$F[2]/' file

edited Feb 18, 2014 at 15:04

answered Feb 18, 2014 at 13:25

mpapec



BMW · Accepted Answer · 2014-02-19 10:58:26Z

1

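Using awk, which keeps the original format. With FS="" each character becomes its own field (supported by GNU awk; POSIX leaves an empty FS unspecified), so $19 and $33 are character positions, and the command assumes every line has the same fixed-width layout: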

awk '$19=$33' FS="" OFS="" file
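If the lines are not all the same width, a position-independent alternative (a different technique from the answer above, shown as a sketch assuming tab-delimited input and a POSIX shell with mktemp) is to locate the dot with index() and splice column 3 in with substr(). It avoids regexes entirely, and unlike sub() it does not treat & in the replacement text specially.

```shell
tmp=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\n' \
  0 AAAAAAAAGTTT.TATAGTAATATA T x HPNK_05032012_new.fna \
  9 AAAAAACGCCAA.GTCAGCTACAAA C x HPNK_05032012_new.fna > "$tmp"

out=$(awk 'BEGIN { FS = OFS = "\t" }
{
    i = index($2, ".")    # position of the first literal "." in column 2, 0 if none
    if (i > 0)
        $2 = substr($2, 1, i - 1) $3 substr($2, i + 1)
    print                 # tabs are restored because OFS is a tab
}' "$tmp")

rm -f "$tmp"
printf '%s\n' "$out"
```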

answered Feb 19, 2014 at 10:58

BMW

