Modifying text column wise with sed/awk

Question

I have input data with three tab-separated columns, like this:

  a  mrna_185598_SGL 463
  b  mrna_9210_DLT   463
  c  mrna_9210_IND   463
  d  mrna_9210_INS   463
  e  mrna_9210_SGL   463

How can I use sed/awk to turn it into four-column data that looks like this:

a  mrna_185598 SGL   463
b  mrna_9210   DLT   463
c  mrna_9210   IND   463
d  mrna_9210   INS   463
e  mrna_9210   SGL   463

In principle I want to split the original "mrna" string into 2 parts.

ghostdog74 · Accepted Answer · 2010-01-28 04:03:42Z

2

Something like this:

awk 'BEGIN{FS=OFS="\t"}{split($2,a,"_"); $2=a[1]"_"a[2]"\t"a[3] }1'  file

Output:

# ./shell.sh
a       mrna_185598     SGL     463
b       mrna_9210       DLT     463
c       mrna_9210       IND     463
d       mrna_9210       INS     463
e       mrna_9210       SGL     463

Use nawk on Solaris.

And if you have bash:

while IFS=$'\t' read -r a b c
do
    front=${b%_*}
    back=${b##*_}
    printf '%s\t%s\t%s\t%s\n' "$a" "$front" "$back" "$c"
done <"file"
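
As a small illustration of the two parameter expansions the loop relies on (the sample value is taken from the question; nothing else is assumed): ${b%_*} removes the shortest suffix matching _*, and ${b##*_} removes the longest prefix matching *_.

```shell
# Sample value from the question's second column.
b='mrna_185598_SGL'

front=${b%_*}    # drop the shortest trailing "_..." part
back=${b##*_}    # drop everything up to and including the last "_"

printf '%s\n' "$front" "$back"
```

Because both expansions key on the last underscore, the split happens there no matter how many underscores come before it.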

edited Jan 28, 2010 at 4:03

answered Jan 28, 2010 at 3:38

ghostdog74


Ignacio Vazquez-Abrams · Accepted Answer · 2010-01-28 03:37:21Z

2

gawk:

{
  print $1 "\t" gensub(/_/, "\t", 2, $2) "\t" $3
}
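
gensub is a gawk extension, so the snippet above will not run in other awks. A rough portable sketch of the same idea, replacing only the second underscore of field 2, might look like the following. It assumes tab-separated input, and the function name split_second_underscore is made up for illustration.

```shell
# Hypothetical helper: split field 2 at its second underscore using only
# index() and substr(), which are available in POSIX awk.
split_second_underscore() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        first = index($2, "_")                # position of the first "_"
        if (first > 0) {
            rest = substr($2, first + 1)      # text after the first "_"
            second = index(rest, "_")         # second "_", relative to rest
            if (second > 0)
                $2 = substr($2, 1, first + second - 1) "\t" substr(rest, second + 1)
        }
        print
    }' "$@"
}
```

It can be called as split_second_underscore file, or fed from a pipe. Lines whose second field has fewer than two underscores are printed unchanged.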

answered Jan 28, 2010 at 3:37

Ignacio Vazquez-Abrams


sgmitchell · Accepted Answer · 2010-01-28 03:49:48Z

1

You don't need to use sed. Instead, use tr:

cat *FILENAME* | tr '_[:upper:]{3}\t' '\t[:lower:]{3}\t' >> *FILEOUT*

cat FILENAME prints the file, which is then piped ('|') to tr (translate). tr is meant to replace an underscore that is followed by 3 uppercase characters and a tab with a tab in place of the underscore. The result is then appended to FILEOUT.

answered Jan 28, 2010 at 3:49

sgmitchell


1 Comment

ghostdog74 Over a year ago

Useless use of cat. Pass the file to tr instead: tr 'blah' 'blah' < file >> fileout. And did you test your command properly?
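
For context on that last question: tr maps single characters to single characters and has no notion of patterns or repetition counts. A plain underscore-to-tab mapping therefore changes every underscore, not only the second one. A minimal sketch using one of the question's values:

```shell
# tr works character by character, so both underscores become tabs and
# the value ends up in three pieces instead of two.
pieces=$(printf 'mrna_9210_DLT\n' | tr '_' '\t')
printf '%s\n' "$pieces"
```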

Damodharan R · Accepted Answer · 2010-01-28 03:50:35Z

1

$ cat test.txt
  a  mrna_185598_SGL 463
  b  mrna_9210_DLT   463
  c  mrna_9210_IND   463
  d  mrna_9210_INS   463
  e  mrna_9210_SGL   463

$ cat test.txt | sed -E 's/(\S+)_(\S+)\s+(\S+)$/\1\t\2\t\3/'
  a  mrna_185598    SGL 463
  b  mrna_9210  DLT 463
  c  mrna_9210  IND 463
  d  mrna_9210  INS 463
  e  mrna_9210  SGL 463

answered Jan 28, 2010 at 3:50

Damodharan R


1 Comment

ghostdog74 Over a year ago

Useless use of cat. Pass the file name to sed instead: sed 'options' filename

Kyle Butt · Accepted Answer · 2010-01-28 03:50:59Z

1

Provided they don't look too much different from what you've posted:

sed -E 's/mrna_([0-9]+)_/mrna_\1\t/'

edited Jan 28, 2010 at 3:50

answered Jan 28, 2010 at 3:40

Kyle Butt


Claes Wikner · Accepted Answer · 2019-07-02 00:28:06Z

1

gawk '{$1=$1; $0=gensub(/_/,"\t",2);print}' file

a mrna_185598   SGL 463
b mrna_9210 DLT 463
c mrna_9210 IND 463
d mrna_9210 INS 463
e mrna_9210 SGL 463

answered Jul 2, 2019 at 0:28

Claes Wikner


potong · Accepted Answer · 2019-07-01 23:20:03Z

0

This might work for you (GNU sed):

sed 's/_/\t/2' file

Replace the second occurrence of a _ by a tab.
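
The numeric flag on s is standard, but treating \t in the replacement as a tab is a GNU behaviour. Where that is not available, one possible workaround is to interpolate a literal tab from the shell. This is a sketch; the function name second_underscore_to_tab is invented for illustration.

```shell
# Hypothetical helper: replace the second "_" on each line with a tab.
# A literal tab is interpolated so the command does not depend on sed
# understanding "\t" in the replacement text.
tab=$(printf '\t')
second_underscore_to_tab() {
    sed "s/_/${tab}/2" "$@"
}
```

Usage would be second_underscore_to_tab file. Note that the count applies to the whole line, so this assumes the first column never contains an underscore.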

answered Jul 1, 2019 at 23:20

potong
