This may be an extension of the question: Incorporate variables into bash code line

I just realized that the lines in my text file actually come in a variable format.

2   118610455   P2_PM_2_5034    T   <DUP:TANDEM>    40  .   END=118610566;SVLEN=110;SVTYPE=TDUP;CIPOS=-100,55;CIEND=-56,100;IMPRECISE;DBVARID=esv7540;VALIDATED;VALMETHOD=CGH;SVMETHOD=RP 

1   859214  P2_M_061510_1_73    C   <DEL>   .   .   CIEND=-130,50;CIPOS=-57,93;END=860180;IMPRECISE;SVLEN=-966;SVTYPE=DEL;VALIDATED;DBVARID=esv10036;VALMETHOD=CGH;SVMETHOD=RD,RP

What I need is:

2 118610455 118610566
1 859214 860180

As shown above, the "END=#" entry can appear at different positions within the 8th column. So I need to find the "END=..." part of the 8th column first, then extract the number. In other words, this is about how to grep a specific pattern (here, "END=") out of a string.

How can I do that? Thanks.

3 Answers

Grep:

You can use grep's -o option, which prints only the matching part of each line, and then strip the "END=" prefix and the trailing semicolon with tr:

Test:

[jaypal:~/Temp] grep -o "END=[0-9]\+;" file | tr -d 'END=;'
118610566
860180

But if you are looking for a complete solution, how about using awk? (Sorry, I know this wasn't your requirement.) Here are two solutions:

Awk:

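If the first and second values you want do not vary in position, we can split each line on spaces, semicolons, and equals signs, and then loop over the resulting fields. As soon as we reach a field that is exactly END, we print $1 and $4, followed by the field right after END. (With this separator every single space is a delimiter, so in the sample as shown the runs of spaces produce empty fields and the second column lands in $4.)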

awk -v FS="[ ;=]" '{for(i=1;i<=NF;i++) if ($i=="END") print $1,$4,$(i+1)}' file

Test:

[jaypal:~/Temp] cat file
2   118610455   P2_PM_2_5034    T   <DUP:TANDEM>    40  .   END=118610566;SVLEN=110;SVTYPE=TDUP;CIPOS=-100,55;CIEND=-56,100;IMPRECISE;DBVARID=esv7540;VALIDATED;VALMETHOD=CGH;SVMETHOD=RP 
1   859214  P2_M_061510_1_73    C   <DEL>   .   .   CIEND=-130,50;CIPOS=-57,93;END=860180;IMPRECISE;SVLEN=-966;SVTYPE=DEL;VALIDATED;DBVARID=esv10036;VALMETHOD=CGH;SVMETHOD=RD,RP

[jaypal:~/Temp] awk -v FS="[ ;=]" '{for(i=1;i<=NF;i++) if ($i=="END") print $1,$4,$(i+1)}' file
2 118610455 118610566
1 859214 860180
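
A variation on the loop above, as a minimal sketch rather than a drop-in replacement: it keeps awk's default whitespace splitting (so tabs or runs of spaces both work) and only looks inside column 8. The function name extract_end_awk and the shortened sample record are made up for illustration.

```shell
# Hypothetical helper: reads records on stdin, prints "col1 col2 END-value".
extract_end_awk() {
    awk '{
        n = split($8, info, ";")          # break column 8 into KEY=VALUE items
        for (i = 1; i <= n; i++)
            if (info[i] ~ /^END=/) {      # anchored at the start, so CIEND= is skipped
                print $1, $2, substr(info[i], 5)   # drop the 4-character "END=" prefix
                break
            }
    }'
}

# Demo on one tab-separated, made-up record.
printf '1\t859214\tid2\tC\t<DEL>\t.\t.\tCIEND=-130,50;END=860180;SVTYPE=DEL\n' | extract_end_awk
```

Because the match is anchored with ^END=, it does not matter whether END is the first, middle, or last item of column 8.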

GNU AWK:

If you have gawk, its built-in gensub function supports back-references, so you can also do the following. Note that gensub takes the "how" argument (which match to replace, here 1) before the target string:

gawk '{print $1,$2,gensub(/.*\<END\>=(.[^;]*);.*/,"\\1",1,$0)}' file

Test:

[jaypal:~/Temp] gawk '{print $1,$2,gensub(/.*\<END\>=(.[^;]*);.*/,"\\1",1,$0)}' file
2 118610455 118610566
1 859214 860180

Use sed (the \+ and \< operators below are GNU extensions):

$ sed -e 's/^\([0-9]\+\) \+\([0-9]\+\) .*\<END=\([0-9]\+\).*/\1 \2 \3/' input
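
For a sed without the GNU extensions, the same substitution can be sketched with extended regular expressions (-E). This is an assumption-laden variant, not the answer's original command: the function name extract_end_sed is hypothetical, and like the command above it only handles lines whose first two columns are purely numeric.

```shell
# Hypothetical helper: reads records on stdin, prints "col1 col2 END-value".
extract_end_sed() {
    # END= must be preceded by whitespace or ";", which rules out CIEND=.
    # -n plus the trailing p prints only the lines where the substitution succeeded.
    sed -n -E 's/^([0-9]+)[[:space:]]+([0-9]+)[[:space:]].*[[:space:];]END=([0-9]+).*/\1 \2 \3/p'
}

# Demo on one made-up record.
printf '1   859214  id2    C   <DEL>   .   .   CIEND=-130,50;CIPOS=-57,93;END=860180;IMPRECISE\n' | extract_end_sed
```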

You can use a perl script for this, something like:

pax> perl -ne '{
         @arr=split;
         if ($arr[7] =~ /^END=/) {
             $arr[7] =~ s/^END=//;
         } else {
             $arr[7] =~ s/^.*;END=//;
         }
         $arr[7] =~ s/;.*$//;
         printf "%s %s %s\n", $arr[0], $arr[1], $arr[7];
     }' <qq.in
2 118610455 118610566
1 859214 860180

I've formatted that script for readability but you can just as easily use the one-liner:

perl -ne '{@arr=split;if ($arr[7] =~ /^END=/) {$arr[7] =~ s/^END=//;} else {$arr[7] =~ s/^.*;END=//;} $arr[7] =~ s/;.*$//; printf "%s %s %s\n", $arr[0], $arr[1], $arr[7];}' <qq.in

The way it works is simple once you understand it. The split gives you an array of the whitespace-separated elements on the line, and you just have to modify element 7 (the eighth column, since indexing starts at 0) slightly.

If it starts with END=, just get rid of that bit. Otherwise get rid of everything up to and including ;END=.

Then get rid of everything after the first ; (in the already-modified version which has the N bit of END=N at the start).

Then just print out the three desired values.


Having thought about it some more, it may be better off as something a little simpler, like:

pax> perl -ne '{
        ($a,$b,$x,$x,$x,$x,$x,$c,$x) = split;
        $c = ";$c";
        $c =~ s/^.*;END=//;
        $c =~ s/;.*$//;
        print "$a $b $c\n";
    }' <qq.in

or the equivalent one-liner:

perl -ne '{($a,$b,$x,$x,$x,$x,$x,$c,$x)=split;$c=";$c";$c=~s/^.*;END=//;$c=~s/;.*$//;print "$a $b $c\n";}' <qq.in
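
The same trick (prepend a ";" so that ";END=" always matches, then trim both sides) can also be sketched in plain shell with parameter expansion, without calling perl at all. This is only an illustration under the same assumptions as above: the function name extract_end_sh is made up, and the variable names mirror the perl version.

```shell
# Hypothetical helper: reads records on stdin, prints "col1 col2 END-value".
extract_end_sh() {
    # Fields 3-7 and anything after column 8 land in the throwaway variable x.
    # The "|| [ -n ... ]" part handles a last line with no trailing newline.
    while read -r chrom pos x x x x x info x || [ -n "$chrom" ]; do
        info=";$info"                 # guarantee a leading ";" before a first-position END=
        case $info in
            *';END='*) ;;             # column 8 has an END item, carry on
            *) continue ;;            # no END item on this line, skip it
        esac
        info=${info##*';END='}        # strip everything up to and including ";END="
        info=${info%%';'*}            # strip from the next ";" to the end
        printf '%s %s %s\n' "$chrom" "$pos" "$info"
    done
}

# Demo on one made-up record.
printf '2   118610455   id1    T   <DUP:TANDEM>    40  .   END=118610566;SVLEN=110;CIEND=-56,100\n' | extract_end_sh
```

A shell loop like this is much slower than awk or perl on large files, so it is mainly useful when the extraction has to happen inside an existing script.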
