How to extract the text between two patterns using sed in Linux when the second pattern repeats on several consecutive lines

Question

E.g.

xyz
A1
B1
C1
D1
End
End
End
X1
X2
X3
Done

I want to extract everything from xyz through the End pattern, including the repeated End lines. So the output should be:

xyz
A1
B1
C1
D1
End
End
End

@RomanPerekhrest: Note that here, the last occurrence of the second pattern ends the block, not the first one.
– choroba, Commented Jul 10, 2017 at 13:00

score 1 · Accepted Answer · 2017-07-12 08:23:00Z

Method-1

perl -l -0777ne 'print /^(xyz.*?^End$(?:\nEnd$)*)/ms' yourfile

Working

Slurp the whole file (-0777) so that it is read as one long string, which the regex can then take apart. The regex does the following:
- look for xyz at the start of a line (not necessarily the start of the file).
- match up to the nearest End on a line by itself, then as many consecutive End lines as follow it.

Method-2

perl -lne '
   next unless /xyz/ ... eof;
   last if !/End/ and $flag;
   $flag ||= 1 if /End/;
   print;
' yourfile

Working

Here we run Perl line by line and set up a small state machine.
- Skip every line outside the range that runs from the first /xyz/ line to the end of the file.
- Once inside the range, print every line; when we hit an /End/ line, set the flag.
- Then break out as soon as we see the first non-/End/ line.
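The same state machine can be sketched in awk, in case Perl is not at hand. This is only an illustration under the same loose matching as above (unanchored /xyz/ and /End/); the function name extract_block and the variable names are made up for this example.

```shell
# Hypothetical helper: reads stdin, prints from the first xyz line through
# the run of End lines that follows the first End.
extract_block() {
  awk '
    finished           { next }                # past the block: ignore the rest
    !inside && /xyz/   { inside = 1 }          # enter the range at the first xyz
    !inside            { next }                # skip everything before it
    /End/              { seen_end = 1 }        # the End run has started
    seen_end && !/End/ { finished = 1; next }  # first non-End line after the run
    { print }
  '
}

# Demo on the sample input from the question.
printf '%s\n' xyz A1 B1 C1 D1 End End End X1 X2 X3 Done | extract_block
```

Setting a flag instead of exiting lets awk keep reading to the end of input, which avoids a broken pipe on the writing side at the cost of scanning the whole file.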

Method-3

sed -e '
   /xyz/!d
   :a
      $q;N
   /\nEnd$/!ba
   :b
      n
   /End/bb
   d
' yourfile

In this method the first do-while loop (:a) accumulates lines from /xyz/ through the first /End/.

The second do-while loop (:b) prints what has been accumulated and keeps printing for as long as the next line matches /End/.

Method-4

sed -e '
   /xyz/,/End/!d
   H;/xyz/h;/End/!d
   :a
      $q;N
      /\(.*\)\n\1$/!{g;q;}
      s/.*\n//;H
   ba
' yourfile

With this method we first select the right range and store that range data in the hold space. A do-while loop (:a) is then set up which incrementally appends to the hold space for as long as the next line happens to be /End/.

Results

xyz
A1
B1
C1
D1
End
End
End

jimmij · Accepted Answer · 2017-07-10 13:12:40Z

0

This is a kind of job pcregrep is good at:

pcregrep -M 'xyz(.|\n)*End' file

Notice that it is very greedy and eats everything till the final End, including other Ends.

answered Jul 10, 2017 at 13:12


choroba · Accepted Answer · 2017-07-10 13:13:46Z

0

Perl to the rescue: Print all the lines between the first xyz and the last End:

 perl -ne '
     $inside = 1        if /^xyz$/;
     $seen_end = 1      if $inside && /^End$/;
     push @buff, $_     if $inside;
     print splice @buff if /^End$/ && @buff;
' input-file

From the first occurrence of xyz, we start pushing all lines into a buffer. Once End is encountered, we output and clear the buffer (see splice), but we continue to push lines into the buffer in case there was another End later.
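The buffer-and-flush idea can also be sketched in awk. This is an illustrative translation, not the author's code; the function name extract_to_last_end and the variable names are invented here. Like the Perl version, it runs through the last End anywhere after xyz, so an End that appears later in the file, after other lines, is included along with everything before it.

```shell
# Hypothetical helper: reads stdin, prints from the first xyz line through
# the last End line found anywhere after it.
extract_to_last_end() {
  awk '
    /^xyz$/           { inside = 1 }                   # start buffering at xyz
    inside            { buf = buf $0 "\n" }            # buffer every line once inside
    inside && /^End$/ { printf "%s", buf; buf = "" }   # flush the buffer on each End
  '
}

# Demo: the second End is not adjacent to the first, yet X is still printed.
printf '%s\n' xyz A End X End Y | extract_to_last_end
```

Lines buffered after the final End (Y in the demo) are never flushed, which is how the tail of the file gets dropped.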

edited Jul 10, 2017 at 13:13

answered Jul 10, 2017 at 13:08


Philippos · Accepted Answer · 2017-07-10 13:37:42Z

0

As you are asking for an sed solution, I'd do it like this:

sed -e '/^xyz$/!d;:a' -e '$!{N;ba' -e '};s/\(.*\nEnd\).*/\1/'

So discard everything before the first pattern (/^xyz$/!d), then loop to collect all remaining lines in the pattern space (:a;$!{N;ba;}) and remove everything after the last occurrence of the second pattern (s/\(.*\nEnd\).*/\1/).

Collecting in the pattern space is necessary because range addressing (/xyz/,/End/) is not greedy, but .* inside the pattern space is.
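To see the difference on the sample input, the sketch below runs a plain range address next to the slurp-and-trim command. The function names sample, range_only and slurp_greedy are made up for this illustration, and GNU sed is assumed (as on a typical Linux system).

```shell
# Hypothetical helper that emits the sample input from the question.
sample() { printf '%s\n' xyz A1 B1 C1 D1 End End End X1 X2 X3 Done; }

# Range address: the range closes at the first End after xyz,
# so only one End line is printed.
range_only() { sed -n '/^xyz$/,/^End$/p'; }

# Slurp from xyz to the end of input, then cut after the last "\nEnd";
# the greedy .* keeps all three End lines.
slurp_greedy() { sed -e '/^xyz$/!d;:a' -e '$!{N;ba' -e '};s/\(.*\nEnd\).*/\1/'; }

sample | range_only
echo '---'
sample | slurp_greedy
```

Because the trim keys on the last End in the whole pattern space, an End that occurs further down the file would extend the output to that point as well.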

answered Jul 10, 2017 at 13:37


RomanPerekhrest · Accepted Answer · 2017-07-10 13:43:54Z

0

awk solution:

awk '/xyz/,/End/{ print $0; n=NR }($0=="End" && n && NR>n && NR-n++ == 1)' file

The output:

xyz
A1
B1
C1
D1
End
End
End

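/xyz/,/End/ - record range, from xyz to the first End
n=NR - captures the record number (while the range matches, so n ends up holding the number of the last record of the range)
($0=="End" && ... NR-n++ == 1) - prints each further End line that immediately follows the last printed record, advancing n as it goes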

edited Jul 10, 2017 at 13:43

answered Jul 10, 2017 at 13:37

