pulling text between two patterns with awk script

Question

Input text file:

This is a simple test file.
#BEGIN
These lines should be extracted by our script.

Everything here will be copied.
#END
That should be all.
#BEGIN
Nothing from here.
#END

Desired output:

These lines should be extracted by our script.

Everything here will be copied.

My awk script is:

#!/usr/bin/awk -f
$1 ~ /#BEGIN/{a=1;next};a;$1 ~ /#END/ {exit}

and my current output is:

These lines should be extracted by our script.

Everything here will be copied.
#END

The only problem is that the "#END" line is still printed. I've been trying for a long time to eliminate it, but I'm not sure exactly how.

@user000001 I think this worked. Can you please explain this line? I just want to know how it works.
– asddddddaaaad2, Commented Oct 30, 2016 at 17:09

user000001 · Accepted Answer · 2016-10-30 18:21:49Z

2

This becomes obvious, IMO, if we comment each command in the script. The script can be written like this:

#!/usr/bin/awk -f
$1 ~ /#BEGIN/ { # If we match the BEGIN line
  a=1           # Set a flag to one
  next          # skip to the next line
}
a != 0 {        # if the flag is not zero
  print $0      # print the current line
}
$1 ~ /#END/ {   # if we match the END line
  exit          # exit the process 
}

Note that I expanded the bare pattern a into the equivalent form a != 0 {print $0} to make the point clearer.
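To see the ordering problem concretely, here is a minimal sketch that recreates the question's input and runs the expanded rules above. It assumes a POSIX shell with awk on the PATH. The file name sample.txt and the helper name extract_old are illustrative only.

```sh
#!/bin/sh
# Recreate the question's input (sample.txt is an illustrative name).
cat > sample.txt <<'EOF'
This is a simple test file.
#BEGIN
These lines should be extracted by our script.

Everything here will be copied.
#END
That should be all.
#BEGIN
Nothing from here.
#END
EOF

# extract_old is a hypothetical helper wrapping the expanded rules.
# The print rule comes before the exit rule, so the #END line is printed
# before the script exits.
extract_old() {
  awk '
    $1 ~ /#BEGIN/ { a = 1; next }
    a != 0        { print $0 }
    $1 ~ /#END/   { exit }
  ' "$1"
}

extract_old sample.txt
```

The output is the three lines of the first block, followed by the unwanted #END line.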

So once the flag is set, the script prints every line. Because the print rule comes before the exit rule, the END line has already been printed by the time the script exits. Since you don't want the END line to be printed, the exit rule must run before the print rule. So the script should become:

#!/usr/bin/awk -f
$1 ~ /#BEGIN/ { # If we match the BEGIN line
  a=1           # Set a flag to one
  next          # skip to the next line
}
$1 ~ /#END/ {   # if we match the END line
  exit          # exit the process 
}
a != 0 {        # if the flag is not zero
  print $0      # print the current line
}

In this case, we exit before the line is printed. In a condensed form, it can be written as:

awk '$1~/#BEGIN/{a=1;next}$1~/#END/{exit}a' file

or a bit shorter

awk '$1~/#END/{exit}a;$1~/#BEGIN/{a=1}' file
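A quick way to confirm that both condensed forms behave like the expanded script is to run them on the question's input. This is a sketch that assumes a POSIX shell and a common awk (gawk, mawk, or BSD awk). The helper names short1 and short2 and the file name sample.txt are hypothetical.

```sh
#!/bin/sh
# The question's input (illustrative file name).
cat > sample.txt <<'EOF'
This is a simple test file.
#BEGIN
These lines should be extracted by our script.

Everything here will be copied.
#END
That should be all.
#BEGIN
Nothing from here.
#END
EOF

# First form: skip the BEGIN line with next, exit on END before printing.
short1() { awk '$1~/#BEGIN/{a=1;next}$1~/#END/{exit}a' "$1"; }
# Shorter form: check END first, then print, then set the flag on BEGIN,
# so the BEGIN line itself is never printed.
short2() { awk '$1~/#END/{exit}a;$1~/#BEGIN/{a=1}' "$1"; }

short1 sample.txt
echo '---'
short2 sample.txt
```

Both commands print only the three lines between the first #BEGIN and #END.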

Regarding the additional constraints raised in the comments: to avoid skipping any #BEGIN lines that appear inside the block to be printed, we should remove the next statement and reorder the rules as in the shorter example right above. In expanded form it looks like this:

#!/usr/bin/awk -f
$1 ~ /#END/ {   # if we match the END line
  exit          # exit the process 
}
a != 0 {        # if the flag is not zero
  print $0      # print the current line
}
$1 ~ /#BEGIN/ { # If we match the BEGIN line
  a=1           # Set a flag to one
}
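The effect of dropping next can be checked with a small input where a second #BEGIN appears inside the block. In this sketch, nested.txt, with_next and without_next are illustrative names. The first helper is the earlier one-liner that uses next, and the second is the reordered version without it.

```sh
#!/bin/sh
# A block that contains a second #BEGIN (illustrative file name).
cat > nested.txt <<'EOF'
#BEGIN
first
#BEGIN
second
#END
after
EOF

# With next: the inner #BEGIN matches the BEGIN rule and is skipped.
with_next() { awk '$1~/#BEGIN/{a=1;next}$1~/#END/{exit}a' "$1"; }
# Without next: the print rule (a) runs before the flag is (re)set,
# so the inner #BEGIN is printed like any other line in the block.
without_next() { awk '$1~/#END/{exit}a;$1~/#BEGIN/{a=1}' "$1"; }

with_next nested.txt
echo '---'
without_next nested.txt
```

Only the second helper keeps the inner #BEGIN line in its output.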

To also avoid exiting if an END line is found before the block to be printed, we can check if the flag is set before exiting:

#!/usr/bin/awk -f
$1 ~ /#END/ && a != 0 {   # if we match the END line and the flag is set
  exit          # exit the process 
}
a != 0 {        # if the flag is not zero
  print $0      # print the current line
}
$1 ~ /#BEGIN/ { # If we match the BEGIN line
  a=1           # Set a flag to one
}

or in a condensed form:

awk '$1~/#END/&&a{exit}a;$1~/#BEGIN/{a=1}' file
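To check the final version against both constraints, this sketch builds an input that starts with a stray #END and contains a nested #BEGIN. The names tricky.txt, unguarded and guarded are hypothetical. The unguarded one-liner exits on the very first line and prints nothing. The guarded one ignores the early #END because the flag is not yet set.

```sh
#!/bin/sh
# Edge case: an #END before any #BEGIN, plus a nested #BEGIN.
cat > tricky.txt <<'EOF'
#END
stray line
#BEGIN
wanted
#BEGIN
also wanted
#END
not wanted
EOF

# Exits on any END line, even one before the block starts.
unguarded() { awk '$1~/#END/{exit}a;$1~/#BEGIN/{a=1}' "$1"; }
# Exits only on an END line seen while the flag is set.
guarded() { awk '$1~/#END/&&a{exit}a;$1~/#BEGIN/{a=1}' "$1"; }

echo "unguarded: [$(unguarded tricky.txt)]"
guarded tricky.txt
```

The guarded version prints wanted, #BEGIN and also wanted, then stops at the #END that closes the block.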

edited Oct 30, 2016 at 18:21

answered Oct 30, 2016 at 17:18

user000001


4 Comments

asddddddaaaad2 Over a year ago

Thanks, this helped me understand awk a lot better! I have some more constraints that I didn't mention in my question, though. For any text file, I only want to extract whatever is between the first #BEGIN and the #END that follows it. If there are two #BEGIN lines and then an #END, this code prints everything between the first #BEGIN and the #END, but it does not print the second #BEGIN. I want the second #BEGIN to be printed because it is inside the first #BEGIN/#END block. Also, if the text file starts with an #END followed by a #BEGIN and #END block, it doesn't print anything. How can I ignore the first #END?

user000001 Over a year ago

@asddddddaaaad2: I added some more examples to handle these constraints.

asddddddaaaad2 Over a year ago

How do I do the exact same thing in sed? So far I have: /#BEGIN/,/#END/!d and /#END/q and /#BEGIN/,/#END/{/#BEGIN/d;/#END/d;p;}

user000001 Over a year ago

@asddddddaaaad2: I'm not familiar enough with sed to attempt to create a corresponding script. You could ask VIPIN KUMAR under his answer, since he used a sed for answering. Otherwise, you could ask a new question.

VIPIN KUMAR · Accepted Answer · 2016-10-30 18:21:52Z

0

Try the sed command below to get the desired output:

vipin@kali:~$ sed  '/#BEGIN/,/#END/!d;/END/q' kk.txt|sed '1d;$d'
These lines should be extracted by our script.

Everything here will be copied.
vipin@kali:~$

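Explanation:

d deletes lines. With !, /#BEGIN/,/#END/!d deletes every line outside the range, so only the lines from #BEGIN through #END are kept. /END/q then quits after printing the first line that matches END. Finally, the second sed, 1d;$d, deletes the first and last lines of that output, which in our case are #BEGIN and #END.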
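If you would rather not pipe through a second sed, one possible single-pass variant is sketched below. This is not part of the answer above. It starts looping at the first #BEGIN. Inside the loop, it reads the next line with n, quits with q on #END (under -n, q does not print), and otherwise prints the line with p. It relies only on standard sed commands. The file names and the helper name extract_sed are illustrative. Because the loop only starts at the first #BEGIN, an earlier #END is ignored and a nested #BEGIN is printed.

```sh
#!/bin/sh
# The question's input and an edge case (illustrative file names).
cat > sample.txt <<'EOF'
This is a simple test file.
#BEGIN
These lines should be extracted by our script.

Everything here will be copied.
#END
That should be all.
#BEGIN
Nothing from here.
#END
EOF

cat > tricky.txt <<'EOF'
#END
#BEGIN
wanted
#BEGIN
also wanted
#END
not wanted
EOF

# extract_sed is a hypothetical helper. With -n nothing is printed unless p runs.
# At the first #BEGIN: read the next line (n), quit on #END (q),
# otherwise print it (p) and branch back to the label (b loop).
extract_sed() {
  sed -n '/#BEGIN/{
:loop
n
/#END/q
p
b loop
}' "$1"
}

extract_sed sample.txt
echo '---'
extract_sed tricky.txt
```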

answered Oct 30, 2016 at 18:21

VIPIN KUMAR

