Input text file:

This is a simple test file.
#BEGIN
These lines should be extracted by our script.

Everything here will be copied.
#END
That should be all.
#BEGIN
Nothing from here.
#END

Desired output:

These lines should be extracted by our script.

Everything here will be copied.

My awk script is:

#!/usr/bin/awk -f
$1 ~ /#BEGIN/{a=1;next};a;$1 ~ /#END/ {exit}

and my current output is:

These lines should be extracted by our script.

Everything here will be copied.
#END

The only problem I'm having is that I'm still printing the "#END". I've been trying for a long time to somehow eliminate that. Not sure how to exactly do it.

  • Try this: $1 ~ /#BEGIN/{a=1;next}$1 ~ /#END/ {exit}a Commented Oct 30, 2016 at 17:05
  • @user000001 I think this worked. Can you please explain this line? I just wanna know how it works. Commented Oct 30, 2016 at 17:09
  • Sure, I'll add an answer Commented Oct 30, 2016 at 17:09

2 Answers

This becomes obvious, IMO, if we comment each command in the script. The script can be written like this:

#!/usr/bin/awk -f
$1 ~ /#BEGIN/ { # If we match the BEGIN line
  a=1           # Set a flag to one
  next          # skip to the next line
}
a != 0 {        # if the flag is not zero
  print $0      # print the current line
}
$1 ~ /#END/ {   # if we match the END line
  exit          # exit the process 
}

Note that I expanded a to the equivalent form a!=0{print $0}, to make the point clearer.

So the script starts printing each line when the flag is set, and when it reaches the END line, it has already printed the line before it exits. Since you don't want the END line to be printed, you should exit before you print the line. So the script should become:

#!/usr/bin/awk -f
$1 ~ /#BEGIN/ { # If we match the BEGIN line
  a=1           # Set a flag to one
  next          # skip to the next line
}
$1 ~ /#END/ {   # if we match the END line
  exit          # exit the process 
}
a != 0 {        # if the flag is not zero
  print $0      # print the current line
}

In this case, we exit before the line is printed. In a condensed form, it can be written as:

awk '$1~/#BEGIN/{a=1;next}$1~/#END/{exit}a' file

or, slightly shorter:

awk '$1~/#END/{exit}a;$1~/#BEGIN/{a=1}' file
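
To see the effect of the rule order, here is a small sketch that runs the original script and both condensed forms against the sample input from the question. The temporary file and the variable names (input, original, fixed, shorter) are only for illustration and are not part of the original scripts:

```shell
# Recreate the sample input from the question in a temporary file.
input=$(mktemp)
printf '%s\n' \
  'This is a simple test file.' \
  '#BEGIN' \
  'These lines should be extracted by our script.' \
  '' \
  'Everything here will be copied.' \
  '#END' \
  'That should be all.' \
  '#BEGIN' \
  'Nothing from here.' \
  '#END' > "$input"

# Original order: the print rule (a) runs before the exit rule,
# so the #END line is printed before awk exits.
original=$(awk '$1 ~ /#BEGIN/{a=1;next};a;$1 ~ /#END/ {exit}' "$input")

# Fixed order: the exit rule runs before the print rule.
fixed=$(awk '$1~/#BEGIN/{a=1;next}$1~/#END/{exit}a' "$input")

# Shorter variant: exit first, print second, set the flag last.
shorter=$(awk '$1~/#END/{exit}a;$1~/#BEGIN/{a=1}' "$input")

printf 'original:\n%s\n\nfixed:\n%s\n' "$original" "$fixed"
rm -f "$input"
```

The original output ends with the #END line, while the two reordered versions produce the same three lines without it.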

Regarding the additional constraints raised in the comments: to avoid skipping any #BEGIN lines inside the block that is to be printed, we should remove the next statement and rearrange the rules as in the example right above. In expanded form it would look like this:

#!/usr/bin/awk -f
$1 ~ /#END/ {   # if we match the END line
  exit          # exit the process 
}
a != 0 {        # if the flag is not zero
  print $0      # print the current line
}
$1 ~ /#BEGIN/ { # If we match the BEGIN line
  a=1           # Set a flag to one
}

To also avoid exiting if an END line is found before the block to be printed, we can check if the flag is set before exiting:

#!/usr/bin/awk -f
$1 ~ /#END/ && a != 0 {   # if we match the END line and the flag is set
  exit          # exit the process 
}
a != 0 {        # if the flag is not zero
  print $0      # print the current line
}
$1 ~ /#BEGIN/ { # If we match the BEGIN line
  a=1           # Set a flag to one
}

or in a condensed form:

awk '$1~/#END/&&a{exit}a;$1~/#BEGIN/{a=1}' file
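
As a quick check of both extra constraints, here is a sketch with a made-up input that starts with a stray #END and has a second #BEGIN inside the first block. The function name extract_first_block and the sample lines are assumptions for illustration only:

```shell
# Wrap the condensed one-liner in a helper (hypothetical name).
extract_first_block() {
  awk '$1~/#END/&&a{exit}a;$1~/#BEGIN/{a=1}' "$1"
}

# Made-up input: a leading #END, then a block containing a nested #BEGIN.
sample=$(mktemp)
printf '%s\n' '#END' 'ignored' '#BEGIN' 'first' '#BEGIN' 'second' '#END' 'after' > "$sample"

result=$(extract_first_block "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

The leading #END is ignored because the flag is not set yet, and the inner #BEGIN is printed because the print rule runs before the rule that sets the flag.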

4 Comments

Thanks, this helped me understand awk a lot better! I have some more constraints that I didn't mention in my question, though. For any text file, I only want to extract whatever is in between the first #BEGIN and #END block. This code, if there are 2 begins and then an end, prints everything in between the first begin and end but does not print the second #BEGIN (I want the second begin to be printed because it is in between the first begin and end block). Also, if the text file starts with an #END and then a #BEGIN and #END block, it doesn't print anything. How to ignore the first #END?
@asddddddaaaad2: I added some more examples to handle these constraints.
how do I do the same exact thing in sed? So far I have: /#BEGIN/,/#END/!d *********** /#END/q ************ /#BEGIN/,/#END/{/#BEGIN/d;/#END/d;p;}
@asddddddaaaad2: I'm not familiar enough with sed to attempt to create a corresponding script. You could ask VIPIN KUMAR under his answer, since he used a sed for answering. Otherwise, you could ask a new question.

Try the sed command below to get the desired output:

vipin@kali:~$ sed  '/#BEGIN/,/#END/!d;/END/q' kk.txt|sed '1d;$d'
These lines should be extracted by our script.

Everything here will be copied.
vipin@kali:~$

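Explanation:

/#BEGIN/,/#END/!d deletes every line outside the range from #BEGIN to #END, so only the lines inside the range are printed. /END/q then quits at the first line containing END, after printing it. The second sed, 1d;$d, deletes the first and last lines of that output, which in our case are #BEGIN and #END.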
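
The same result can probably be had from a single sed process. This is only a sketch, checked against the sample input from the question and assuming GNU sed; the temporary file and the variable name sed_result are for illustration. Note that it drops any #BEGIN line nested inside the block, so it does not cover the extra constraints from the comments on the other answer:

```shell
# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' \
  'This is a simple test file.' \
  '#BEGIN' \
  'These lines should be extracted by our script.' \
  '' \
  'Everything here will be copied.' \
  '#END' \
  'That should be all.' \
  '#BEGIN' \
  'Nothing from here.' \
  '#END' > "$sample"

# -n suppresses automatic printing. Inside the #BEGIN..#END range:
#   /#BEGIN/d  drops the opening marker,
#   /#END/q    quits at the closing marker without printing it,
#   p          prints every other line in the range.
sed_result=$(sed -n '/#BEGIN/,/#END/{/#BEGIN/d;/#END/q;p;}' "$sample")
printf '%s\n' "$sed_result"
rm -f "$sample"
```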
