awk script: removing line previous to pattern match and after, until a blank line

Question

I began learning awk yesterday in an attempt to solve this problem (and learn a useful new language). At first I tried using sed, but soon realized it was not the right tool for accessing or manipulating lines previous to a pattern match.

I need to:

Remove all lines containing "foo" (trivial on its own, but not whilst keeping track of previous lines)
Find lines containing "bar"
Remove the line previous to the one containing "bar"
Remove all lines after and including the line containing "bar" until we reach a blank line

Example input:

This is foo stuff
I like food!
It is tasty!

stuff
something
stuff
stuff
This is bar
Hello everybody
I'm Dr. Nick

things
things
things

Desired output:

It is tasty!

stuff
something
stuff

things
things
things

My attempt:

{
    valid=1;             #boolean variable to keep track if x is valid and should be printed
    if ($x ~ /foo/){     #x is valid unless it contains foo 
        valid=0;         #invalidate x so that is doesn't get printed at the end
        next;
    }
    if ($0 ~ /bar/){     #if the current line contains bar
        valid = 0;       #x is invalid (don't print the previous line)
        while (NF == 0){ #don't print until we reach an empty line
            next;
        }
    }
    if (valid == 1){     #x was a valid line
        print x;                        
    }
    x=$0;                #x is a reference to the previous line
}

Super bonus points (not needed to solve my problem, but I'm interested in learning how this would be done):

Ability to remove n lines before the pattern match
Option to include/exclude the blank line in the output

aks · Accepted Answer · 2014-08-13 23:10:31Z

Below is an alternative awk script using patterns & functions to trigger state changes and manage output, which produces the same result.

function show_last() {
  if (!skip && !empty) {
    print last
  }
  last = $0
  empty = 0
}
function set_skip_empty(n) {
  skip = n
  last = $0
  empty = NR <= 0
}
BEGIN  { set_skip_empty(0)        }
END    { show_last() ;            }
/foo/  { next;                    }
/bar/  { set_skip_empty(1) ; next }
/^ *$/ { if (skip > 0) { set_skip_empty(0); next } else show_last() }
!/^ *$/{ if (skip > 0) { next }                    else show_last() }

This works by retaining the "current" line in a variable last, which is either ignored or output, depending on other events, such as the occurrence of foo and bar.

The empty variable keeps track of whether the last variable really holds a blank line from the input, or is simply empty from inception (e.g., in BEGIN).

To accomplish the "bonus points", replace last with an array of lines which could then accumulate N number of lines as desired.
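A minimal sketch of that idea, assuming the previous lines are held in an array and a line is printed only once it is more than n lines old. The names drop_before_bar, buf, count, and n are illustrative and not part of the answer above; the shell function wrapper is only there to make the script easy to call on standard input.

```shell
# Sketch only: hold back the last n lines so they can be discarded when "bar" appears.
drop_before_bar() {
  awk -v n="${1:-1}" '
    function flush_oldest(   i) {        # print the oldest held line, then shift the buffer down
      print buf[1]
      for (i = 1; i < count; i++) buf[i] = buf[i + 1]
      delete buf[count--]
    }
    /foo/ { next }                                        # lines containing foo are never kept
    /bar/ { split("", buf); count = 0; skip = 1; next }   # forget the held lines, start skipping
    skip  { if (NF) next; skip = 0 }                      # skip until a blank line, which is kept
          { buf[++count] = $0; if (count > n + 0) flush_oldest() }
    END   { while (count > 0) flush_oldest() }            # nothing left to wait for
  '
}
```

With the example input from the question, `drop_before_bar 1 < file` should reproduce the desired output, and `drop_before_bar 2 < file` should additionally drop the second line before the bar line.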

To exclude blank lines (such as the one that terminates the bar filter), replace the empty test with a test on the length of the last variable. In awk, empty lines have no length (but, lines with blanks or tabs *do* have a length).

function show_last() {
  if (!skip && length(last) > 0) {
    print last
  }
  last = $0
}

will result in no blank lines of output.

Ed Morton · Accepted Answer · 2014-08-14 01:24:11Z

Read each blank-line-separated paragraph in as a single string, then use gsub() to remove the text that matches the RE for the pattern(s) you care about:

$ awk -v RS= -v ORS="\n\n" '{ gsub(/[^\n]*foo[^\n]*\n|\n[^\n]*\n[^\n]*bar.*/,"") }1' file
It is tasty!

stuff
something
stuff

things
things
things

To remove N lines, change [^\n]*\n to ([^\n]*\n){N}.
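As a hedged sketch of the N-line variant: instead of the interval expression, the regular expression can be assembled as a string by repeating the "one line" piece n times, which may help with awk versions that do not support {N}. The names drop_n_before_bar, n, prev, and re are illustrative assumptions, not part of the answer.

```shell
# Sketch only: paragraph mode, with the RE built dynamically for n lines before "bar".
drop_n_before_bar() {
  awk -v n="${1:-1}" -v RS= -v ORS='\n\n' '
    BEGIN {
      prev = ""
      for (i = 0; i < n + 0; i++) prev = prev "[^\n]*\n"   # one copy per line to drop
      re = "[^\n]*foo[^\n]*\n|\n" prev "[^\n]*bar.*"       # same shape as the RE in the answer
    }
    { gsub(re, "") } 1
  '
}
```

Calling `drop_n_before_bar 2 < file` on the question's input should remove both "stuff" lines that precede the bar line. As in the original one-liner, the lines to drop must be inside the same paragraph as the bar line.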

To keep part of the matched text instead of removing all of it, use GNU awk's gensub() instead of gsub().

To remove the blank lines, change the value of ORS.
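For instance, here is a small sketch in which the output record separator is passed in as an argument, so the same script can print paragraphs with or without the separating blank line. The function name filter_paragraphs and its argument are assumptions made for illustration.

```shell
# Sketch only: $1 becomes ORS; awk expands escapes such as \n in -v assignments.
filter_paragraphs() {
  awk -v RS= -v ORS="${1:-\n\n}" '{ gsub(/[^\n]*foo[^\n]*\n|\n[^\n]*\n[^\n]*bar.*/, "") } 1'
}
```

`filter_paragraphs '\n' < file` prints the surviving lines with no blank lines between paragraphs, while `filter_paragraphs < file` keeps the default behaviour of the one-liner above.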

Play with it...

edited Aug 14, 2014 at 1:24

answered Aug 14, 2014 at 0:29

4 Comments

John B Over a year ago

The variations in my test results are also strange (tested against @anubhava's solution). It's interesting that BSD awk is even faster than mawk using @EdMorton's approach.

Ed Morton Over a year ago

It just looks to me like you're both seeing the result of caching making your second runs faster. You need to run a script at least 2 or 3 times before THEN taking the time info for it to compare apples to apples.

John B Over a year ago

Good point, I just ran each several times though and got very similar results.

Ed Morton Over a year ago

Beats me then. Sounds like an excellent investigation opportunity for the OP :-).

anubhava · Accepted Answer · 2014-08-13 22:19:59Z

This awk command should work without storing the full file in memory:

awk '/bar/{skip=1;next} skip && p~/^$/ {skip=0} NR>1 && !skip && !(p~/foo/){print p} {p=$0} 
    END{if (!skip && !(p~/foo/)) print p}' file

It is tasty!

stuff
something
stuff

things
things
things

jaypal singh · Accepted Answer · 2014-08-13 22:50:30Z

One way:

awk '
      /foo/ { next }
 flag && NF { next }
flag && !NF { flag = 0 }
      /bar/ { if (idx > 0) delete line[idx--]; flag = 1; next }   # drop the most recently stored line
            { line[++idx] = $0 }
END {
    for (x=1; x<=idx; x++) print line[x]
}' file
It is tasty!

stuff
something
stuff

things
things
things

If a line contains foo, skip it.
If the flag is enabled and the line is not blank, skip it.
If the flag is enabled and the line is blank, disable the flag.
If a line contains bar, delete the previously stored line, decrement the counter, enable the flag, and skip the line.
Store every line that makes it through in an array indexed by an incrementing number.
In the END block, print the stored lines.

Side Notes:

To remove n lines before a pattern match, you can use a loop: starting from the most recently cached line, walk backwards through your temporary cache (the array) and delete n entries, then subtract n from your self-defined counter variable.
To include or exclude blank lines, you can use the NF variable. For a typical line, NF is set to the number of fields based on your field separator; for blank lines it is 0. For example, if you change the line just above the END block to NF { line[++idx] = $0 }, all blank lines are omitted from the output.
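Both side notes can be sketched together as follows. This is an illustrative variant of the script above, not the answer's own code: the function name drop_cached_before_bar and the parameters n (number of cached lines to drop) and keep_blank (1 to keep blank lines, 0 to omit them) are assumptions.

```shell
# Sketch only: cache surviving lines, drop the last n of them when "bar" appears.
drop_cached_before_bar() {
  awk -v n="${1:-1}" -v keep_blank="${2:-1}" '
          /foo/ { next }
     flag && NF { next }
    flag && !NF { flag = 0 }
          /bar/ { for (k = 0; k < n + 0 && idx > 0; k++) delete line[idx--]   # walk back through the cache
                  flag = 1; next }
    NF || (keep_blank + 0) { line[++idx] = $0 }                               # optionally omit blank lines
    END { for (x = 1; x <= idx; x++) print line[x] }
  '
}
```

For example, `drop_cached_before_bar 2 1 < file` drops two cached lines before each bar line and keeps blank lines, and `drop_cached_before_bar 1 0 < file` drops one line and omits blank lines. Note that with keep_blank set to 0 the "previous line" is the previous non-blank cached line, since blank lines never enter the cache.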
