I began learning awk yesterday in an attempt to solve this problem (and to learn a useful new language). At first I tried sed, but soon realized it was not the right tool for accessing or manipulating the lines preceding a pattern match.

I need to:

  1. Remove all lines containing "foo" (trivial on its own, but not whilst keeping track of previous lines)
  2. Find lines containing "bar"
  3. Remove the line previous to the one containing "bar"
  4. Remove all lines after and including the line containing "bar" until we reach a blank line

Example input:

This is foo stuff
I like food!
It is tasty!

stuff
something
stuff
stuff
This is bar
Hello everybody
I'm Dr. Nick

things
things
things

Desired output:

It is tasty!

stuff
something
stuff

things
things
things

My attempt:

{
    valid=1;             #boolean variable to keep track if x is valid and should be printed
    if ($x ~ /foo/){     #x is valid unless it contains foo 
        valid=0;         #invalidate x so that it doesn't get printed at the end
        next;
    }
    if ($0 ~ /bar/){     #if the current line contains bar
        valid = 0;       #x is invalid (don't print the previous line)
        while (NF == 0){ #don't print until we reach an empty line
            next;
        }
    }
    if (valid == 1){     #x was a valid line
        print x;                        
    }
    x=$0;                #x is a reference to the previous line
}

Super bonus points (not needed to solve my problem, but I'm interested in learning how this would be done):

  1. Ability to remove n lines before pattern match
  2. Option to include/exclude the blank line in output

4 Answers

2

Below is an alternative awk script using patterns & functions to trigger state changes and manage output, which produces the same result.

function show_last() {
  if (!skip && !empty) {
    print last
  }
  last = $0
  empty = 0
}
function set_skip_empty(n) {
  skip = n
  last = $0
  empty = NR <= 0
}
BEGIN  { set_skip_empty(0)        }
END    { show_last() ;            }
/foo/  { next;                    }
/bar/  { set_skip_empty(1) ; next }
/^ *$/ { if (skip > 0) { set_skip_empty(0); next } else show_last() }
!/^ *$/{ if (skip > 0) { next }                    else show_last() }

This works by retaining the "current" line in a variable last, which is either ignored or output, depending on other events, such as the occurrence of foo and bar.

The empty variable keeps track of whether the last variable really holds a blank line or is simply empty from inception (e.g., in BEGIN).

To accomplish the "bonus points", replace last with an array of lines, which can then accumulate N lines as desired.
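
As a rough sketch of that idea (not part of the script above), the version below holds the last n surviving lines in a small array and prints a line only once no later bar can remove it. The helper name drop_before_bar, the variables n, buf, count and skip, and the choice to keep the blank line that ends a bar block are all assumptions made for illustration.

```sh
# Hypothetical helper: drop "foo" lines, the n lines before each "bar" line,
# and everything from "bar" up to (not including) the next blank line.
# Usage: drop_before_bar N < input
drop_before_bar() {
  awk -v n="$1" '
    /foo/ { next }                        # foo lines never enter the buffer
    skip  { if (NF) next; skip = 0 }      # inside a bar block: drop until a blank line
    /bar/ { count = 0; skip = 1; next }   # forget the buffered lines, start skipping
    {
      buf[++count] = $0                   # hold the line back for now
      if (count > n) {                    # oldest line is out of reach of any later bar
        print buf[1]
        for (i = 1; i < count; i++) buf[i] = buf[i + 1]
        delete buf[count]
        count--
      }
    }
    END { for (i = 1; i <= count; i++) print buf[i] }   # flush what is left
  '
}
```

Called as drop_before_bar 1 < file, this corresponds to the single-line case in the question; memory use stays bounded by n rather than by the size of the file.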

To exclude blank lines (such as the one that terminates the bar filter), replace the empty test with a test on the length of the last variable. In awk, empty lines have no length (but, lines with blanks or tabs *do* have a length).

function show_last() {
  if (!skip && length(last) > 0) {
    print last
  }
  last = $0
}

will result in no blank lines of output.

2

Read each blank-line-separated paragraph in as a string, then do a gsub() removing the strings that match the RE for the pattern(s) you care about:

$ awk -v RS= -v ORS="\n\n" '{ gsub(/[^\n]*foo[^\n]*\n|\n[^\n]*\n[^\n]*bar.*/,"") }1' file
It is tasty!

stuff
something
stuff

things
things
things

To remove N lines, change the [^\n]*\n that follows the leading \n in the second alternative to ([^\n]*\n){N}.
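
Interval expressions such as {N} are not available in every awk (some older mawk and gawk versions lack them or need an option to enable them). One possible workaround, sketched here with a made-up helper name and made-up variable names, is to build the regular expression as a string with the one-line pattern repeated n times.

```sh
# Hypothetical helper: same paragraph-mode approach, with the number of
# lines removed before "bar" passed in as n.
# Usage: strip_paragraphs N < input
strip_paragraphs() {
  awk -v RS= -v ORS="\n\n" -v n="$1" '
    BEGIN {
      line = "[^\n]*\n"                     # one whole line, newline included
      for (i = 0; i < n; i++) prev = prev line
      # foo lines, or: newline + n lines + the bar line + rest of the paragraph
      re = "[^\n]*foo[^\n]*\n|\n" prev "[^\n]*bar.*"
    }
    { gsub(re, "") }
    1
  '
}
```

As with the original command, the lines to be removed must be preceded by at least one other line in the same paragraph, because the second alternative starts with \n.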

To keep part of the matched text rather than removing all of it, use GNU awk's gensub() instead of gsub().

To remove the blank lines, change the value of ORS.
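
For instance, setting ORS to a single newline prints the paragraphs back to back. This is only a sketch: the helper name is invented, and the regular expression is copied unchanged from the command above.

```sh
# Hypothetical helper: same filter, but paragraphs are written out
# without the blank line between them.
# Usage: strip_no_blank < input
strip_no_blank() {
  awk -v RS= -v ORS="\n" '
    { gsub(/[^\n]*foo[^\n]*\n|\n[^\n]*\n[^\n]*bar.*/, "") }
    1
  '
}
```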

Play with it...

4 Comments

The variations in my test results are also strange (tested against @anubhava's solution). It's interesting that BSD awk is even faster than mawk using @EdMorton's approach.
It just looks to me like you're both seeing the result of caching making your second runs faster. You need to run a script at least 2 or 3 times before THEN taking the time info for it to compare apples to apples.
Good point, I just ran each several times though and got very similar results.
Beats me then. Sounds like an excellent investigation opportunity for the OP :-).
1

This awk should work without storing the full file in memory:

awk '/bar/{skip=1;next} skip && p~/^$/ {skip=0} NR>1 && !skip && !(p~/foo/){print p} {p=$0} 
    END{if (!skip && !(p~/foo/)) print p}' file

It is tasty!

stuff
something
stuff

things
things
things

1

One way:

awk '
      /foo/ { next }     
 flag && NF { next }     
flag && !NF { flag = 0 }      
      /bar/ { if (idx > 0) delete line[idx--]; flag = 1; next }
            { line[++idx] = $0 }
END {
    for (x=1; x<=idx; x++) print line[x]
}' file
It is tasty!

stuff
something
stuff

things
things
things

  • If a line contains foo, skip it.
  • If the flag is enabled and the line is not blank, skip it.
  • If the flag is enabled and the line is blank, disable the flag.
  • If a line contains bar, delete the previous stored line, decrement the counter, enable the flag, and skip the line.
  • Store every line that makes it through in an array indexed by an incrementing number.
  • In the END block, print the stored lines.

Side Notes:

  • To remove n lines before a pattern match, use a loop: starting from the most recently stored index, run a reverse for loop that deletes n entries from your temporary cache (the array), then subtract n from your self-defined counter variable.

  • To include or exclude blank lines, use the NF variable. For a typical line, NF is set to the number of fields, based on your field separator; for blank lines it is 0. For example, if you change the line above the END block to NF { line[++idx] = $0 }, all blank lines are omitted from the output.
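
Both side notes can be combined into one sketch. The helper name filter_lines and the variables n and keep_blank are invented for this example, and deleting from the array's own counter idx (rather than from NR) is an assumption about the intended behavior.

```sh
# Hypothetical helper: n = lines to drop before each "bar" line,
# keep_blank = 1 to keep blank lines in the output, 0 to drop them.
# Usage: filter_lines N KEEP_BLANK < input
filter_lines() {
  awk -v n="$1" -v keep_blank="$2" '
    /foo/       { next }
    flag && NF  { next }
    flag && !NF { flag = 0 }
    /bar/ {
      # reverse loop: remove up to n of the most recently stored lines
      for (i = 0; i < n && idx > 0; i++) { delete line[idx]; idx-- }
      flag = 1
      next
    }
    NF || keep_blank + 0 { line[++idx] = $0 }   # blank lines are stored only on request
    END { for (x = 1; x <= idx; x++) print line[x] }
  '
}
```

When keep_blank is 0, blank lines never enter the array, so n counts only non-blank lines.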
