3

I need to delete a block of lines from a text file: the block starts at the first line matching string1 and ends at the line containing the seventh occurrence of string2.

Example of txt file content:

whatever
xpto string1 foo2
whatever1 
string2
xpto1 another_foo
xpto string2

string2 foo1

whatever
string2 another_xpto
string2 string2
foo xpto string2 whatever 
anything else foo string2
xpto
string2
foo whatever

I need a solution with sed ranges, something like this:

sed '/string1/,/string2/d' file.txt

The problem is that I don't know how to extend the range so that it ends at the line containing the seventh match of string2. The desired output is:

whatever
anything else foo string2
xpto
string2
foo whatever
2
  • 1
    Read stackoverflow.com/questions/65621325/… to understand why this matters, and then change the word "pattern" to either "string" or "regexp" everywhere it occurs in your question so we can best answer it. Note that /string/ makes no sense, since / is the regexp delimiter, not the string delimiter ("). You either meant /regexp/ or index($0,"string") (the latter only if you use awk, since sed has no support for literal string matching). Commented Jan 17, 2021 at 23:53
  • If any of the answers you got helped you then see unix.stackexchange.com/help/someone-answers for what to do next. Commented Jan 20, 2021 at 4:20

4 Answers

5
sed -e:t -e'/string1/!b' -e'/\(.*string2\)\{7\}/d;N;bt'
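The one-liner above comes without an explanation, so here is one possible reading of it, written as a multi-line script with comments. This is a sketch rather than the answerer's own commentary: the sample input and the variable name `out` are made up for illustration, and it relies on `.` matching the embedded newlines that `N` adds to the pattern space.

```shell
# Hypothetical input: the block starts at the string1 line, and the seventh
# string2 occurs on the line ending in "tail".
out=$(printf '%s\n' \
    'keep A' \
    'x string1' \
    'string2' \
    'string2 string2' \
    'a string2 string2' \
    'string2 string2 tail' \
    'keep B string2' \
    'keep C' |
sed '
  :t
  # No string1 in the pattern space: branch to the end and print as usual.
  /string1/!b
  # Seven occurrences of string2 collected: delete the whole accumulated block.
  /\(.*string2\)\{7\}/d
  # Otherwise append the next input line and test again.
  N
  bt
')
printf '%s\n' "$out"
```

Lines after the deleted block are printed even if they contain string2, because at that point the pattern space no longer contains string1.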
1
  • 6
    welcome back mikeserv Commented Jan 17, 2021 at 3:05
4
awk '/string1/{c=7}; c<1; {c-=gsub(/string2/, "&")}' file

c is initially 0, and set to 7 if string1 is found. The line is printed whenever c<1.

The gsub function returns the number of times string2 appears on each line. The counter c is decremented by that value.
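To see the counter at work, the sketch below runs the same logic, spelled out with comments, against the sample input from the question. The temporary file and the variable names `tmp` and `out` are only for illustration.

```shell
# Recreate the sample input from the question in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
whatever
xpto string1 foo2
whatever1
string2
xpto1 another_foo
xpto string2

string2 foo1

whatever
string2 another_xpto
string2 string2
foo xpto string2 whatever
anything else foo string2
xpto
string2
foo whatever
EOF

out=$(awk '
  # Start of the block: seven matches of string2 are still to be skipped.
  /string1/ { c = 7 }
  # A pattern with no action prints the line, so lines print only while c < 1.
  c < 1
  # gsub returns the number of matches on this line; subtract them from c.
  { c -= gsub(/string2/, "&") }
' "$tmp")
rm -f "$tmp"
printf '%s\n' "$out"
```

The counter reaches 0 on the line `foo xpto string2 whatever`, which is still suppressed because the `c < 1` test runs before the decrement. Printing resumes on the following line.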

0
1

Here's one way to do what you want using literal strings:

$ cat tst.awk
BEGIN { lgth = length(end) }
index($0,beg) { inBlock = 1 }
inBlock {
    rec = $0
    while ( pos = index(rec,end) ) {
        if ( ++cnt >= min ) {
            inBlock = 0
        }
        rec = substr(rec,pos+lgth)
    }
    next
}
{ print }

$ awk -v beg='string1' -v end='string2' -v min=7 -f tst.awk file
whatever
anything else foo string2
xpto
string2
foo whatever

The above will interpret backslashes in the strings (e.g. \t would become a tab). If that's an issue, let me know, as there's an easy workaround, e.g. using ENVIRON[].
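A minimal sketch of what the ENVIRON[] workaround might look like, assuming the two strings are passed through the environment of the awk process instead of with -v. Values read from ENVIRON are not subject to escape-sequence processing, so a literal `\t` stays as backslash-t. The sample input and the variable name `out` are made up for illustration.

```shell
# beg and end are set in the environment of this awk invocation only;
# min is a plain number, so passing it with -v is harmless.
out=$(printf '%s\n' \
    'keep 1' \
    'start a\tb' \
    'x END' \
    'y END END' \
    'keep 2' |
beg='a\tb' end='END' awk -v min=3 '
BEGIN {
    beg  = ENVIRON["beg"]     # taken verbatim, backslashes included
    end  = ENVIRON["end"]
    lgth = length(end)
}
index($0, beg) { inBlock = 1 }
inBlock {
    rec = $0
    # Count every occurrence of end on this line.
    while ( pos = index(rec, end) ) {
        if ( ++cnt >= min ) {
            inBlock = 0
        }
        rec = substr(rec, pos + lgth)
    }
    next
}
{ print }
')
printf '%s\n' "$out"
```

With `-v beg='a\tb'` the same input would not match, since awk would turn `\t` into a tab before the comparison.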

0

Perl

perl -ne '
  if (my $e = /string1/ ... s/string2/$&/g >= 7) {
      $_ .= $e =~ /E0/ ? next : <>, redo;
  }
  print;
' file

POSIX sed:

sed -ne '
  /string1/!{p;d;}
  :loop
    n
    /string2/H
    g;s//&/7;t
  b loop
' file

Output:

whatever
anything else foo string2
xpto
string2
foo whatever
