Remove lines corresponding to first 7 matches of a string (in a pattern range)

Question

I need to remove a range of lines from a text file: the range starts at the line matching string1 and ends at the line containing the seventh occurrence of string2.

Example of txt file content:

whatever
xpto string1 foo2
whatever1 
string2
xpto1 another_foo
xpto string2

string2 foo1

whatever
string2 another_xpto
string2 string2
foo xpto string2 whatever 
anything else foo string2
xpto
string2
foo whatever

I need a solution with sed ranges, something like this:

sed '/string1/,/string2/d' file.txt

The problem is that I don't know how to extend the /string2/ end of the range to the line containing the seventh match of string2. The desired output is:

whatever
anything else foo string2
xpto
string2
foo whatever

Read stackoverflow.com/questions/65621325/… to understand why this matters, and then change the word "pattern" to either "string" or "regexp" everywhere it occurs in your question so we can best answer it. Note that /string/ makes no sense, since / is the regexp delimiter, not the string delimiter ("); you either meant /regexp/ or index($0,"string") (the latter if you use awk, since sed has no support for string matching).
– Ed Morton, Commented Jan 17, 2021 at 23:53
If any of the answers you got helped you, then see unix.stackexchange.com/help/someone-answers for what to do next.
– Ed Morton, Commented Jan 20, 2021 at 4:20

mikeserv · Accepted Answer · 2021-01-17 01:40:40Z

5

sed -e:t -e'/string1/!b' -e'/\(.*string2\)\{7\}/d;N;bt'
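
For readers who find the one-liner dense, here is the same script split into one command per -e, with a comment on each step. This is only a sketch: the function name drop_range_sed is made up for illustration, and it assumes a sed in which . also matches the newlines that N embeds in the pattern space (GNU sed does).

```shell
# Hypothetical wrapper; $1 is the input file.
drop_range_sed() {
    # :t           label marking the top of the loop
    # /string1/!b  no string1 in the pattern space: print it and start the next cycle
    # /.../d       the accumulated text holds 7 occurrences of string2: discard all of it
    # N            otherwise append the next input line to the pattern space
    # bt           and go back to the top to test again
    sed -e ':t' \
        -e '/string1/!b' \
        -e '/\(.*string2\)\{7\}/d' \
        -e 'N' \
        -e 'bt' "$1"
}
```

Called as drop_range_sed file.txt on the sample from the question, this should print the five lines of the desired output. The whole range is held in the pattern space until the seventh match is seen, and if the input ends before that happens, GNU sed prints the accumulated lines instead of deleting them.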



Quasímodo · Accepted Answer · 2021-01-17 12:17:03Z

4

awk '/string1/{c=7}; c<1; {c-=gsub(/string2/, "&")}' file

c is initially 0, and set to 7 if string1 is found. The line is printed whenever c<1.

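gsub(/string2/, "&") replaces each match with itself, so the line is left unchanged, and returns the number of substitutions made, i.e. the number of times string2 appears on the line. The counter c is decremented by that value.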
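
The counting logic described above can be written out one rule per line. This is a sketch rather than a replacement for the one-liner; the function name drop_range_awk and the optional count argument n are assumptions added for illustration.

```shell
# Hypothetical wrapper; $1 is the input file, $2 the number of string2
# matches to consume (defaults to 7, as in the question).
drop_range_awk() {
    awk -v n="${2:-7}" '
        /string1/ { c = n }              # string1 seen: start the countdown
        c < 1     { print }              # no countdown running: keep the line
        { c -= gsub(/string2/, "&") }    # subtract the string2 matches on this line
    ' "$1"
}
```

Because the print test runs before the decrement, the line that brings c down to 0 is itself suppressed, which is why the line holding the seventh match is removed along with the rest of the range.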

edited Jan 17, 2021 at 12:17

answered Jan 16, 2021 at 19:09


Ed Morton · Accepted Answer · 2021-01-18 01:01:30Z

Here's one way to do what you want using literal strings:

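$ cat tst.awk
BEGIN { lgth = length(end) }
# A line containing the literal string beg opens the block.
index($0,beg) { inBlock = 1 }
inBlock {
    rec = $0
    # Count every occurrence of end on this line, not just the first.
    while ( pos = index(rec,end) ) {
        if ( ++cnt >= min ) {
            # The min-th occurrence closes the block after this line.
            inBlock = 0
        }
        rec = substr(rec,pos+lgth)
    }
    next    # lines inside the block are never printed
}
{ print }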

$ awk -v beg='string1' -v end='string2' -v min=7 -f tst.awk file
whatever
anything else foo string2
xpto
string2
foo whatever

Because beg and end are passed with -v, awk will interpret backslash escapes in them (e.g. \t would become a tab). If that's an issue, let me know; there's an easy workaround, e.g. using ENVIRON[].
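
The ENVIRON[] workaround mentioned above might look like the following sketch. The wrapper name remove_block and its argument order are made up for illustration; the awk body is tst.awk with the three settings read from the environment, where awk leaves backslashes alone, instead of from -v assignments.

```shell
# Hypothetical wrapper: remove_block BEG END MIN FILE
remove_block() {
    # The prefix assignments place beg, end and min in the environment of awk.
    beg="$1" end="$2" min="$3" awk '
        BEGIN {
            beg  = ENVIRON["beg"]        # taken verbatim, no escape processing
            end  = ENVIRON["end"]
            min  = ENVIRON["min"] + 0    # force a numeric comparison
            lgth = length(end)
        }
        index($0, beg) { inBlock = 1 }
        inBlock {
            rec = $0
            while ( pos = index(rec, end) ) {
                if ( ++cnt >= min ) inBlock = 0
                rec = substr(rec, pos + lgth)
            }
            next
        }
        { print }
    ' "$4"
}
```

With this form, remove_block 'a\tb' 'string2' 7 file would look for a literal backslash followed by t, not for a tab.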

guest_7 · Accepted Answer · 2021-01-18 18:11:53Z

0

Perl:

perl -ne '
  if (my $e = /string1/ ... s/string2/$&/g >= 7) {
      $_ .= $e =~ /E0/ ? next : <>, redo;
  }
  print;
' file

POSIX sed:

sed -ne '
  /string1/!{p;d;}
  :loop
    n
    /string2/H
    g;s//&/7;t
  b loop
' file

Output:

whatever
anything else foo string2
xpto
string2
foo whatever

edited Jan 18, 2021 at 18:11

answered Jan 18, 2021 at 15:32

