Replace string between two strings unless it contains a substring

Question

I have a multiline string with three lines of the following form:

Text1 Text2a Text3
Text1 Text2b Text3
Text1 Text2! Text3

I wish to replace all texts between Text1 and Text3 with Text4, unless the intermediate text contains the character !. Thus, the desired output is:

Text1 Text4 Text3
Text1 Text4 Text3
Text1 Text2! Text3

Let c be the multiline string above. I believe re.sub is the natural choice for this problem, so I tried the following:

c = re.sub("Text1(.*?)(?!=\!)Text3", "Text1 Text4 Text3", c, flags=re.DOTALL)

However, it replaces every intermediate text with Text4. That is, I get the following output:

Text1 Text4 Text3
Text1 Text4 Text3
Text1 Text4 Text3

How can I resolve this?

Note that the first capturing group, (.*?), matches any character, including !, before Text3, so by the time the negative lookahead is checked there is nothing left for it to reject.
– DarrylG, Commented Dec 25, 2022 at 4:09
I would go with "(?<=Text1)([^!]*?)(?=Text3)"g: lookarounds for the boundaries, capturing everything between them that does not include !. regex101.com/r/o97cBR/1
– Buttered_Toast, Commented Dec 25, 2022 at 12:58
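
To make the diagnosis in these comments concrete: (?!=\!) is a negative lookahead for the two literal characters =!, not a check on the text already matched, so it succeeds in front of every Text3. The following is a minimal sketch, assuming the sample string from the question, that reproduces the problem and then applies the lookaround pattern suggested above. The variable names broken and fixed are illustrative only.

```python
import re

c = "Text1 Text2a Text3\nText1 Text2b Text3\nText1 Text2! Text3"

# Original attempt: (?!=\!) only asserts that the next characters are not "=!",
# which is always true right before "Text3", so every line is replaced.
broken = re.sub(r"Text1(.*?)(?!=\!)Text3", "Text1 Text4 Text3", c, flags=re.DOTALL)
print(broken)

# Lookaround version: only the text between the two markers is matched,
# and [^!] cannot step over a "!", so the third line is left alone.
fixed = re.sub(r"(?<=Text1)[^!]*?(?=Text3)", " Text4 ", c)
print(fixed)
```

Because the lookarounds do not consume Text1 and Text3, the replacement string here is only the middle part, with its surrounding spaces.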

Daksh · Accepted Answer · 2022-12-25 04:00:27Z

1

You don't really need a negative lookahead to achieve this. Matching anything except the ! character is enough. Modifying your regex as follows fixes the issue (note the raw string, and that ! needs no escaping inside a character class):

c = re.sub(r"Text1([^!]*?)Text3", "Text1 Text4 Text3", c, flags=re.DOTALL)

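
As a quick check of the character-class approach, here is a small sketch run against the sample from the question. The flags=re.DOTALL argument is left out because it only changes how . behaves and this pattern contains no dot. The second pattern, which adds \n to the excluded characters so that a match stays on one line, is an optional variation and not part of the answer; the strings spanning and per_line are made up for illustration.

```python
import re

c = "Text1 Text2a Text3\nText1 Text2b Text3\nText1 Text2! Text3"

# Replace everything between Text1 and Text3 unless a "!" is in the way.
result = re.sub(r"Text1[^!]*?Text3", "Text1 Text4 Text3", c)
print(result)

# [^!] also matches newlines, so a match may run across lines.
spanning = "Text1 first\nsecond Text3"
across_lines = re.sub(r"Text1[^!]*?Text3", "Text1 Text4 Text3", spanning)

# Excluding "\n" as well keeps every match within a single line.
per_line = re.sub(r"Text1[^!\n]*?Text3", "Text1 Text4 Text3", spanning)
```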


2 Comments

Clement Yung Over a year ago

Thank you. Can this approach be generalised to substrings? For instance, replace all intermediate text unless it contains the substring abcde.

Daksh Over a year ago

For substrings, a negative lookahead is the way to go. Something like /Text1((?!.*?abcde).*)Text3/ would do the job.
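
One way to sketch the substring case in Python is shown below. The first pattern is a swapped-in alternative, not the one from the comment: it checks the lookahead before each consumed character, so only the text between Text1 and Text3 is inspected. The second is the pattern from the comment, whose lookahead scans the rest of the line. The sample strings and variable names are assumptions made for illustration.

```python
import re

sample = "Text1 Text2a Text3\nText1 xxabcdexx Text3\nText1 abcd Text3"

# (?:(?!abcde).)*? consumes one character at a time, but only while
# "abcde" does not start at the current position.
tempered = re.sub(r"Text1(?:(?!abcde).)*?Text3", "Text1 Text4 Text3", sample)
print(tempered)

# Pattern from the comment: reject the match if "abcde" appears
# anywhere later on the same line.
whole_line = re.sub(r"Text1((?!.*?abcde).*)Text3", "Text1 Text4 Text3", sample)
print(whole_line)

# The two differ when "abcde" appears after Text3 on the same line.
trailing = "Text1 a Text3 abcde"
tempered_trailing = re.sub(r"Text1(?:(?!abcde).)*?Text3", "Text1 Text4 Text3", trailing)
whole_line_trailing = re.sub(r"Text1((?!.*?abcde).*)Text3", "Text1 Text4 Text3", trailing)
```

Both patterns leave the second sample line untouched. Which behaviour is preferable for text after Text3 depends on the data.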

Tim Biegeleisen · Accepted Answer · 2022-12-25 03:58:39Z

1

I would phrase this as:

import re

c = """Text1 Text2a Text3
Text1 Text2b Text3
Text1 Text2! Text3"""

c = re.sub(r"^Text1(?: [^\s!]+)+ Text3$", "Text1 Text4 Text3", c, flags=re.M)
print(c)

This prints:

Text1 Text4 Text3
Text1 Text4 Text3
Text1 Text2! Text3

Here is an explanation of the regex pattern used:

^ from the start of the line (re.M is multiline mode)
Text1 match "Text1"
(?: [^\s!]+)+ then match one or more terms, each a space followed by non-whitespace characters NOT containing !
 Text3 match a space and "Text3"
$ end of the line
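
The same breakdown can be written directly into the pattern with re.VERBOSE, as a sketch of how the explanation maps onto the regex. In verbose mode unescaped whitespace is ignored, so the literal spaces are written as [ ]; that rewriting is the only change to the pattern itself.

```python
import re

c = "Text1 Text2a Text3\nText1 Text2b Text3\nText1 Text2! Text3"

pattern = re.compile(
    r"""
    ^Text1             # start of the line, then the literal Text1
    (?:[ ][^\s!]+)+    # one or more space-separated terms without an exclamation mark
    [ ]Text3$          # a space, the literal Text3, then the end of the line
    """,
    re.MULTILINE | re.VERBOSE,
)

result = pattern.sub("Text1 Text4 Text3", c)
print(result)
```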


2 Comments

Clement Yung Over a year ago

Thanks Tim. Can you advise on how to generalise this approach to substrings instead of characters? For instance, replace all intermediate text unless it contains the string abcde.

Tim Biegeleisen Over a year ago

You may use this pattern: ^Text1(?: (?!\S*abcde)\S+)+ Text3$ ... the negative lookahead (?!\S*abcde) rejects any term that contains abcde.
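
A minimal sketch of that pattern wrapped in a helper, so the forbidden substring becomes a parameter. The function name replace_unless and its arguments are hypothetical, introduced here only for illustration; re.escape is used so that substrings containing regex metacharacters are treated literally.

```python
import re

def replace_unless(text, forbidden, replacement="Text1 Text4 Text3"):
    """Replace each 'Text1 ... Text3' line unless a term contains `forbidden`.

    Hypothetical helper built around the pattern from the comment above.
    """
    pattern = rf"^Text1(?: (?!\S*{re.escape(forbidden)})\S+)+ Text3$"
    return re.sub(pattern, replacement, text, flags=re.MULTILINE)

sample = "Text1 Text2a Text3\nText1 xxabcdexx Text3\nText1 foo bar Text3"
print(replace_unless(sample, "abcde"))

# The same helper covers the single-character case from the question.
c = "Text1 Text2a Text3\nText1 Text2b Text3\nText1 Text2! Text3"
print(replace_unless(c, "!"))
```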
