1

I have a multiline string with three lines of the following form:

Text1 Text2a Text3
Text1 Text2b Text3
Text1 Text2! Text3

I wish to replace all texts between Text1 and Text3 with Text4, unless the intermediate text contains the character !. Thus, the desired output is:

Text1 Text4 Text3
Text1 Text4 Text3
Text1 Text2! Text3

Let c be the multiline string above. I believe re.sub is the natural choice for this problem, so I tried the following:

c = re.sub("Text1(.*?)(?!=\!)Text3", "Text1 Text4 Text3", c, flags=re.DOTALL)

However, it replaces every intermediate text with Text4. That is, I get the following output:

Text1 Text4 Text3
Text1 Text4 Text3
Text1 Text4 Text3

How can I resolve this?

2
  • Note the 1st capturing group, (.*?), matches all characters including !, before Text3, thus the negative lookahead won't have any remaining characters left to prevent the match. Commented Dec 25, 2022 at 4:09
  • I would go with "(?<=Text1)([^!]*?)(?=Text3)"g: lookarounds for the boundaries, capturing everything in between that does not include !. regex101.com/r/o97cBR/1 Commented Dec 25, 2022 at 12:58
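
The two comments above can be checked with a short script. This is only a sketch: the variable names (attempt, captured, lookaround_result) are made up for illustration, and the " Text4 " replacement with surrounding spaces is an assumption needed because the lookaround pattern matches only the middle part.

```python
import re

c = """Text1 Text2a Text3
Text1 Text2b Text3
Text1 Text2! Text3"""

# The original attempt. "(?!=\!)" is a negative lookahead for the two literal
# characters "=!", so it does not check whether "!" occurs inside the group.
attempt = re.search(r"Text1(.*?)(?!=\!)Text3", "Text1 Text2! Text3", flags=re.DOTALL)
captured = attempt.group(1)
print(repr(captured))  # the lazy group has already consumed the "!"

# The lookaround pattern from the second comment. Text1 and Text3 are only
# asserted, not consumed, so the replacement supplies its own spaces.
lookaround_result = re.sub(r"(?<=Text1)([^!]*?)(?=Text3)", " Text4 ", c)
print(lookaround_result)
```

The first print shows ' Text2! ', which is why the third line was replaced in the question; the second leaves the third line untouched.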

2 Answers

1

You don't really need a negative lookahead to achieve your result. Matching any character except ! would do just fine. Modifying your regex as follows fixes the issue:

c = re.sub(r"Text1([^!]*?)Text3", "Text1 Text4 Text3", c, flags=re.DOTALL)

You can play with it online here and understand more about the regex here.
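
One thing worth knowing about this approach: a negated class such as [^!] also matches newlines, so a match may span lines if a line has no Text3 of its own. The sketch below illustrates that; the spanning input and the line_bound variant are assumptions added for illustration, not part of the original question.

```python
import re

pattern = r"Text1([^!]*?)Text3"

c = """Text1 Text2a Text3
Text1 Text2b Text3
Text1 Text2! Text3"""

# On the data from the question, only the line containing "!" is left alone.
result = re.sub(pattern, "Text1 Text4 Text3", c)
print(result)

# Hypothetical input where Text1 and Text3 sit on different lines:
# [^!] matches "\n", so the two lines are merged into one replacement.
spanning = "Text1 Text2a\nText2b Text3"
merged = re.sub(pattern, "Text1 Text4 Text3", spanning)

# Excluding "\n" as well keeps every match on a single line.
line_bound = r"Text1([^!\n]*?)Text3"
kept = re.sub(line_bound, "Text1 Text4 Text3", spanning)
print(repr(merged), repr(kept))
```

If the input is guaranteed to have Text1 and Text3 on the same line, both patterns behave the same.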

2 Comments

Thank you. Can this approach be generalised to substrings? For instance, replace all intermediate text unless it contains the substring abcde.
For substrings, a negative lookahead is the go-to. So something like /Text1((?!.*?abcde).*)Text3/ would do the job. Try it for yourself here.
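
A rough Python sketch of the lookahead idea from this comment, using made-up sample lines that contain abcde. Note that the lookahead scans the rest of the line, including anything after Text3. The second pattern (a per-character lookahead, sometimes called a tempered pattern) is an alternative that was not suggested in this thread; it only rejects abcde between Text1 and Text3.

```python
import re

# Hypothetical data: the second line contains the forbidden substring.
sample = """Text1 Text2a Text3
Text1 xxabcdexx Text3
Text1 Text2b Text3"""

# Pattern from the comment: fail if "abcde" appears anywhere later on the line.
comment_pattern = r"Text1((?!.*?abcde).*)Text3"
by_comment = re.sub(comment_pattern, "Text1 Text4 Text3", sample)
print(by_comment)

# Alternative (assumption, not from the thread): check before each character,
# so only the text between Text1 and Text3 is inspected.
tempered_pattern = r"Text1((?:(?!abcde).)*?)Text3"
by_tempered = re.sub(tempered_pattern, "Text1 Text4 Text3", sample)

# The two differ when "abcde" appears after Text3 on the same line.
trailing = "Text1 a Text3 abcde"
trailing_comment = re.sub(comment_pattern, "Text1 Text4 Text3", trailing)
trailing_tempered = re.sub(tempered_pattern, "Text1 Text4 Text3", trailing)
print(trailing_comment, "|", trailing_tempered)
```

Which behaviour is wanted depends on whether text after Text3 should count as "intermediate"; for the data in the question both patterns give the same result.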
1

I would phrase this as:

import re

c = """Text1 Text2a Text3
Text1 Text2b Text3
Text1 Text2! Text3"""

c = re.sub(r"^Text1(?: [^\s!]+)+ Text3$", "Text1 Text4 Text3", c, flags=re.M)
print(c)

This prints:

Text1 Text4 Text3
Text1 Text4 Text3
Text1 Text2! Text3

Here is an explanation of the regex pattern used:

  • ^ from the start of the line (re.M is multiline mode)
  • Text1 match "Text1"
  • (?: [^\s!]+)+ then match one or more terms, each a space followed by characters that are neither whitespace nor !
  • " Text3" then match a space followed by "Text3"
  • $ end of the line
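
The breakdown above can also be written directly into the pattern using verbose mode. This is just a sketch of the same regex with re.X added; in verbose mode plain spaces are ignored, so each literal space is written as [ ]. The names verbose_pattern and multi_term are illustrative only.

```python
import re

verbose_pattern = re.compile(
    r"""
    ^                  # start of a line (re.M makes ^ match at each line)
    Text1              # literal Text1
    (?:[ ][^\s!]+)+    # one or more terms: a space, then no whitespace and no !
    [ ]Text3           # a space followed by literal Text3
    $                  # end of the line
    """,
    flags=re.M | re.X,
)

c = """Text1 Text2a Text3
Text1 Text2b Text3
Text1 Text2! Text3"""

verbose_result = verbose_pattern.sub("Text1 Text4 Text3", c)
print(verbose_result)

# Several intermediate terms are fine as long as none of them contains "!".
multi_term = verbose_pattern.sub("Text1 Text4 Text3", "Text1 a b Text3\nText1 a b! Text3")
print(multi_term)
```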

2 Comments

Thanks Tim. Can you advise on how I can generalise this approach to substrings instead of characters? For instance, replace all intermediate text unless it contains the string abcde.
You may use this pattern: ^Text1(?: (?!\S*abcde)\S+)+ Text3$ ... here is a regex demo
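
For reference, here is how that pattern might be used from Python. The sample lines are invented for illustration; the pattern itself is the one from the comment, written as a raw string.

```python
import re

# Hypothetical data: the second line has a term containing "abcde".
sample = """Text1 Text2a Text3
Text1 xxabcdexx Text3
Text1 foo bar Text3"""

# (?!\S*abcde) is checked at the start of each space-separated term,
# so any single term containing "abcde" blocks the whole line.
substring_pattern = r"^Text1(?: (?!\S*abcde)\S+)+ Text3$"
substring_result = re.sub(substring_pattern, "Text1 Text4 Text3", sample, flags=re.M)
print(substring_result)
```

Because the check runs term by term, a forbidden substring that itself contains a space would not be caught by this pattern.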