Replace An Exact Grouped Part in RegEx Python

Question

I have a template, and I need to replace part of it using a regex in Python. Here is my template (note that there is at least one newline between the two comments):

hello
how's everything

<!--POSTS:START-->
some text
<!--POSTS:END-->

Some code here

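I want to replace everything between <!--POSTS:START--> and <!--POSTS:END--> in Python. So I wrote the pattern <!--POSTS:START-->\n([^;]*)\n<!--POSTS:END-->, but the match includes <!--POSTS:START--> and <!--POSTS:END--> too.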

Here is what I want:

re.sub('...', 'foo', message)

# expected result:
hello
how's everything

<!--POSTS:START-->
foo
<!--POSTS:END-->

Some code here

Thanks.

Create a capturing group within your pattern. See stackoverflow.com/questions/48719537/…
– PM 77-1, Commented Mar 31, 2022 at 15:05

CodeMonkey · Accepted Answer · 2022-03-31 17:35:52Z

2

You can use capture groups for the start and end markers and reference them as \1, \2, etc. in the replacement string.

If the text has multiple occurrences of <!--POSTS:START-->...<!--POSTS:END-->, then the regexp with the non-greedy .*? replaces the content of each block separately. If the '?' is removed from the regexp, the greedy .* matches from the first start marker to the last end marker, so everything in between is replaced in one go.

Try this:

import re

s = '''
hello
how's everything

<!--POSTS:START-->
some text
<!--POSTS:END-->

Some code here
'''

# re.DOTALL lets '.' match newlines, so the match can span multiple lines
s = re.sub(r'(<!--POSTS:START-->\n).*?(\n<!--POSTS:END-->)', r'\1foo\2', s, flags=re.DOTALL)

# this inlines the DOTALL flag in the regexp for same result
# s = re.sub(r'(?s)(<!--POSTS:START-->\n).*?(\n<!--POSTS:END-->)', r'\1foo\2', s)

print(s)

Output:

hello
how's everything

<!--POSTS:START-->
foo
<!--POSTS:END-->

Some code here
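To illustrate the greedy versus non-greedy behavior described above, here is a small sketch. The sample text with two marker pairs is an assumption for illustration, not something taken from the question.

```python
import re

# Assumed sample with two marker pairs (not from the question).
text = (
    "<!--POSTS:START-->\nfirst\n<!--POSTS:END-->\n"
    "middle\n"
    "<!--POSTS:START-->\nsecond\n<!--POSTS:END-->"
)

# Non-greedy .*? stops at the nearest end marker, so each block is handled separately.
lazy = re.sub(r'(<!--POSTS:START-->\n).*?(\n<!--POSTS:END-->)',
              r'\1foo\2', text, flags=re.DOTALL)

# Greedy .* runs to the last end marker, so everything in between is replaced.
greedy = re.sub(r'(<!--POSTS:START-->\n).*(\n<!--POSTS:END-->)',
                r'\1foo\2', text, flags=re.DOTALL)

print(lazy)    # both block bodies become "foo"; the "middle" line is kept
print(greedy)  # a single START/foo/END block; the "middle" line is gone
```

With a single marker pair both patterns give the same result, so the difference only shows up once the template contains several blocks.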

edited Mar 31, 2022 at 17:35

answered Mar 31, 2022 at 15:08

CodeMonkey


5 Comments

Sadra Over a year ago

Thank you so much. What if I have nothing but a \n between two comment lines?

Sadra Over a year ago

Also, I tried your pattern with multiple lines between the two comment lines but didn't work!

CodeMonkey Over a year ago

Add flags=re.MULTILINE | re.DOTALL parameter to the re.sub() call.

fsimonjetz Over a year ago

Correct me if I'm wrong, but re.DOTALL seems to be all you need: re.MULTILINE only alters the behavior of ^ and $, so it should have no effect here.

CodeMonkey Over a year ago

@fsimonjetz - yes, since the regexp uses . and it needs to match '\n', only the re.DOTALL flag is needed.
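The two points raised in these comments can be checked with a short sketch. It assumes that "nothing but a \n between two comment lines" means the start marker line is directly followed by the end marker line. The helper name replace_block is hypothetical and not part of the answer above.

```python
import re

# Hypothetical helper, not part of the answer above.
def replace_block(text, new_body):
    # The end-marker group does not start with '\n' here, so a block with an
    # empty body (START line directly followed by END line) still matches.
    pattern = r'(<!--POSTS:START-->\n).*?(<!--POSTS:END-->)'
    # A function replacement avoids backslash-escape handling inside new_body.
    return re.sub(pattern,
                  lambda m: m.group(1) + new_body + "\n" + m.group(2),
                  text, flags=re.DOTALL)

empty = "<!--POSTS:START-->\n<!--POSTS:END-->"
multi = "<!--POSTS:START-->\nline 1\nline 2\n<!--POSTS:END-->"

print(replace_block(empty, "foo"))
print(replace_block(multi, "foo"))

# re.MULTILINE only changes '^' and '$', which the answer's pattern does not use,
# so adding it should make no difference.
pattern = r'(<!--POSTS:START-->\n).*?(\n<!--POSTS:END-->)'
with_dotall = re.sub(pattern, r'\1foo\2', multi, flags=re.DOTALL)
with_both = re.sub(pattern, r'\1foo\2', multi, flags=re.DOTALL | re.MULTILINE)
print(with_dotall == with_both)
```

The original pattern requires a newline on both sides of the body, which is why it does not match when the two markers sit on adjacent lines.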

Nikappa_ · Accepted Answer · 2022-03-31 15:32:57Z

1

check this https://docs.python.org/3/library/re.html

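import re

# Capture both markers so they can be restored in the replacement.
# Note: without re.DOTALL, '.' does not match '\n', so this pattern only
# matches when the text between the markers is a single line.
pattern = r"(<!--POSTS:START-->\n).*(\n<!--POSTS:END-->)"
string = """hello
how's everything

<!--POSTS:START-->
some text
<!--POSTS:END-->

Some code here"""
# \g<1> and \g<2> are explicit references to groups 1 and 2.
result = re.sub(pattern, r"\g<1>foo\g<2>", string)
print(result)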

result:

hello
how's everything

<!--POSTS:START-->
foo
<!--POSTS:END-->

Some code here
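One situation where the \g<1> form used above matters is when the replacement text begins with a digit: according to the re documentation, a backreference such as \1 followed directly by a digit is not read as group 1 plus a literal digit. A minimal sketch, where the replacement text "2022 posts" is an assumed example:

```python
import re

text = "<!--POSTS:START-->\nsome text\n<!--POSTS:END-->"
pattern = r"(<!--POSTS:START-->\n).*?(\n<!--POSTS:END-->)"

# "2022 posts" is an assumed replacement that starts with a digit.
# \g<1> keeps the group reference separate from the digits that follow it.
result = re.sub(pattern, r"\g<1>2022 posts\g<2>", text, flags=re.DOTALL)
print(result)
```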

answered Mar 31, 2022 at 15:32

Nikappa_


CyDevos · Accepted Answer · 2022-03-31 16:38:47Z

1

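You can use the following:

import re

# Sample input (assumed); in practice `content` holds the full template text.
content = "<!--POSTS:START-->\nsome text\n<!--POSTS:END-->"

new_content = re.sub(
    r'(<!--POSTS:START-->\n).*?(?=\n<!--POSTS:END-->)', r"\1foo",
    content, flags=re.DOTALL)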

The DOTALL flag makes the '.' special character match any character at all, including a newline.

I'm using two things to do what you want:

Lookahead (?=...): asserts that the given subpattern can be matched here, without consuming characters.
Non-greedy match (*?): matches as little as possible, so each START/END block is matched separately.

Because of the lookahead, the \n and the end marker are not consumed, so I only need to keep the first group and rewrite the content between the markers. That is why I'm using \1foo and not \1foo\2.
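As a variant of the same idea (a sketch, not part of the original answer), the start marker can also be checked with a lookbehind. Python's re module accepts it here because the marker has a fixed width. Neither marker is then part of the match, so the replacement is just the new text, which is the re.sub('...', 'foo', message) form the question asked for.

```python
import re

message = """hello
how's everything

<!--POSTS:START-->
some text
<!--POSTS:END-->

Some code here"""

# Lookbehind and lookahead assert the markers without consuming them,
# so only the text between them is matched and replaced.
pattern = r'(?<=<!--POSTS:START-->\n).*?(?=\n<!--POSTS:END-->)'
result = re.sub(pattern, 'foo', message, flags=re.DOTALL)
print(result)
```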

If you need to modify only the first match you can use count=1

re.sub(..., count=1)

You can have anything between those two lines and it will work as expected
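A short sketch of the count=1 behavior, using an assumed sample with two marker pairs (the sample text is not from the question):

```python
import re

# Assumed sample with two marker pairs.
text = (
    "<!--POSTS:START-->\na\n<!--POSTS:END-->\n"
    "middle\n"
    "<!--POSTS:START-->\nb\n<!--POSTS:END-->"
)
pattern = r'(<!--POSTS:START-->\n).*?(?=\n<!--POSTS:END-->)'

# count=1 stops after the first replacement; the second block is untouched.
first_only = re.sub(pattern, r'\1foo', text, count=1, flags=re.DOTALL)

# Without count, every block is replaced.
all_blocks = re.sub(pattern, r'\1foo', text, flags=re.DOTALL)

print(first_only)
print(all_blocks)
```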

edited Mar 31, 2022 at 16:38

answered Mar 31, 2022 at 15:20

CyDevos


2 Comments

Sadra Over a year ago

Thank you. What if I have several of those comment pairs? It actually removed the first match and applied the change to the second one.

CyDevos Over a year ago

Hi, I figured it out. I will edit my post. Please mark it as the answer if it fits your needs.
