In Python, I tried to wrap each regular expression match between two strings.

import re

a = "("
b = ")"
string = "foo bar foo foo bar bar foofoofoo foo foo"

regex = "(foo(.[foo]{1,}))|foo"
print(re.sub(regex, a + string + b, string))

What I thought was going to print:

(foo) bar (foo foo) bar bar (foofoofoo) (foo foo)

What it actually printed:

(foo bar foo foo bar bar foofoofoo foo foo) bar (foo bar foo foo bar bar foofoofoo foo foo) bar bar (foo bar foo foo bar bar foofoofoo foo foo) (foo bar foo foo bar bar foofoofoo foo foo)

Should I use loops or is there a function for that?

3 Answers

re.sub is defined as

re.sub(pattern, <what to replace "pattern" with>, input_string)

Your code asks re.sub to replace each match with your entire input_string surrounded by parentheses, which is not what you want.

If you just want to wrap each run of foo's in parentheses, try
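
To make the role of the second argument concrete, here is a minimal sketch on a shortened string (the sample text and the variable names `fixed` and `grown` are illustrative, not from the question). It shows that whatever is passed as the replacement is inserted once per match.

```python
import re

text = "foo bar foo"

# The second argument is the replacement inserted in place of each match.
fixed = re.sub(r"foo", "X", text)
print(fixed)

# Passing the whole input as the replacement pastes it in at every match,
# which is why the output in the question kept growing.
grown = re.sub(r"foo", "(" + text + ")", text)
print(grown)
```

The second call reproduces the question's symptom in miniature: every match is swapped for a copy of the full input.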

import re

regex = r"((foo){2,}|foo(\s+foo)*)"
a = "("
b = ")"
input_string = "foo bar foo foo bar bar foofoofoo foo foo"
print(re.sub(regex, a + r'\1' + b, input_string))

Output:

(foo) bar (foo foo) bar bar (foofoofoo) (foo foo)
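
On the "is there a function for that?" part of the question: re.sub also accepts a function as the replacement, so no explicit loop is needed. A minimal sketch, assuming a hypothetical helper named `wrap_matches` (not part of the re module):

```python
import re

def wrap_matches(pattern, text, left="(", right=")"):
    """Hypothetical helper: wrap every match of pattern in left/right."""
    # re.sub calls the function once per match; its return value is used
    # literally, so backslashes in left/right are not treated as escapes.
    return re.sub(pattern, lambda m: left + m.group(0) + right, text)

input_string = "foo bar foo foo bar bar foofoofoo foo foo"
regex = r"((foo){2,}|foo(\s+foo)*)"

result = wrap_matches(regex, input_string)
print(result)
```

Because m.group(0) is the whole match, this form does not depend on the outer capture group, and it may be the safer choice when the two wrapping strings come from user input.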
6 Comments

What if I wanted the opening and closing strings to come from user input? It prints () bar () () bar bar () () () when I use variable strings.
Never mind, it's a + r'\1' + b, sorry. I didn't know what the r prefix really does. I guess it marks whether or not the string is a regular expression.
How do I make a user's regex string input behave as if it had an r at the beginning? It doesn't do anything without it.
If you don't put the r, then make sure you use a double backslash instead of a single one: \\1
I am not talking about leaving out the r in re.sub's replacement. I am saying that re.sub's pattern always works when an r is in front of the string. But what do I do if a user inputs the pattern and there is no r in front of it, since you can't do 'r'+str(raw_input("Input pattern: "))?
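
On the raw-string question in these comments: the r prefix only affects how a string literal in source code is parsed, so text returned by raw_input() (input() on Python 3) already contains its backslashes as typed and needs no prefix. A minimal sketch, where the variable `typed` stands in for what a user would enter at the prompt (an assumption for illustration); the `() bar ()` output mentioned above was probably `"\1"` without the r, which is a single control character rather than a group reference.

```python
import re

# The r prefix only changes how a literal in source code is parsed.
assert r"\1" == "\\1"   # backslash followed by "1": a group reference
assert len("\1") == 1   # without r, "\1" is the single character chr(1)

# Text read from the user is not parsed as a literal, so its backslashes
# arrive intact. This stands in for a user typing: foo(\s+foo)*
typed = "foo(\\s+foo)*"

# \g<0> in the replacement refers to the whole match.
result = re.sub(typed, r"(\g<0>)", "foo bar foo foo")
print(result)
```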

You're not using a backreference to your original match. Instead, you're replacing each match with the entire original string, which is why your string is getting longer. You need to use \1 to refer to the text matched by the first set of parentheses.

I used the following code and got the output that you wanted:

print(re.sub(r"((foo){2,}|foo(\s+foo)*)", r'(\1)', string))

EDIT: I don't have the reputation to comment on the answer marked as correct (my account was reset after being dormant). However, the output is wrong based on the original question.

I got the output:

(foo) bar (foo foo) bar bar (foofoofoo) (foo foo)

EDIT: I corrected the original answer. I didn't think that I could with my reputation.

1 Comment

You can always correct what needs correction, and if it is helpful and correct, any reputation is enough :)
You need to read the documentation for re.sub again. This is its definition:

re.sub(pattern, repl, string, count=0, flags=0)

You are calling:

re.sub(regex, a + string + b, string)

No matter what matches, you are replacing the match with your entire original string surrounded by parens.

Your regex as written has two capture groups (defined by parens). You can refer to them inside the replacement string as \1 or \2. You need to replace a+string+b with something else that will print what you are looking for instead of the entire string.
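
As a rough illustration of how the groups are numbered, the sketch below inspects one match of the question's pattern and then uses \1 and \2 in a replacement. The sample inputs and variable names are made up for the example.

```python
import re

# The pattern from the question. Groups are numbered by their opening
# parenthesis, left to right. Note that [foo] is a character class
# (one "f" or "o"), not the word foo.
regex = "(foo(.[foo]{1,}))|foo"

m = re.search(regex, "foo foo bar")
whole, outer, inner = m.group(0), m.group(1), m.group(2)
print(repr(whole), repr(outer), repr(inner))

# In a replacement string, \1 and \2 insert the text of those groups.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")
print(swapped)
```

Here group 1 holds the full "foo foo" and group 2 only the trailing " foo", so \1 is the one to wrap when the whole match is wanted.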
