
Suppose I want to change "the blue dog and blue cat wore blue hats" to "the gray dog and gray cat wore blue hats".

With sed I could accomplish this as follows:

$ echo 'the blue dog and blue cat wore blue hats' | sed 's/blue \(dog\|cat\)/gray \1/g'

How can I do a similar replacement in Python? I've tried:

>>> import re
>>> s = "the blue dog and blue cat wore blue hats"
>>> p = re.compile(r"blue (dog|cat)")
>>> p.sub('gray \1',s)
'the gray \x01 and gray \x01 wore blue hats'
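The \x01 comes from Python's own string parsing, before re ever sees the replacement. In a normal (non-raw) literal, \1 is an octal escape for the single character with code point 1. A minimal check of that behaviour:

```python
import re

s = "the blue dog and blue cat wore blue hats"
p = re.compile(r"blue (dog|cat)")

# In a normal (non-raw) literal, \1 is an octal escape: a single
# character with code point 1, not a backslash followed by "1".
assert 'gray \1' == 'gray \x01'
assert len('\1') == 1

# re.sub therefore receives no group reference, only that control character.
broken = p.sub('gray \1', s)
print(repr(broken))  # 'the gray \x01 and gray \x01 wore blue hats'
```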

4 Answers

115

You need to escape your backslash:

p.sub('gray \\1', s)

Alternatively, you can use a raw string, as you already did for the regex:

p.sub(r'gray \1', s)
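A short sketch confirming that both spellings build the same replacement template, and that the module-level re.sub (mentioned in the comments below) gives the same result without compiling the pattern first:

```python
import re

s = "the blue dog and blue cat wore blue hats"
p = re.compile(r"blue (dog|cat)")

# Both spellings give the two characters '\' and '1', which re reads
# as a reference to capture group 1.
assert 'gray \\1' == r'gray \1'

escaped = p.sub('gray \\1', s)
raw = p.sub(r'gray \1', s)
print(raw)  # the gray dog and gray cat wore blue hats

# The module-level re.sub does the same without compiling first.
one_liner = re.sub(r"blue (dog|cat)", r"gray \1", s)
assert escaped == raw == one_liner
```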

2 Comments

The second option (the raw string) is ideal, as it matches the sed syntax.
The one-liner is re.sub("blue (dog|cat)", "gray \\1", s)
43

I was looking for a similar answer but wanted to use named groups in the replacement, so I thought I'd add the code for others:

p = re.compile(r'blue (?P<animal>dog|cat)')
p.sub(r'gray \g<animal>',s)
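A small self-contained version of the named-group approach, with its output. It also shows that a named group keeps its number, so \1 refers to the same group, and that the name can be used when inspecting matches directly:

```python
import re

s = "the blue dog and blue cat wore blue hats"
p = re.compile(r'blue (?P<animal>dog|cat)')

# \g<animal> in the template refers to the group named "animal".
result = p.sub(r'gray \g<animal>', s)
print(result)  # the gray dog and gray cat wore blue hats

# Named groups are still numbered, so \1 is the same group here.
assert p.sub(r'gray \1', s) == result

# The same name works when inspecting matches directly.
animals = [m.group('animal') for m in p.finditer(s)]
print(animals)  # ['dog', 'cat']
```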


38

Off topic, but for numbered capture groups:

#!/usr/bin/env python
import re

# Print the result so running this as a script shows the output below.
print(re.sub(
    pattern=r'(\d)(\w+)',
    repl='word: \\2, digit: \\1',
    string='1asdf'
))

word: asdf, digit: 1

Python uses a literal backslash plus a one-based index for numbered capture-group replacements, as shown in this example. So \1, entered as '\\1' (or r'\1'), references the first capture group (\d), and \2 references the second capture group (\w+).
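A brief sketch of the one-based numbering, plus the one index that does not follow the \N form: the whole match is written \g<0>, because a \0 in the template is read as an octal escape (a NUL character) rather than as group 0:

```python
import re

# Group numbers are one-based: \1 is (\d) and \2 is (\w+).
swapped = re.sub(r'(\d)(\w+)', r'\2-\1', '1asdf')
print(swapped)  # asdf-1

# For the entire match use \g<0>; a \0 in the template is read as an
# octal escape (a NUL character), not as "group 0".
wrapped = re.sub(r'\d+', r'[\g<0>]', 'a12b3')
print(wrapped)  # a[12]b[3]
```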

7 Comments

Off topic, but I was wondering whether it is possible to replace the captured group itself. In your case group 1 is 1; can we replace group 1 with, let's say, 5 so that the final output is something like 5asdf (i.e., replacing the entire group)?
@anoop If I understand your goal, it sounds like you don't want to capture the 1 at all; in that case, simply don't capture it (don't enclose it in parentheses). If you want to extract strings with a regex, use re.match or re.search (and their variants); these give you, for example, a group dict (docs.python.org/3/library/re.html#re.Match.groupdict) that you can format or parse as you like.
@anoop Oh, you can also simply not use the capture group (or not capture it at all) and hard-code more data into your output string, much like the words 'word: ' and ', digit: ' in the example.
OK, let me explain with an example. I had text similar to function public xyzname() and I wanted to change public to private. The only way I could do it was by grouping function and xyzname() and applying something like \\1 private \\2, but I was wondering if I can group just public as group 1 and replace it with private. Is that possible?
@anoop Sure, that kind of transformation is pretty common in regex use cases. You're on the right track; play with it to see how it works when you apply \\1 private \\2, and adjust as necessary.
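For the function public xyzname() case, a minimal sketch of two options. The exact patterns are assumptions chosen for this one sample line, not a general parser. The first re-inserts captured context with \1 and \2, as suggested in the comments. The second uses a lookbehind and lookahead so only the word public is matched and replaced:

```python
import re

text = "function public xyzname()"

# Approach from the comments: capture the surrounding parts and re-insert them.
grouped = re.sub(r'(function )public( \w+\(\))', r'\1private\2', text)

# Alternative: lookbehind/lookahead check the context without consuming it,
# so only "public" itself is part of the match and gets replaced.
lookaround = re.sub(r'(?<=function )public(?= \w+\(\))', 'private', text)

print(grouped)     # function private xyzname()
print(lookaround)  # function private xyzname()
```

Note that Python's lookbehind must be fixed-width, which is why it spells out "function " literally.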
9

Try this:

p.sub('gray \g<1>',s)

5 Comments

Nice alternative (+1), but it only works because \g is not a valid escape sequence. The safe way to write your code is still p.sub('gray \\g<1>', s).
Sorry, I meant that to be a raw string. I left out the replacement argument, too. I was on a roll! I'm deleting the comment. I agree 100% about not counting on Python's too-permissive behavior with respect to escape sequences.
@mac Consider adding your comment here to your answer. It is the only thing that worked reliably in an IPython notebook.
@mac: \g was chosen specifically not to clash with other escape codes. It would've been a poor choice by Python devs if it did. docs.python.org/2/library/re.html#re.sub
You can write p.sub(r'gray \g<1>', s) to prevent the \ from being parsed by Python, allowing it to be sent directly to the regex engine.
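A sketch of the raw-string form, plus the case where \g<1> is more than a style choice: when the group reference is followed by a digit. There, r'\g<1>2' means "group 1, then the character 2", while r'\12' is read as group 12, which this pattern does not have:

```python
import re

s = "the blue dog and blue cat wore blue hats"
p = re.compile(r"blue (dog|cat)")

# Raw string: the backslash reaches the regex engine untouched.
named_form = p.sub(r'gray \g<1>', s)
print(named_form)  # the gray dog and gray cat wore blue hats

# \g<1> separates the group number from a following literal digit.
suffixed = p.sub(r'\g<1>2', s)
print(suffixed)  # the dog2 and cat2 wore blue hats

# r'\12' is read as a reference to group 12, which does not exist here.
raised = False
try:
    p.sub(r'\12', s)
except re.error as exc:
    raised = True
    print("re.error:", exc)
```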
