Python regex replace whole string

Question

I have a regex to strip the end off a request url:

re.sub('(?:^\/en\/category).*(-\d{1,4}$)', '', r)

The docs say re.sub replaces the matched part, but when the pattern matches my string, the whole string is replaced, e.g.:

/en/category/specials/men-2610

I'm not sure what Python is doing, but my regex seems fine

EDIT: I wish to have the string with the end stripped off, target =

/en/category/specials/men

Yes, it replaces the whole string because the whole string is matched.
– Avinash Raj, Commented Jan 16, 2015 at 11:32

Aran-Fey · Accepted Answer · 2015-01-16 11:37:53Z

3

As stated in the docs, the matched part is replaced. Matched is different from captured.

You will have to capture the text you don't want to remove in a capture group like so:

(^/en/category.*)-\d{1,4}$

and put it back into the string using the backreference \1:

re.sub(r'(^/en/category.*)-\d{1,4}$', r'\1', text)
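To make the matched-versus-captured distinction concrete, here is a small sketch using the URL from the question. The variable names (`text`, `emptied`, `stripped`) are only illustrative and not from the original post.

```python
import re

text = "/en/category/specials/men-2610"

# Pattern from the question: the capture group wraps only the numeric suffix.
m = re.search(r'(?:^/en/category).*(-\d{1,4}$)', text)
print(m.group(0))  # the whole match: '/en/category/specials/men-2610'
print(m.group(1))  # the captured suffix only: '-2610'

# re.sub replaces group(0), i.e. everything that matched, so nothing is left.
emptied = re.sub(r'(?:^/en/category).*(-\d{1,4}$)', '', text)

# Capturing the part to keep and writing it back with \1 preserves it.
stripped = re.sub(r'(^/en/category.*)-\d{1,4}$', r'\1', text)

print(repr(emptied))   # ''
print(repr(stripped))  # '/en/category/specials/men'
```

The capture group does not limit what gets replaced; it only labels part of the match so the replacement string can refer to it.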

answered Jan 16, 2015 at 11:37

Aran-Fey


1 Comment

Tjorriemorrie Over a year ago

Thanks for explaining about matched vs captured. I was not aware of how it was handled.

vks · Accepted Answer · 2015-01-16 11:44:36Z

2

(?<=^\/en\/category)(.*)-\d{1,4}$

Try this and replace the match with \1. See the demo:

https://regex101.com/r/tX2bH4/27

Your whole pattern matches, which is why the whole string is replaced.

P.S. A match is different from a capture (group).

import re
# The lookbehind keeps the prefix out of the match; group 1 holds the text to keep.
p = re.compile(r'(?<=^\/en\/category)(.*)-\d{1,4}$', re.IGNORECASE)
test_str = "/en/category/specials/men-2610"
subst = r"\1"  # raw string: a plain "\1" would be the control character \x01

result = re.sub(p, subst, test_str)
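One detail worth checking when using a backreference in the replacement: it has to reach re.sub as a backslash followed by a digit. A rough sketch of the difference, reusing the lookbehind pattern above (the names `wrong` and `right` are just for illustration):

```python
import re

pattern = re.compile(r'(?<=^/en/category)(.*)-\d{1,4}$')
text = "/en/category/specials/men-2610"

# Without the r prefix, "\1" is an octal escape: the single character U+0001.
print(len("\1"), len(r"\1"))  # 1 2

wrong = pattern.sub("\1", text)   # inserts the control character
right = pattern.sub(r"\1", text)  # inserts the captured group

print(repr(wrong))  # '/en/category\x01'
print(repr(right))  # '/en/category/specials/men'
```

The prefix survives in both cases because the lookbehind keeps it outside the match; only the replacement differs.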

edited Jan 16, 2015 at 11:44

answered Jan 16, 2015 at 11:35

vks


jamylak · Accepted Answer · 2015-01-16 11:36:15Z

1

>>> import re
>>> re.sub(r'(^/en/category.*)(-\d{1,4}$)',
...        r'\1', '/en/category/specials/men-2610')
'/en/category/specials/men'

answered Jan 16, 2015 at 11:36

jamylak


Avinash Raj · Accepted Answer · 2015-01-16 11:42:38Z

1

Just move the capturing group to the other part and replace the match with \1. You also don't need to escape the forward slash: it has no special meaning in a Python regex, and a raw string keeps the remaining backslashes intact.

re.sub(r'^(/en/category.*)-\d{1,4}$', r'\1', string)

DEMO

>>> s = "/en/category/specials/men-2610"
>>> re.sub(r'^(/en/category.*)-\d{1,4}$', r'\1', s)
'/en/category/specials/men'

OR

>>> s.split('-')[0]
'/en/category/specials/men'
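A caveat on the split shortcut: it works for this URL, but it cuts at the first hyphen, so it would likely misbehave if the path itself contains one. A quick sketch with a made-up URL (`hyphenated` is a hypothetical example, not from the question):

```python
import re

hyphenated = "/en/category/new-arrivals/men-2610"  # hypothetical URL

# split('-')[0] keeps only the text before the FIRST hyphen.
first_cut = hyphenated.split('-')[0]
print(first_cut)  # '/en/category/new'

# rsplit with maxsplit=1 cuts at the LAST hyphen instead.
last_cut = hyphenated.rsplit('-', 1)[0]
print(last_cut)  # '/en/category/new-arrivals/men'

# The regex strips only a trailing hyphen followed by 1 to 4 digits.
by_regex = re.sub(r'^(/en/category.*)-\d{1,4}$', r'\1', hyphenated)
print(by_regex)  # '/en/category/new-arrivals/men'
```

Unlike the regex, neither split variant checks that the suffix is numeric or that the string starts with /en/category, so the regex is the closer fit to the original requirement.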

edited Jan 16, 2015 at 11:42

answered Jan 16, 2015 at 11:35

Avinash Raj


l'L'l · Accepted Answer · 2015-01-16 11:58:12Z

1

Your pattern is fine; you just need to change which part is the capturing group:

Before:

(?:^\/en\/category).*(-\d{1,4}$)

After:

((?:^\/en\/category).*)-\d{1,4}$

Since the ?: is no longer necessary we can reduce this further to:

(^\/en\/category.*)-\d{1,4}$

Notice I've moved the capturing group from the digits to the part before it.

Example:

http://ideone.com/FLAaFh
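As a quick check that dropping the non-capturing group changes nothing, both forms can be run against the URL from the question. This is only a sketch; the names `with_group` and `reduced` are mine, and the patterns are written as Python raw strings with single backslashes.

```python
import re

s = "/en/category/specials/men-2610"

with_group = r'((?:^\/en\/category).*)-\d{1,4}$'  # inner (?:...) kept
reduced = r'(^\/en\/category.*)-\d{1,4}$'         # inner group removed

# Both patterns capture the same text in group 1, so \1 restores the same prefix.
results = [re.sub(p, r'\1', s) for p in (with_group, reduced)]
print(results)  # ['/en/category/specials/men', '/en/category/specials/men']
```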

edited Jan 16, 2015 at 11:58

answered Jan 16, 2015 at 11:51

l'L'l
