I have a regex to strip the end off a request url:

re.sub('(?:^\/en\/category).*(-\d{1,4}$)', '', r)

The docs say re.sub replaces the matched part, but when the pattern matches my string, the whole string is replaced, e.g.:

/en/category/specials/men-2610

I'm not sure what Python is doing, but my regex seems fine.

EDIT: I want the string with the trailing part stripped off. Target:

/en/category/specials/men
  • Define your pattern as a raw string. Commented Jan 16, 2015 at 11:31
  • Yep, it replaces the whole string because the whole string is matched. Commented Jan 16, 2015 at 11:32
  • What's your expected output? Commented Jan 16, 2015 at 11:33
  • What do you want to remove? Commented Jan 16, 2015 at 11:33

5 Answers

3

As stated in the docs, the matched part is replaced. Matched is different from captured.

You will have to capture the text you don't want to remove in a capture group like so:

(^/en/category.*)-\d{1,4}$

and put it back into the string using the backreference \1:

re.sub(r'(^/en/category.*)-\d{1,4}$', r'\1', text)
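To make the matched-versus-captured distinction concrete, here is a minimal sketch using the URL from the question. The variable names (`whole_match`, `captured`, `stripped`) are just illustrative and not part of the original answer.

```python
import re

text = "/en/category/specials/men-2610"
pattern = re.compile(r'(^/en/category.*)-\d{1,4}$')

m = pattern.search(text)
whole_match = m.group(0)  # everything the pattern consumed
captured = m.group(1)     # only the text inside the parentheses

# re.sub replaces the whole match; \1 puts the captured prefix back.
stripped = pattern.sub(r'\1', text)

print(whole_match)
print(captured)
print(stripped)
```

Group 0 is what re.sub replaces, so anything you want to keep has to be re-inserted through a backreference.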

1 Comment

Thanks for explaining about matched vs captured. I was not aware of how it was handled.
2
(?<=^\/en\/category)(.*)-\d{1,4}$

Try this and replace with \1. See the demo:

https://regex101.com/r/tX2bH4/27

Your whole pattern matches the string, which is why the whole string is replaced.

P.S. A match is different from a capture or group.

import re
p = re.compile(r'(?<=^\/en\/category)(.*)-\d{1,4}$', re.IGNORECASE)
test_str = "/en/category/specials/men-2610"
subst = r"\1"  # raw string, so \1 is a backreference and not the character \x01

result = re.sub(p, subst, test_str)

1
>>> import re
>>> re.sub(r'(^/en/category.*)(-\d{1,4}$)',
...        r'\1', '/en/category/specials/men-2610')
'/en/category/specials/men'

1

Just move the capturing group to the part you want to keep, then replace the match with \1. Forward slashes don't need escaping in a Python regex, and defining the pattern as a raw string keeps \d and \1 intact.

re.sub(r'^(/en/category.*)-\d{1,4}$', r'\1', string)

DEMO

>>> s = "/en/category/specials/men-2610"
>>> re.sub(r'^(/en/category.*)-\d{1,4}$', r'\1', s)
'/en/category/specials/men'

OR

>>> s.split('-')[0]
'/en/category/specials/men'
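Note that split('-')[0] cuts at the first hyphen, so it only works when the path contains no other hyphen. A small sketch of the difference, using a made-up URL with an extra hyphen (the URL and the variable names are assumptions for illustration):

```python
import re

# Hypothetical URL with a hyphen inside the path as well as before the id.
s = "/en/category/special-offers/men-2610"

naive = s.split('-')[0]          # cuts at the FIRST hyphen
by_regex = re.sub(r'^(/en/category.*)-\d{1,4}$', r'\1', s)
by_rsplit = s.rsplit('-', 1)[0]  # cuts at the LAST hyphen, with no digit check

print(naive)
print(by_regex)
print(by_rsplit)
```

rsplit('-', 1) keeps the earlier hyphens, but unlike the regex it does not verify that the suffix is 1 to 4 digits or that the string starts with /en/category.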

1

Your pattern is fine; you just need to change which part is in the capturing group:

Before:

(?:^\/en\/category).*(-\d{1,4}$)

After:

((?:^\/en\/category).*)-\d{1,4}$

Since the ?: group is no longer necessary, we can reduce this further to:

(^\/en\/category.*)-\d{1,4}$

Notice I've moved the capturing group from the digits to the part before it.

Example:

http://ideone.com/FLAaFh
