
I have a regular expression that matches every instance of the digit 1 followed by an uppercase letter, and I would like to remove all of these instances.

import re
EXPRESSION = re.compile(r"1([A-Z])")
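To see exactly what this pattern matches, the sketch below uses `finditer` on a made-up sample string and prints the whole match next to the captured group:

```python
import re

EXPRESSION = re.compile(r"1([A-Z])")

sample = "x1Ay1Bz"  # hypothetical input, not from the question
for m in EXPRESSION.finditer(sample):
    # group(0) is the whole match, e.g. "1A"; group(1) is only the captured letter, e.g. "A"
    print(m.group(0), m.group(1))
```

The parentheses around `[A-Z]` create a capturing group, so each match remembers the letter on its own in addition to the full `1X` text.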

I can use re.split.

result = EXPRESSION.split(input)

This returns a list, so I could do

result = ''.join(EXPRESSION.split(input))

to convert it back to a string.

or

result = EXPRESSION.sub('', input)
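A minimal sketch that prints what each call produces for a made-up input string: the intermediate list from split, the joined string, and the result of sub. `subn` (a standard method of compiled patterns) is included too, since it also reports how many matches were replaced.

```python
import re

EXPRESSION = re.compile(r"1([A-Z])")

text = "ab1Cde1F"  # hypothetical input

parts = EXPRESSION.split(text)
print(parts)                      # ['ab', 'C', 'de', 'F', '']
print("".join(parts))             # abCdeF
print(EXPRESSION.sub("", text))   # abde
print(EXPRESSION.subn("", text))  # ('abde', 2)
```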

Are there any differences to the end result?

  • Did you mean result = ''.join(...)? Commented Jun 12, 2020 at 9:30
  • Do you have any cases where you suspect there might be a difference? Commented Jun 12, 2020 at 9:31
  • Yes, sorry, ''.join(...) is what I meant! I don't, but I'm not very familiar with re and want to make sure I'm not overlooking something. Commented Jun 12, 2020 at 9:33
  • There might be a difference in performance, but not in the result. Commented Jun 12, 2020 at 9:33

1 Answer


Yes, the results are different. Here is a simple example:

import re

EXPRESSION = re.compile(r"1([A-Z])")

s = 'hello1Aworld'

result_split = ''.join(EXPRESSION.split(s))
result_sub = EXPRESSION.sub('', s)

print('split:', result_split)
print('sub:  ', result_sub)

Output:

split: helloAworld
sub:   helloworld

This is because of the capture group: EXPRESSION.split(s) also returns the captured A, as noted in the documentation:

re.split(pattern, string, maxsplit=0, flags=0)

Split the source string by the occurrences of the pattern, returning a list containing the resulting substrings. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list. If maxsplit is nonzero, at most maxsplit splits occur, and the remainder of the string is returned as the final element of the list.
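The documented behaviour can be checked directly. One consequence, for this particular pattern: joining the split result is the same as replacing each match with its captured text via the backreference `\1`. A sketch using the string from the example above:

```python
import re

with_group = re.compile(r"1([A-Z])")
without_group = re.compile(r"1[A-Z]")

s = "hello1Aworld"
print(with_group.split(s))     # ['hello', 'A', 'world']
print(without_group.split(s))  # ['hello', 'world']

# With one capture group, ''.join(split) keeps the captured letter,
# which matches substituting each match with group 1:
print("".join(with_group.split(s)) == with_group.sub(r"\1", s))  # True
```

This equivalence relies on the single group always taking part in the match. If a group could be left unmatched, split would put None into the list and ''.join would raise a TypeError.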


If you remove the capturing parentheses, i.e., use

EXPRESSION = re.compile(r"1[A-Z]")

then I have so far not found a case where result_split and result_sub differ, even after reading this answer to a similar question about regular expressions in JavaScript and changing the replacement string from '' to '-'.
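Without any capturing groups, split returns only the text between matches, so joining those pieces with '' should give the same string as sub(''). A randomized spot check (not a proof; the alphabet, lengths and seed are arbitrary choices) can add confidence. It also covers a non-capturing group `(?:...)`, which keeps the grouping syntax but adds nothing to the split output:

```python
import random
import re

patterns = [re.compile(r"1[A-Z]"), re.compile(r"1(?:[A-Z])")]
alphabet = "1ABab"  # small alphabet so that matches are frequent

random.seed(0)
for _ in range(10_000):
    s = "".join(random.choice(alphabet) for _ in range(random.randint(0, 12)))
    for pat in patterns:
        joined = "".join(pat.split(s))
        substituted = pat.sub("", s)
        # Any mismatch would stop the loop and report the pattern and input
        assert joined == substituted, (pat.pattern, s)
print("no differences found")
```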


  • Very interesting, thanks. So removing those parentheses is very important? Shows you how much I know about re.
