Regular Expression replacement in Python

Question

I have a regular expression that matches every instance of 1 followed by an uppercase letter. I would like to remove all of these instances.

import re

EXPRESSION = re.compile(r"1([A-Z])")

I can use re.split.

result = EXPRESSION.split(input)

This returns a list, so I could write

result = ''.join(EXPRESSION.split(input))

to convert it back to a string. Alternatively, I can use re.sub:

result = EXPRESSION.sub('', input)

Is there any difference in the end result between the two approaches?

Do you have any cases where you suspect there might be a difference?
– mkrieger1, Commented Jun 12, 2020 at 9:31
Yes, sorry, ''.join(..) would make more sense! I don't, but I am not too familiar with re and would like to make sure I'm not overlooking something.
– user7692855, Commented Jun 12, 2020 at 9:33
There might be a difference in performance, but not in the result.
– Błotosmętek, Commented Jun 12, 2020 at 9:33

mkrieger1 · Accepted Answer · 2020-06-12 09:49:28Z


Yes, the results are different. Here is a simple example:

import re

EXPRESSION = re.compile(r"1([A-Z])")

s = 'hello1Aworld'

result_split = ''.join(EXPRESSION.split(s))
result_sub = EXPRESSION.sub('', s)

print('split:', result_split)
print('sub:  ', result_sub)

Output:

split: helloAworld
sub:   helloworld

The reason is the capture group: because of it, the list returned by EXPRESSION.split(s) also contains the captured A, as noted in the documentation:

re.split = split(pattern, string, maxsplit=0, flags=0)

Split the source string by the occurrences of the pattern, returning a list containing the resulting substrings. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list. If maxsplit is nonzero, at most maxsplit splits occur, and the remainder of the string is returned as the final element of the list.
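To see what the documentation describes, it can help to look at the list that split returns before it is joined. The following is a minimal sketch using the same input string as above; the variable names with_group and without_group are only illustrative.

```python
import re

s = 'hello1Aworld'

# With a capturing group, the captured letter becomes its own list element.
with_group = re.compile(r"1([A-Z])").split(s)

# Without the group, only the text between the matches is returned.
without_group = re.compile(r"1[A-Z]").split(s)

print(with_group)     # ['hello', 'A', 'world']
print(without_group)  # ['hello', 'world']
```

Joining the first list puts the A back into the string, which is why the split-based result differs from the sub-based one.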

After removing the capturing parentheses, i.e., using

EXPRESSION = re.compile(r"1[A-Z]")

I have so far not found a case where result_split and result_sub differ, even after reading this answer to a similar question about regular expressions in JavaScript and changing the replacement string from '' to '-'.
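If the parentheses are still wanted for grouping, a non-capturing group (?:...) should behave like the pattern without parentheses, since split only returns the text of capturing groups. Below is a rough sketch that compares both approaches on a few hand-picked inputs, with '' and '-' as the replacement. The sample strings and the helper names via_split and via_sub are made up for illustration, and passing these checks is not a proof that the two always agree.

```python
import re

# Non-capturing variant of the pattern from the question.
NON_CAPTURING = re.compile(r"1(?:[A-Z])")

def via_split(pattern, text, repl=''):
    """Remove or replace matches by splitting on them and joining with repl."""
    return repl.join(pattern.split(text))

def via_sub(pattern, text, repl=''):
    """Remove or replace matches with re.sub."""
    return pattern.sub(repl, text)

# Hypothetical sample inputs: matches at the start, end, adjacent, or absent.
samples = ['hello1Aworld', '1A', '1A1B', 'x1Ay1Bz', 'no match', '', '11A', '1a']

for text in samples:
    for repl in ('', '-'):
        assert via_split(NON_CAPTURING, text, repl) == via_sub(NON_CAPTURING, text, repl)

print(via_sub(NON_CAPTURING, 'x1Ay1Bz', '-'))  # x-y-z
```

One caveat: sub processes backslash escapes and group references in the replacement string, while join inserts it literally, so a replacement containing backslashes could still produce different results.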

edited Jun 12, 2020 at 9:49

answered Jun 12, 2020 at 9:42

mkrieger1


1 Comment

user7692855 Over a year ago

Very interesting, thanks. So removing those parentheses is very important? Shows you how much I know about re.
