
I have a list of characters and smileys that I want to find in a string, collapsing each run of repeated occurrences into a single occurrence.

But I am facing two problems: when I loop over them, the re.sub function does not replace the repeated occurrences, and when I have a smiley like :) it replaces ':' with ':)', which I don't want.

Here is the code that I tried.

end_of_line_chars = [".",";","!",":)",":-)","=)",":]",":-(",":(",":[","=(",":P",":-P",":-p",":p","=P"]
for i in end_of_line_chars:
    pattern = "[" + i + "]" + "+"
    str = re.sub(pattern,i,str)

If I try it with a single character, it works, as shown below.

str = re.sub("[.]+",".",str)

But looping over the list of characters gives an error. How can I solve these two problems? Thanks for the help.

  • Regex has special characters that you have to escape with a \ in order to actually match those characters. I would suggest using the string.replace() method instead; why bother with regex? Commented Oct 30, 2015 at 0:06
  • str.replace won't replace a variable length run of a specific character with a single replacement. Commented Oct 30, 2015 at 0:09
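
To illustrate why the loop in the question misbehaves, here is a small sketch (the sample strings and variable names are made up for the demonstration). Wrapping a smiley in [ ] builds a character class, so each character is matched on its own, and a '-' inside the class is read as a range:

```python
import re

# "[" + ":)" + "]+" is the character class [:)]+ : it matches any run of
# ':' or ')' characters, not the two-character smiley ':)'.
lone_colon = re.sub("[:)]+", ":)", "note: done")
print(lone_colon)  # the lone ':' has been turned into ':)'

# With ':-)' the class becomes [:-)]+ ; the '-' is read as a range from ':'
# to ')', which is an invalid (reversed) range, so compiling fails.
try:
    re.compile("[:-)]+")
    range_error = None
except re.error as exc:
    range_error = exc
print(range_error)

# Escaping the smiley and repeating it as a group treats it as one unit.
pattern = "(?:" + re.escape(":-)") + ")+"
collapsed = re.sub(pattern, ":-)", "hi :-):-):-)")
print(collapsed)  # 'hi :-)'
```

This is likely the source of both symptoms in the question: the unwanted ':' to ':)' replacement and the error raised while looping.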

2 Answers


re.escape(str) does the escaping for you. Patterns separated with | are matched as alternatives. With (?:…) you group without capturing. So:

import re

# Only in Python 2 (omit this line on Python 3, where map and filter are already lazy):
from itertools import imap as map, ifilter as filter

# Escape all elements, e.g. ':-)' → r'\:\-\)' (Python 3.7+ leaves the ':' unescaped):
esc = map(re.escape, end_of_line_chars)
# Wrap each element in a capturing group, so you know which element was found,
# and in a non-capturing group with repeats and optional trailing spaces:
esc = map(r'(?:({})\s*)+'.format, esc)
# Compile an expression that finds any of these elements:
esc = re.compile('|'.join(esc))

# The function to turn a match of repeats into a single item:
def replace_with_one(match):
    # match.groups() holds all captures; only the one that matched is truthy,
    # e.g. (None, None, None, None, ':-)', None, None, None, None, None, None, None, None, None, None, None)
    return next(filter(bool, match.groups()))

# This is how you use it:
esc.sub(replace_with_one, '.... :-) :-) :-) :-( .....')
# Returns: '.:-):-(.'
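
Put together for Python 3, the steps above could look like the following sketch. The names collapse_re and collapse_repeats are hypothetical helpers introduced here for illustration, and match.lastindex is swapped in for the filter call to pick the one group that matched:

```python
import re

end_of_line_chars = [".", ";", "!", ":)", ":-)", "=)", ":]", ":-(", ":(",
                     ":[", "=(", ":P", ":-P", ":-p", ":p", "=P"]

# One alternative per token: the token is captured, and the token plus any
# trailing whitespace may repeat one or more times.
collapse_re = re.compile(
    "|".join(r"(?:({})\s*)+".format(re.escape(tok)) for tok in end_of_line_chars)
)

def replace_with_one(match):
    # Only one capturing group takes part in a match; lastindex is its number.
    return match.group(match.lastindex)

def collapse_repeats(text):
    """Hypothetical helper: collapse each run of a repeated token to one."""
    return collapse_re.sub(replace_with_one, text)

print(collapse_repeats(".... :-) :-) :-) :-( ....."))  # '.:-):-(.'
print(collapse_repeats(":) :) :) :) :) :)"))           # ':)'
```

Note that this approach also swallows the whitespace that follows each run, which is why the tokens end up adjacent in the first result.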

1 Comment

Hi Kay, thanks for the answer. But could you tell me what to do after getting the esc variable? I am still not able to proceed with that.

If the things to replace are not single characters, character classes won't work. Instead, use non-capturing groups (and use re.escape so the literals aren't interpreted as regex special characters):

end_of_line_chars = [".",";","!",":)",":-)","=)",":]",":-(",":(",":[","=(",":P",":-P",":-p",":p","=P"]
for i in end_of_line_chars:
    pattern = r"(?:{})+".format(re.escape(i))
    str = re.sub(pattern,i,str)

1 Comment

Hi Shadow Ranger, thanks for the answer. But your snippet does not give the output I want. Say str contains ":) :) :) :) :) :)"; I want str to change to ":)".
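
One possible way to cover the space-separated case raised in this comment is to allow optional whitespace between the repeated copies. This is only a sketch, not part of the original answer; collapse_runs is a hypothetical name, and the variable is called text rather than str to avoid shadowing the built-in:

```python
import re

end_of_line_chars = [".", ";", "!", ":)", ":-)", "=)", ":]", ":-(", ":(",
                     ":[", "=(", ":P", ":-P", ":-p", ":p", "=P"]

def collapse_runs(text):
    """Hypothetical helper: collapse runs of a token, even if space-separated."""
    for tok in end_of_line_chars:
        esc = re.escape(tok)
        # The token, followed by one or more further copies that may each be
        # preceded by whitespace.
        pattern = r"{0}(?:\s*{0})+".format(esc)
        # A function replacement returns the token literally, with no
        # backslash processing of the replacement text.
        text = re.sub(pattern, lambda m, tok=tok: tok, text)
    return text

print(collapse_runs(":) :) :) :) :) :)"))  # ':)'
print(collapse_runs("wow!!! ok...."))      # 'wow! ok.'
```

Unlike a pattern with a trailing \s*, this variant leaves whitespace after the last copy untouched, so spacing between different tokens is preserved.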
