
How can I delete specific duplicated characters from a string in Python, but only when they appear consecutively? For example:

I have a string:

string = "Hello _my name is __Alex"

I need to delete a duplicated _ only when the copies are consecutive (__), so that I get a string like this:

string = "Hello _my name is _Alex"

If I use set, I get this:

string = "_yoiHAemnasxl"
  • itertools.groupby is probably the way to go Commented Apr 6, 2018 at 14:56
  • What do you mean by "specific characters"? Commented Apr 6, 2018 at 14:57
  • @Jean-François Fabre not sure this is a dupe of that question: in this one the OP wants to keep one of the duplicated chars, whereas in the dupe they remove all of them. Something like RomanPerekhrest's answer here is different and useful, IMO. Commented Apr 6, 2018 at 15:09
  • Why does "_" get de-duplicated but not the two Ls in "Hello"? What criteria are we using to decide what should be deleted? Commented Apr 6, 2018 at 15:12
  • I undup-ed this since Remove consecutive duplicate characters from a string in python wants to remove all characters if they're duplicated, this one just wants to reduce them to a single copy; the differences in how you solve that are enough to make many answers to the not-a-duplicate a poor fit. Commented Nov 12, 2021 at 15:37

2 Answers


(Big edit: oops, I missed that you only want to de-duplicate certain characters and not others. Retrofitting solutions...)

I assume you have a string that represents all the characters you want to de-duplicate. Let's call it to_remove, and say that it's equal to "_.-". So only underscores, periods, and hyphens will be de-duplicated.

You could use a regex to match multiple successive repeats of a character, and replace them with a single character.

>>> import re
>>> to_remove = "_.-"
>>> s = "Hello... _my name -- is __Alex"
>>> pattern = "(?P<char>[" + re.escape(to_remove) + "])(?P=char)+"
>>> re.sub(pattern, r"\1", s)
'Hello. _my name - is _Alex'

Quick breakdown:

  • ?P<char> assigns the symbolic name char to the first group.
  • We put to_remove inside the character set, []. Calling re.escape is necessary because hyphens and some other characters (such as ^, ] and \) may otherwise have special meaning inside the set.
  • (?P=char) refers back to the character matched by the named group "char".
  • The + matches one or more repetitions of that character.

So in aggregate, this means "match any character from to_remove that appears more than once in a row". The second argument to sub, r"\1", then replaces that match with the first group, which is only one character long.
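The same pattern can be wrapped in a small reusable function. This is a minimal sketch; the name squeeze_chars and its signature are hypothetical, not part of the re module.

```python
import re


def squeeze_chars(text, chars):
    """Collapse runs of any character in `chars` down to a single copy.

    Hypothetical helper built on the named-group pattern described above.
    """
    if not chars:
        # "[]" followed by the rest of the pattern is not a valid character
        # set, so return the text unchanged when there is nothing to squeeze.
        return text
    pattern = "(?P<char>[" + re.escape(chars) + "])(?P=char)+"
    # r"\g<char>" (same as r"\1" here) keeps one copy of the matched character.
    return re.sub(pattern, r"\g<char>", text)


print(squeeze_chars("Hello _my name is __Alex", "_"))
print(squeeze_chars("Hello... _my name -- is __Alex", "_.-"))
```

Because re.escape handles the set contents, characters such as "]" or "^" can be passed in chars without breaking the pattern.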


Alternative approach: write a generator expression that keeps each character unless it is a to_remove character identical to the one immediately before it.

>>> "".join(s[i] for i in range(len(s)) if i == 0 or not (s[i-1] == s[i] and s[i] in to_remove))
'Hello. _my name - is _Alex'
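The index arithmetic can also be avoided by pairing each character with its predecessor via zip. This is a sketch of the same filtering rule, assuming the to_remove convention above; the function name squeeze_with_zip is hypothetical.

```python
def squeeze_with_zip(text, to_remove):
    # Pair every character with the one before it; None stands in for the
    # (nonexistent) predecessor of the first character, so it is always kept.
    previous = [None] + list(text)
    return "".join(
        c
        for prev, c in zip(previous, text)
        # Drop c only when it repeats the previous character AND is one of
        # the characters we want to de-duplicate.
        if not (c == prev and c in to_remove)
    )


print(squeeze_with_zip("Hello... _my name -- is __Alex", "_.-"))
```

zip stops at the shorter input, so the extra leading None does not produce an extra pair.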

Alternative approach #2: use groupby to identify runs of consecutive identical characters, then join the runs back together, using to_remove membership to decide whether to emit one copy of the character or the whole run.

>>> import itertools
>>> "".join(k if k in to_remove else "".join(v) for k,v in itertools.groupby(s, lambda c: c))
'Hello. _my name - is _Alex'
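To see what the groupby version is working with, it can help to materialize the runs. With no key function, itertools.groupby groups consecutive equal elements by their own value, so an explicit identity key is optional. A small illustrative sketch:

```python
import itertools

s = "Hello... _my name -- is __Alex"
to_remove = "_.-"

# Each group is (character, iterator over its consecutive run); join the
# iterator so the runs can be inspected and reused.
runs = [(k, "".join(g)) for k, g in itertools.groupby(s)]
print(runs[:4])

# Only the runs longer than one character are candidates for squeezing.
print([run for _, run in runs if len(run) > 1])

# Emit one copy for characters in to_remove, the full run otherwise.
print("".join(k if k in to_remove else run for k, run in runs))
```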

Alternative approach #3: call re.sub once for each member of to_remove. A bit expensive if to_remove contains a lot of characters.

>>> for c in to_remove:
...     s = re.sub(rf"({re.escape(c)})\1+", r"\1", s)
...
>>> s
'Hello. _my name - is _Alex'


Simple re.sub() approach:

import re

s = "Hello _my name is __Alex aa"
result = re.sub(r'(\S)\1+', r'\1', s)

print(result)
  • \S - any non-whitespace character
  • \1+ - backreference to the 1st parenthesized captured group (one or more occurrences)

The output:

Helo _my name is _Alex a
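Because \S also matches letters, this version collapses the double l in "Hello" (the question raised in the comments below). A sketch, assuming the goal is to leave letters, digits and whitespace alone and squeeze only punctuation and underscores: [^\w\s] matches punctuation, and the extra |_ is needed because \w already includes the underscore.

```python
import re

s = "Hello _my name is __Alex aa"
# ([^\w\s]|_) captures one punctuation character or an underscore;
# \1+ then requires one or more further copies of that same character.
result = re.sub(r"([^\w\s]|_)\1+", r"\1", s)
print(result)
```

Whitespace is deliberately excluded from the class, so runs of spaces are also left untouched.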

3 Comments

Oh, neat. This is more concise than my answer, since I didn't know you could refer to a previous group without explicitly naming it.
@RomanPerekhrest it turns out I may have been wrong and your first answer may have actually been correct. I regret the error, but this is a great solution!
How can I keep the word "Hello" intact?
