Python: remove duplicate characters

Question

Input string: Tem1 = 'Hhelloo ookkee'

I want the output to be Tem1 = 'helo oke'.

I have tried the answers from this Stack Overflow question (Python: Best Way to remove duplicate character from string).

I've tried using itertools, but when I save the result to a CSV file, the stored text still contains lots of duplicate characters:

import itertools
tem1 = sum(val*(2**idx) for idx, val in enumerate(reversed(tem)))
if bit[0:8]==[1,0,0,1,1,0,0,1]:
    cv2.putText(frame, "Text Print: " + chr(tem1) +".....", (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    print(chr(tem1))
    cv2.imshow('frame',frame)
if str(tem1)!='0':
    row = ''.join(ch for ch, _ in itertools.groupby(f'{chr(tem1)}'))
    # write the row to the CSV file (f)
    f.write(row)


NOTE: Order is important, and this question is different from this one.

What about the result of join? Is that string correct? Why are you storing the result in row but writing newrow to your CSV? – Jorge Luis, Commented Mar 23, 2023 at 7:53
I don't understand your call to groupby. It should receive the string whose duplicate characters have to be removed. Instead, you are passing a string with a single character (which will never contain a repeated character). – Jorge Luis, Commented Mar 23, 2023 at 7:57
@JorgeLuis Sorry, I have changed "newrow" to "row". I have tried this with f.write(row), but the result still has duplicates when I save it to the CSV file. – 0nespo, Commented Mar 24, 2023 at 6:38
Without a reproducible example, it is really hard to tell from the code you posted what you are trying to achieve, because you are doing some weird stuff. – Jorge Luis, Commented Mar 24, 2023 at 7:48
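The point in Jorge Luis's comment about groupby can be sketched as follows. This assumes the decoded characters can be collected into one string before writing; `decoded` is a hypothetical placeholder for the successive chr(tem1) values, and `io.StringIO` stands in for the real CSV file object `f`. Note that groupby is case-sensitive, so this gives 'Hhelo oke' rather than 'helo oke'.

```python
import csv
import io
import itertools

# Hypothetical stand-in for the characters the question decodes one at a
# time with chr(tem1); in the real program they arrive inside the frame loop.
decoded = list("Hhelloo ookkee")

# What the question's code effectively does: groupby over a one-character
# string has no neighbour to compare with, so nothing is ever removed.
per_char = ''.join(
    ''.join(ch for ch, _ in itertools.groupby(c)) for c in decoded
)

# Accumulate first, then run groupby once over the whole string.
text = ''.join(decoded)
row = ''.join(ch for ch, _ in itertools.groupby(text))

# Write the deduplicated row once; io.StringIO stands in for the CSV file.
buf = io.StringIO()
csv.writer(buf).writerow([row])

print(per_char)              # Hhelloo ookkee
print(row)                   # Hhelo oke
print(repr(buf.getvalue()))  # 'Hhelo oke\r\n'
```

The design choice is simply ordering: deduplication only works once the characters that might repeat are next to each other in the same string.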

Headcrab · Accepted Answer · 2023-04-11 18:28:08Z


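def remove_duplicates(s):
    if not s:  # guard: s[0] would raise IndexError on an empty string
        return s
    acc = [s[0]]
    for c in s[1:]:
        if acc[-1] != c:  # keep c only if it differs from the last kept character
            acc.append(c)
    return ''.join(acc)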

s = "Hhelloo ookkee"
print(remove_duplicates(s))
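An alternative that is not part of the original answer: a regular expression with a backreference collapses each run of the same character in a single call. This is a sketch; `remove_duplicates_re` is an illustrative name.

```python
import re

def remove_duplicates_re(s: str) -> str:
    # (.) captures one character; \1+ matches one or more immediate repeats of it.
    # re.DOTALL lets '.' also match newlines, so repeated line breaks collapse too.
    return re.sub(r'(.)\1+', r'\1', s, flags=re.DOTALL)

print(remove_duplicates_re("Hhelloo ookkee"))  # Hhelo oke
```

Like the loop above, it is case-sensitive, and it handles the empty string without a special case.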

There's also a package called more-itertools (install it with pip install more-itertools) that has a unique_justseen function, which seems to do the same thing:

from more_itertools import unique_justseen

s = "Hhelloo ookkee"
print(''.join(unique_justseen(s)))

The output would be 'Hhelo oke', because 'H' and 'h' are, strictly speaking, different characters. If you want the comparison to be case-insensitive, you should normalize the case of the characters before comparing them. For simple examples with strings limited to the Latin alphabet, calling str.lower() would be enough, but it wouldn't work for some Unicode characters, so for real-world text str.casefold() should be used instead; read this for even more real-world detail. E.g., for the first of the above code samples:

if acc[-1].casefold() != c.casefold(): acc.append(c)

And for the second, using the optional key argument:

unique_justseen(s, str.casefold)

In both cases it would probably be more efficient to casefold the entire string once up front rather than character by character during each comparison. The caveat is that the output is then casefolded too: 'helo oke' instead of 'Helo oke'.
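The difference between comparing casefolded characters and casefolding the whole string first can be seen in this sketch. It uses itertools.groupby as a stand-in for both the loop and unique_justseen, and the two function names are illustrative.

```python
from itertools import groupby

def dedupe_keep_original(s: str) -> str:
    # Compare casefolded characters, but keep the first original character of each run
    return ''.join(next(group) for _, group in groupby(s, key=str.casefold))

def dedupe_casefold_first(s: str) -> str:
    # Casefold the whole string once, then collapse runs; the output is casefolded
    return ''.join(ch for ch, _ in groupby(s.casefold()))

s = "Hhelloo ookkee"
print(dedupe_keep_original(s))   # Helo oke
print(dedupe_casefold_first(s))  # helo oke
```

The two can also differ beyond case: "Straße".casefold() is "strasse", so dedupe_casefold_first("Straße") returns "strase", while dedupe_keep_original("Straße") leaves the word unchanged.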

Answered Mar 24, 2023 at 1:02 by Headcrab; edited Apr 11, 2023 at 18:28.


5 Comments

Andj Over a year ago

Technically, str.lower() and str.upper() cannot give you a case-insensitive comparison. These are asymmetric casing operations, not matching or comparison operations, and they do not remove all case distinctions.

Headcrab Over a year ago

@Andj Huh? "Hello, World!".lower() == "hello, world!".lower() -> True

Andj Over a year ago

The fact that "Hello, World!".lower() == "hello, world!".lower() is true is irrelevant. Not all lowercase characters have uppercase equivalents, and not all uppercase characters have lowercase equivalents. Some uppercase characters map to two characters when lowercased. Also, Unicode defines two types of casing: simple casing and full casing. @Headcrab Unicode defines four types of caseless matching; the simplest is case folding. str.lower() and str.upper() were only ever adequate for caseless matching in Python 2, i.e. when using encodings other than Unicode.

Headcrab Over a year ago

@Andj I see no point in overloading such a simple question with endless Unicode intricacies, but I've added str.casefold() and a link for further reading into the answer. OK now, or do you still see some room for exercises in pedantry?

Andj Over a year ago

Your answer works perfectly well for the question; I didn't imply otherwise. Rather, I was pointing out that your characterisation of the str.lower() operation was technically incorrect and a hangover from Python 2.x. For the question being answered it doesn't matter, true. But Stack Overflow questions and answers are often found by searching after the fact; in fact, people asking questions are encouraged to search first and ask only if they can't find the answer already. So it is better to be exact for future readers than to have partial answers where the distinctions matter.
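A small illustration of the lower() versus casefold() distinction discussed in these comments, using German sharp s (ß), which casefolds to "ss", and U+0130 (Latin capital I with dot above), which lowercases to two code points. The example strings are chosen for illustration.

```python
# lower() keeps "ß", so these two spellings of the same word do not match.
a, b = "Straße", "STRASSE"

print(a.lower() == b.lower())        # False: "straße" vs "strasse"
print(a.casefold() == b.casefold())  # True: both become "strasse"

# Some characters lowercase to more than one code point: U+0130 becomes
# "i" followed by U+0307 COMBINING DOT ABOVE.
print(len("\u0130".lower()))  # 2
```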
