Python regex remove double character from string [duplicate]

Question

I'm trying to remove all non-alphanumeric characters from a string and collapse every run of repeated characters (ignoring case) to a single character.

For example, the input "ABBBbbcCCCD EF ZZZU" should become "ABCDDEFZU". In the filter2 function I try to capture a two-letter pair so that I can compare the two letters, but the match object only contains the letter actually consumed, not the letter inside the lookahead.

#!/usr/bin/python
# coding: latin-1

import re
testfield = 'ABBBbbcCCCD EF  ZZZU'
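def filter1(matchobj):
  # replacement callback: delete every matched character
  return ''
def filter2(matchobj):
  # group(0) holds only the consumed character, not the one in the lookahead
  print('MATCH:' + matchobj.group(0))
  return matchobj.group(0)

print(testfield)

# remove everything that is not a letter or digit
testfield2 = re.sub('[^A-Z0-9]', filter1, testfield, flags=re.IGNORECASE)
print(testfield2)

# match each letter/digit that is followed by another letter/digit
testfield2 = re.sub('[A-Z0-9](?=[A-Z0-9])', filter2, testfield2, flags=re.IGNORECASE)
print(testfield2)

How do I pass both letters to the filter2 function but still find all possible matches?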
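One way to get both letters into the callback, sketched here as an assumption rather than as the accepted approach: put a capturing group inside the lookahead. The lookahead still consumes nothing, so matches keep overlapping, but the character it looks at becomes available as `group(2)`. This version of `filter2` drops a character whenever the next one is the same letter (ignoring case), so the last character of each run survives.

```python
import re

def filter2(m):
    # group(1) is the consumed character; group(2) was captured inside the
    # lookahead, so it is visible here without being consumed
    current, following = m.group(1), m.group(2)
    if current.lower() == following.lower():
        return ''  # drop it; a later character of the same run will survive
    return current

cleaned = re.sub(r'[^A-Z0-9]', '', 'ABBBbbcCCCD EF  ZZZU', flags=re.IGNORECASE)
result = re.sub(r'([A-Z0-9])(?=([A-Z0-9]))', filter2, cleaned, flags=re.IGNORECASE)
print(result)          # AbCDEFZU
print(result.upper())  # ABCDEFZU
```

Because the surviving character is the last one of each run, its case can differ from the first (here `b` survives from `BBBbb`), which is why the result is upper-cased at the end. The final character of the string never matches (nothing follows it), so it is left unchanged.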

How does the input "ABBBbbcCCCD EF ZZZU" become "ABCDDEFZU"? Where do the two Ds come from? – James Sapam, commented Feb 2, 2014 at 11:32
@yoyi - because that was done manually... (I took the D out) – 576i, commented Feb 2, 2014 at 11:49
@cherhan - I'm trying to find out how to get the two letters into the "filter2" function... – 576i, commented Feb 2, 2014 at 11:51

Jerry · Accepted Answer · 2014-02-02 11:49:49Z


You should use raw string literals (r'...') for your regex patterns, so that backslashes reach the regex engine unchanged. Second (assuming you meant ABCDEFZU as the end result), you can use a backreference plus a lambda function that returns the uppercase form of the matched character:

testfield2 = re.sub(r'([A-Z0-9])\1+', lambda m: m.group(1).upper(), testfield2, flags=re.IGNORECASE)
print(testfield2)
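A self-contained sketch that combines both steps so the result can be checked; the function name `squeeze` is only illustrative, and step 1 uses a plain `''` replacement instead of the question's `filter1` callback (same effect). It relies on Python's `re` applying `re.IGNORECASE` to the backreference `\1` too, so `bb` after `B` and `CCC` after `c` count as repeats.

```python
import re

def squeeze(text):
    # Step 1: drop every character that is not an ASCII letter or digit.
    cleaned = re.sub(r'[^A-Z0-9]', '', text, flags=re.IGNORECASE)
    # Step 2: ([A-Z0-9]) captures one character and \1+ matches one or more
    # repeats of it; under IGNORECASE the backreference ignores case as well.
    # Each run is replaced by the uppercase form of its first character.
    return re.sub(r'([A-Z0-9])\1+', lambda m: m.group(1).upper(),
                  cleaned, flags=re.IGNORECASE)

print(squeeze('ABBBbbcCCCD EF  ZZZU'))  # ABCDEFZU
print(squeeze('xYy'))                    # xY
```

Characters that are not part of a run never reach the lambda, so a lone lowercase letter (the `x` above) keeps its case; if the whole result should be uppercase, calling `.upper()` on it is simpler.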



2 Comments

576i:

Regarding raw strings: I have not found out how to make a pattern raw when the pattern is stored in a variable, like pattern="([A-Z0-9])\1+" and then testfield=re.sub(pattern...

Jerry:

@576i It works the same way: the r prefix belongs to the string literal, not to re.sub, so you write pattern = r"([A-Z0-9])\1+" and pass pattern to re.sub as before.
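A short sketch of why the prefix matters even when the pattern lives in a variable: in an ordinary string literal, `\1` is an octal escape for the control character U+0001, so the regex engine never sees a backreference. The variable names `plain` and `raw` are only for illustration.

```python
import re

plain = "([A-Z0-9])\1+"   # "\1" becomes chr(1) before re ever sees it
raw = r"([A-Z0-9])\1+"    # backslash kept: re sees the backreference \1

print("\1" == "\x01")             # True
print(r"\1" == "\\1")             # True
print(re.sub(plain, "-", "AAB"))  # AAB  (looks for a literal chr(1), no match)
print(re.sub(raw, "-", "AAB"))    # -B   (the run "AA" is replaced)
```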
