I'm trying to remove all non-alphanumeric characters from a string, along with every character that repeats the one before it (ignoring case).
The example input "ABBBbbcCCCD EF ZZZU" should become "ABCDEFZU". In the filter2 function I try to capture a two-letter pair so that I can compare the two letters. But the match object only contains the letter that was actually matched, not the letter inside the lookahead.
#!/usr/bin/python
# coding: latin-1
import re
testfield = 'ABBBbbcCCCD EF ZZZU'
def filter1(matchobj):
    return ''

def filter2(matchobj):
    print('MATCH:' + matchobj.group(0))
    return matchobj.group(0)
print(testfield)
testfield2 = re.sub('[^A-Z0-9]', filter1, testfield, flags=re.IGNORECASE)
print(testfield2)
testfield2 = re.sub('[A-Z0-9](?=[A-Z0-9])', filter2, testfield2, flags=re.IGNORECASE)
print(testfield2)

How do I pass both letters to the filter2 function but still find all possible matches?
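
One possible approach, as a minimal sketch rather than a definitive answer: a capturing group placed inside the lookahead makes the following letter available as a second group without consuming it, so every position is still visited. The names `collapse_runs` and `keep_if_changed` are hypothetical, and uppercasing the result is an assumption based on the uppercase example output.

```python
import re

def collapse_runs(text):
    # Step 1: drop everything that is not a letter or digit.
    cleaned = re.sub(r'[^A-Z0-9]', '', text, flags=re.IGNORECASE)

    def keep_if_changed(matchobj):
        current = matchobj.group(1)    # the character that was consumed
        following = matchobj.group(2)  # captured inside the lookahead; '' at end of string
        # Drop the character when the next one is the same letter (ignoring case).
        if current.upper() == following.upper():
            return ''
        # Assumption: the result should be uppercase, as in the example output.
        return current.upper()

    # Step 2: the lookahead is zero-width, so each match consumes one character
    # and the following character is examined again on the next match.
    return re.sub(r'([A-Z0-9])(?=([A-Z0-9]|$))', keep_if_changed, cleaned,
                  flags=re.IGNORECASE)

print(collapse_runs('ABBBbbcCCCD EF ZZZU'))  # ABCDEFZU
```

Because the comparison drops a character whenever its successor is the same, the last character of each run is the one that survives.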