
I'm trying to remove all non-alphanumeric characters from a string, and then remove every character that repeats the character before it.

The example input "ABBBbbcCCCD EF ZZZU" should become "ABCDDEFZU". In the filter2 function I try to capture a two-letter pair so that I can compare the two letters. But the match only contains the letter that was actually matched, not the lookahead letter.

#!/usr/bin/python
# coding: latin-1

import re
testfield = 'ABBBbbcCCCD EF  ZZZU'
def filter1(matchobj):
  return ''
def filter2(matchobj):
  print('MATCH:' + matchobj.group(0))
  return matchobj.group(0)

print(testfield)

testfield2 = re.sub('[^A-Z0-9]', filter1, testfield, flags=re.IGNORECASE)
print(testfield2)

testfield2 = re.sub('[A-Z0-9](?=[A-Z0-9])', filter2, testfield2, flags=re.IGNORECASE)
print(testfield2)

How do I pass both letters to the filter2 function but still find all possible matches?

  • stackoverflow.com/questions/17885329/… Commented Feb 2, 2014 at 11:31
  • How does the input "ABBBbbcCCCD EF ZZZU" become "ABCDDEFZU"? Where does the second D come from? Commented Feb 2, 2014 at 11:32
  • @yoyi - because that was done manually... (I took the D out) Commented Feb 2, 2014 at 11:49
  • @cherhan - I'm trying to find out how to get the two letters into the "filter2" function... Commented Feb 2, 2014 at 11:51

1 Answer

You should use raw strings for your regex patterns. Second (assuming you meant ABCDEFZU as the end result), you can use a backreference and a lambda function that returns the uppercase form of the matched letter:

testfield2 = re.sub(r'([A-Z0-9])\1+', lambda m: m.group(1).upper(), testfield2, flags=re.IGNORECASE)
print(testfield2)
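
For reference, the two steps can be combined into one self-contained sketch. The function name `collapse_repeats` is made up for illustration and does not appear in the original code; it assumes that only ASCII letters and digits should survive.

```python
import re

def collapse_repeats(text):
    """Hypothetical helper: strip non-alphanumerics, then collapse repeated runs."""
    # Step 1: drop everything that is not an ASCII letter or digit.
    cleaned = re.sub(r'[^A-Z0-9]', '', text, flags=re.IGNORECASE)
    # Step 2: \1+ matches one or more repeats of the captured character;
    # with IGNORECASE the backreference also matches the other case.
    return re.sub(r'([A-Z0-9])\1+', lambda m: m.group(1).upper(),
                  cleaned, flags=re.IGNORECASE)

print(collapse_repeats('ABBBbbcCCCD EF  ZZZU'))  # ABCDEFZU
```

Note that a single lowercase letter with no repeat is left unchanged, because `\1+` needs at least one repetition. Using `\1*` instead would uppercase those as well.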

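
If the callback really needs to see both letters, as the question asks, one possible approach is to put capturing groups around the matched letter and inside the lookahead. A group inside a lookahead is still captured even though it is not consumed, so overlapping pairs are all found. This is only a sketch; the names `drop_if_same_as_next` and `dedupe` are hypothetical.

```python
import re

def drop_if_same_as_next(m):
    # group(1) is the consumed letter, group(2) the letter seen by the lookahead.
    cur, nxt = m.group(1), m.group(2)
    # Drop the letter when the next one is the same (ignoring case).
    return '' if cur.upper() == nxt.upper() else cur.upper()

def dedupe(text):
    # The lookahead is zero-width, so every letter that has a successor is matched.
    return re.sub(r'([A-Z0-9])(?=([A-Z0-9]))', drop_if_same_as_next,
                  text, flags=re.IGNORECASE)

print(dedupe('ABBBbbcCCCDEFZZZU'))  # ABCDEFZU
```

The last character of the string has no successor, so it is never passed to the callback and is kept exactly as it was.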


2 Comments

Regarding raw strings: I have not figured out how to make a pattern raw when the pattern is stored in a string, like pattern="([A-Z0-9])\1+" testfield=re.sub(pattern...
@576i It's just the same. You do pattern = r"([A-Z0-9])\1+"
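
A small sketch of why the `r` prefix matters here: in a normal string literal `"\1"` is an octal escape for the single character `chr(1)`, so the regex engine never sees a backreference. The variable names `plain` and `raw` are illustrative only.

```python
import re

plain = "([A-Z0-9])\1+"   # "\1" becomes chr(1) before re ever sees it
raw = r"([A-Z0-9])\1+"    # backslash and "1" are kept, so \1 is a backreference

# The raw pattern is one character longer: "\" + "1" instead of chr(1).
print(len(plain), len(raw))

# Only the raw pattern collapses the repeated letter.
print(re.sub(raw, r'\1', 'AAB'))
print(re.sub(plain, r'\1', 'AAB'))
```

The prefix only affects how the literal is written in source code. Once the pattern is stored in a variable it is an ordinary string and can be passed to `re.sub` like any other.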
