Here goes my attempt:
([a-zA-Z])(?!\1)([a-zA-Z])\2\1
Assuming you want to match letters only (if other ranges, change both [a-zA-Z] as appropriate, we have:
([a-zA-Z])
Find the first character, and note it so we can later refer to it with \1.
(?!\1)
Check to see if the next character is not the same as the first, but without advancing the search pointer. This is to prevent aaaa being accepted. If aaaa is OK, just remove this subexpression.
([a-zA-Z])
Find the second character, and note it so we can later refer to it with \2.
\2\1
Now find the second again, then the first again, so we match the full abba pattern.
And finally, to do a replace operation, the full command would be:
import re
re.sub(r'([a-zA-Z])(?!\1)([a-zA-Z])\2\1',
'1234',
'abbacdeffelzzzz')
The r at the start of the regex pattern is to prevent Python processing the backslashes. Without it, you would need to do:
import re
re.sub('([a-zA-Z])(?!\\1)([a-zA-Z])\\2\\1',
'1234',
'abbacdeffelzzzz')
Now, I see the spec has expanded to a user-defined pattern; here is some code that will build that pattern:
import re
def make_re(pattern, charset):
result = ''
seen = []
for c in pattern:
# Is this a letter we've seen before?
if c in seen:
# Yes, so we want to match the captured pattern
result += '\\' + str(seen.index(c)+1)
else:
# No, so match a new character from the charset,
# but first exclude already matched characters
for i in xrange(len(seen)):
result += '(?!\\' + str(i + 1) + ')'
result += '(' + charset + ')'
# Note we have seen this letter
seen.append(c)
return result
print re.sub(make_re('xzzx', '\\d'), 'abba', 'abba1221b99999889')
print re.sub(make_re('xyzxyz', '[a-z]'), '123123', 'abcabc zyxzyyx zyzzyz')
Outputs:
abbaabbab9999abba
123123 zyxzyyx zyzzyz
'adda'fit your pattern ('xyyx')?, does case matter?