I need to find all two-character words in Unicode text, excluding any that contain an underscore. My current solution is:
import re
pattern = re.compile(ur'(?:\s*)(\w{2})(?:\s*)', re.UNICODE | re.MULTILINE | re.DOTALL)
print pattern.findall('a b c ab cd vs sd a a_ _r')
['ab', 'cd', 'vs', 'sd', 'a_', '_r']
I need to exclude the underscore _ from the match, so that a_ and _r are not found. The problem is that my characters can be in any language, so I can't use a regex like this: [^a-zA-Z]. For example, in Russian:
print pattern.findall(u'ф_')
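One possible direction, given only as a sketch and not a confirmed fix: \w under re.UNICODE matches letters, digits and the underscore in any script, so the negated class [^\W_] should match "any word character except the underscore" without listing alphabets. The helper name find_pairs is hypothetical, and the Cyrillic letters are written as escapes only to keep the snippet ASCII-safe.

```python
import re

# [^\W_] = "not a non-word character and not an underscore",
# i.e. any Unicode letter or digit, but never "_".
# The pattern is a unicode string so it behaves the same on Python 2 and 3.
pattern = re.compile(u'[^\\W_]{2}', re.UNICODE)


def find_pairs(text):
    """Return every run of two word characters that contains no underscore.

    find_pairs is a hypothetical helper name used only for this sketch.
    """
    return pattern.findall(text)


# Sample inputs taken from the question.
latin = find_pairs(u'a b c ab cd vs sd a a_ _r')
# u'\u0444_' is the Cyrillic letter "ef" followed by an underscore.
cyrillic_with_underscore = find_pairs(u'\u0444_')
# Two Cyrillic letters with no underscore.
cyrillic_pair = find_pairs(u'\u0444\u044b')
```

Note that, like the original \w{2}, this also matches the first two characters of a longer word; if only standalone two-character words are wanted, the pattern would need extra boundary checks.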