Python regex exclude Underscore

Question

I need to find all two-character symbols in Unicode, except the underscore. My current solution is:

pattern = re.compile(ur'(?:\s*)(\w{2})(?:\s*)', re.UNICODE | re.MULTILINE | re.DOTALL)
print pattern.findall('a b c ab cd vs sd a a_ _r')
['ab', 'cd', 'vs', 'sd', 'a_', '_r']

I need to exclude the underscore _ from the regex, so that a_ and _r are not found. The problem is that my characters can be in any language, so I can't use a regex like this: [^a-zA-Z]. For example, in Russian:

print pattern.findall(u'ф_')

Ioan Alexandru Cucu · Accepted Answer · 2012-09-25 19:35:19Z

12

Use a negated character class that excludes every non-word character and the underscore:

[^\W_]

instead of

\w
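
As a minimal sketch of that swap, assuming Python 3 (where str patterns are Unicode-aware by default, so the ur'' prefix and the re.UNICODE flag from the question are not needed). The helper name find_pairs is hypothetical and only there for illustration:

```python
import re

# [^\W_] reads as "not (a non-word character or an underscore)",
# i.e. any Unicode word character except "_".
PAIR = re.compile(r'[^\W_]{2}')


def find_pairs(text):
    """Return every run of two word characters that contains no underscore."""
    return PAIR.findall(text)


print(find_pairs('a b c ab cd vs sd a a_ _r'))  # ['ab', 'cd', 'vs', 'sd']
print(find_pairs('ф_'))                         # []
print(find_pairs('фы ф_'))                      # ['фы']
```

The double negation is what keeps this language-independent: the class is still defined in terms of \w, so it follows whatever Unicode considers a word character rather than a hand-written range like a-zA-Z.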

Martijn Pieters · Accepted Answer · 2012-09-25 19:33:05Z

9

Your best bet would be to use the new regex module instead. One of its features is that it can remove characters from a character set:

import regex as re

pattern = re.compile(ur'(?:\s*)([\w--_]{2})(?:\s*)', re.UNICODE | re.MULTILINE | re.DOTALL)

The [\w--_] syntax creates a character set that is the same as \w with the underscore character removed from the matching characters.

musca999 · Accepted Answer · 2018-04-30 17:12:03Z

0

This seems to work for me:

a="Exclude_from_search"
re.search("(\w[^_]+)", a).group(0)
'Exclude'
