I'm trying to extract any jabber accounts (emails) using regex from this page.
I've tried using regex:
\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+
...but it's not producing the desired results.
This might work:
[^\s@<>]+@[^\s@<>]+\.[^\s@<>]+
p = re.compile(ur'[^\s@<>]+@[^\s@<>]+\.[^\s@<>]+', re.MULTILINE | re.IGNORECASE)
test_str = r'...'
re.findall(p, test_str)
See example.
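The snippet above uses Python 2 syntax (the ur'' prefix is rejected by Python 3). A minimal Python 3 sketch of the same loose pattern might look like the following; the function name and the sample string are made up for illustration and are not taken from the page in the question.

```python
import re

# Loose pattern from the answer above: runs of characters that are not
# whitespace, "@", "<" or ">" around a single "@", with at least one dot
# somewhere in the part after the "@".
EMAIL_LIKE = re.compile(r'[^\s@<>]+@[^\s@<>]+\.[^\s@<>]+')


def find_email_like(text):
    """Return every email-like substring of text, in order of appearance."""
    return EMAIL_LIKE.findall(text)


# Hypothetical sample input, not the real page source.
sample = 'Contact <alice@example.org> or bob@jabber.example.net today.'
print(find_email_like(sample))
```

Note that this pattern is deliberately permissive: it will also pick up trailing punctuation that directly follows an address, so the results may need trimming.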
.@... is not a valid address, IMHO. In general, the character . (dot, period, full stop) is allowed provided that it is not the first or last character, and provided also that it does not appear two or more times consecutively. For matching email-address-like patterns, your attempt is fine.
# -*- coding: utf-8 -*-
s = '''
...YOUR HTML page source code HERE..........
'''
import re
reobj = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}\b", re.IGNORECASE)
print re.findall(reobj, s.decode('utf-8'))
[u'[email protected]', u'[email protected]', u'[email protected]', u'[email protected]', u'[email protected]', u'[email protected]']
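The dot rule quoted in this answer can also be checked outside the regex. Here is a rough sketch, assuming an unquoted local part (the text before the last "@"); the function name is hypothetical, and this checks only the dot rule, not full address validity.

```python
def local_part_dots_ok(address):
    """Check only the dot rule for the part before the last '@'.

    Returns False if there is no '@', if the local part is empty, or if
    the local part starts with a dot, ends with a dot, or contains two
    consecutive dots.
    """
    local, sep, _domain = address.rpartition('@')
    if not sep or not local:
        return False
    return not (local.startswith('.') or local.endswith('.') or '..' in local)


# Hypothetical example addresses.
for candidate in ['john.doe@example.org', '.john@example.org',
                  'john.@example.org', 'jo..hn@example.org']:
    print(candidate, local_part_dots_ok(candidate))
```

A filter like this could be applied to the list returned by re.findall to drop matches such as .@... that the regex alone lets through.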
Read the "The Official Standard: RFC 5322" section and get scared. Regex is not the right tool for this task.