Finding email addresses in a web page using a regular expression

Question

I'm a beginner Python student. Here is the code I wrote to find email addresses in a web page:

    page = urllib.request.urlopen("http://website/category")
    reg_ex = re.compile(r'[-a-z0-9._]+@([-a-z0-9]+)(\.[-a-z0-9]+)+', re.IGNORECASE
    m = reg_ex.search_all(page)
    m.group()

When I ran it, Python reported invalid syntax on this line:

    m = reg_ex.search_all(page)

Could anyone tell me why it is invalid?

TommyOKe · Accepted Answer · 2014-07-21 17:33:22Z

6

Consider an alternative:

    import re

    ## Suppose we have a text with many email addresses
    text = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'

    ## Here re.findall() returns a list of all the found email strings
    emails = re.findall(r'[\w\.-]+@[\w\.-]+', text)
    ## ['alice@google.com', 'bob@abc.com']
    for email in emails:
        # do something with each found email string
        print(email)

Source: https://developers.google.com/edu/python/regular-expressions

edited Jul 21, 2014 at 17:33

answered Jul 21, 2014 at 16:58

TommyOKe


5 Comments

honk Over a year ago

This might be the solution the OP is looking for, but it does not answer his question...

TommyOKe Over a year ago

So if the OP asks a question where he is trying to get a certain output and asks why his code doesn't work, I am only supposed to tell him why his code doesn't work and not give him a better solution?

takendarkk Over a year ago

No, do both. Explain why his didn't work then provide a solution and explain why it does work.

TommyOKe Over a year ago

It was explained 4 times why his doesn't work, so I didn't want to be redundant.

Anass Over a year ago

This regex can also match invalid addresses such as name@example, which has no TLD extension.
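A small sketch of that problem on made-up text. The stricter pattern is only an illustration (an assumption, not taken from the answer or its source) that requires the domain to end in a dot plus at least two letters; it is still far from a full address validator.

```python
import re

# Pattern from the answer above: word-ish characters, an @, word-ish characters.
loose = re.compile(r'[\w\.-]+@[\w\.-]+')

# Illustrative stricter variant (hypothetical): the domain must contain
# at least one dot followed by two or more letters.
strict = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.[A-Za-z]{2,}')

text = 'write to name@example or to name@example.com'

print(loose.findall(text))   # ['name@example', 'name@example.com']
print(strict.findall(text))  # ['name@example.com']
```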

zhangyangyu · Accepted Answer · 2013-08-08 07:19:18Z

2

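Besides, reg_ex has no search_all method (you probably want findall). And you should pass in page.read(), not the response object; in Python 3 that returns bytes, so decode it to a string first, e.g. page.read().decode('utf-8').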
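A minimal sketch putting these points together, assuming Python 3 and a UTF-8 encoded page. A str pattern cannot be applied to bytes (re raises TypeError), so the page is decoded first. The helper names find_emails and fetch_emails are hypothetical, and fetch_emails needs network access, so the example only calls find_emails on an inline byte string.

```python
import re
import urllib.request

# The question's pattern with the missing ")" restored. Its two groups are made
# non-capturing (?:...) so that findall() returns whole addresses, not tuples.
EMAIL_RE = re.compile(r'[-a-z0-9._]+@(?:[-a-z0-9]+)(?:\.[-a-z0-9]+)+', re.IGNORECASE)


def find_emails(raw, encoding='utf-8'):
    """Decode raw page bytes and return every email-like substring."""
    # In Python 3, read() gives bytes; a str pattern needs str input.
    text = raw.decode(encoding, errors='replace')
    return EMAIL_RE.findall(text)  # findall(), not search_all()


def fetch_emails(url):
    """Download a page and scan it (hypothetical helper, needs network)."""
    with urllib.request.urlopen(url) as page:
        return find_emails(page.read())


print(find_emails(b'<a href="mailto:info@example.com">Info</a>'))
# ['info@example.com']
```

Against the question's placeholder URL, this would be fetch_emails("http://website/category").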

answered Aug 8, 2013 at 7:19

zhangyangyu


Comments

Community · Accepted Answer · 2017-05-23 12:32:17Z

2

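You are missing the closing ) on this line; it should read:

    reg_ex = re.compile(r'[a-z0-9._]+@([-a-z0-9]+)(\.[-a-z0-9]+)+', re.IGNORECASE)

Because the parenthesis is still open, Python reads the next line as part of the same expression, which is why the syntax error is reported on the search_all line.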

Plus, your regex is not quite right; try this instead:

    r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+"

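A quick sketch of the suggested pattern applied to made-up text. It is written as a raw string so that \. reaches the regex engine unchanged (recent Python versions warn about "\." in a normal string literal). It also shows one quirk: because the last character class allows a dot, a sentence-ending period is pulled into the match.

```python
import re

# The suggested pattern, as a raw string.
EMAIL_RE = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+")

text = "Contact alice@example.com or bob.smith+news@mail.example.org today."
print(EMAIL_RE.findall(text))
# ['alice@example.com', 'bob.smith+news@mail.example.org']

# Caveat: the trailing period of the sentence ends up in the match.
print(EMAIL_RE.findall("Write to carol@example.net."))
# ['carol@example.net.']
```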
FYI, validating email addresses with a regex is not trivial; see these threads:

edited May 23, 2017 at 12:32

CommunityBot


answered Aug 8, 2013 at 7:17

alecxe


2 Comments

stema Over a year ago

Your suggested regex makes no sense in this use case. The OP wants to find an email address in a bunch of text, so the anchors are wrong here.

alecxe Over a year ago

@stema OK, it was just an example, but you're right, there is no need for the boundaries.

Kadmillos · Accepted Answer · 2013-08-08 08:54:18Z

There is no .search_all method in the re module; maybe the one you are looking for is .findall.

You can try:

    re.findall(r"(\w(?:[-.+]?\w+)+\@(?:[a-zA-Z0-9](?:[-+]?\w+)*\.)+[a-zA-Z]{2,})", text)

I assume text is the text to search; in your case it should be text = page.read().decode('utf-8') (assuming the page is UTF-8), since in Python 3 page.read() returns bytes.

Or you can compile the regex first:

    r = re.compile(r"(\w(?:[-.+]?\w+)+\@(?:[a-z0-9](?:[-+]?\w+)*\.)+[a-z]{2,})", re.I)
    results = r.findall(text)

Note: .findall returns a list of strings, not match objects.
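To make that concrete: findall() returns the full matched strings when the pattern has no capturing groups, the group's text when there is exactly one group (which is why the pattern above wraps everything in one outer group), and tuples when there are several. A small sketch on made-up text; the last call uses the question's own pattern, which has two groups.

```python
import re

text = "mail bob@mail.example.com now"

# No capturing groups: each item is the whole match.
no_groups = re.findall(r"[\w.]+@[\w.]+", text)
# One capturing group: each item is that group's text.
one_group = re.findall(r"([\w.]+@[\w.]+)", text)
# Two groups, as in the question's pattern: each item is a tuple, and a
# repeated group only keeps its last repetition.
two_groups = re.findall(r"[-a-z0-9._]+@([-a-z0-9]+)(\.[-a-z0-9]+)+", text, re.IGNORECASE)

print(no_groups)   # ['bob@mail.example.com']
print(one_group)   # ['bob@mail.example.com']
print(two_groups)  # [('mail', '.com')]
```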

If you need match objects instead, iterate with .finditer (using the compiled pattern from the example before):

    r = re.compile(r"(\w(?:[-.+]?\w+)+\@(?:[a-z0-9](?:[-+]?\w+)*\.)+[a-z]{2,})", re.I)
    for email_match in r.finditer(text):
        email_addr = email_match.group()  # or anything else you need from the match object
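A self-contained version of that finditer() loop, run on made-up text, showing what a match object offers beyond the matched string; span() is just one example of "anything you need".

```python
import re

# The compiled pattern from the answer, unchanged.
r = re.compile(r"(\w(?:[-.+]?\w+)+\@(?:[a-z0-9](?:[-+]?\w+)*\.)+[a-z]{2,})", re.I)

text = "Sales: sales@example.com, support: help.desk@support.example.org"

found = []
for email_match in r.finditer(text):
    # group() is the matched text; span() is its (start, end) position in text.
    found.append((email_match.group(), email_match.span()))

print(found)
# [('sales@example.com', (7, 24)), ('help.desk@support.example.org', (35, 64))]
```

One hedged caveat: the nested (?:[-.+]?\w+)+ part can backtrack heavily on long runs of word characters that are never followed by @, so it may be slow on large pages.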

Now the remaining problem is which regex to use :)

Prahalad Deshpande · Accepted Answer · 2013-08-08 07:17:31Z

0

Change r'[-a-z0-9._]+@([-a-z0-9]+)(\.[-a-z0-9]+)+' to r'[aA-zZ0-9._]+@([aA-zZ0-9]+)(\.[aA-zZ0-9]+)+'. The - character before a-z is the cause.
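For reference, a quick check of how Python's re module reads the two character classes in this answer (the sample strings are made up):

```python
import re

# A "-" placed first in a character class is a literal hyphen, so the
# question's [-a-z0-9._] is valid and simply also accepts hyphens.
hyphen_ok = re.findall(r"[-a-z0-9._]+", "first-last")

# In [aA-zZ0-9._], "A-z" is one range from "A" (0x41) to "z" (0x7A), which
# also covers "[", "\", "]", "^", "_" and "`" between "Z" and "a".
extra_chars = re.fullmatch(r"[aA-zZ0-9._]+", "^`[]") is not None

print(hyphen_ok)    # ['first-last']
print(extra_chars)  # True
```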

answered Aug 8, 2013 at 7:17

Prahalad Deshpande

