I want a Python script that opens a link and prints the email addresses found on that page.

E.g.

  1. Go to some site like example.com
  2. Search for email addresses on it.
  3. Search all the pages linked from that site.

I tried the code below:

import requests
from bs4 import BeautifulSoup

r = requests.get('http://www.digitalseo.in/')
data = r.text
soup = BeautifulSoup(data)

for rate in soup.find_all('@'):
    print rate.text

I took this website just for reference.

Can anyone help me get this working?

  • Have you tried that? You can use BeautifulSoup and requests to do that. Commented Sep 24, 2015 at 6:45
  • Yes, I tried with BeautifulSoup, but I can't get it to work. Commented Sep 24, 2015 at 6:50
  • What is your code? What is the error message? What is the output? Commented Sep 24, 2015 at 6:51
  • import requests from bs4 import BeautifulSoup r = requests.get('digitalseo.in/') data = r.text soup = BeautifulSoup(data) for rate in soup.find_all('@'): print rate.text I didn't get any output. I took that website just for reference. Commented Sep 24, 2015 at 6:55
  • Okay, that is because find_all() searches for tags, not email addresses. I'll post an answer to explain this. I also think you should edit your question and add your code. Commented Sep 24, 2015 at 6:58

1 Answer

Your loop prints nothing because find_all() only searches for tags, and a positional string such as '@' is treated as a tag name. From the documentation:

Signature: find_all(name, attrs, recursive, string, limit, **kwargs)

The find_all() method looks through a tag’s descendants and retrieves all descendants that match your filters.
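To illustrate why the loop in the question prints nothing, here is a small sketch on a made-up HTML snippet (the markup and the address are assumptions, not taken from the site in the question). It contrasts a positional string, which is read as a tag name, with the `string` argument, which matches text. On Beautiful Soup versions older than 4.4.0 the same argument is spelled `text`.

```python
import re
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
html = '<p>Write to <b>info@example.com</b> for details.</p>'
soup = BeautifulSoup(html, "html.parser")

# A positional string is a tag name; no tag is called "@".
by_tag_name = soup.find_all('@')

# The string argument filters text nodes instead of tags.
by_text = soup.find_all(string=re.compile('@'))

print(by_tag_name)
print([str(s) for s in by_text])
```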

So you need to add a keyword argument that filters on the href attribute, like this:


import re
import requests
from bs4 import BeautifulSoup

r = requests.get('http://www.digitalseo.in/')
data = r.text
soup = BeautifulSoup(data, "html.parser")

for i in soup.find_all(href=re.compile("mailto")):
    print(i.string)  # text of the mailto link; the parentheses work on Python 2 and 3

Demo:

[email protected]
[email protected]
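The link text is not always the address itself: `i.string` is None when the link wraps an image, and the visible text may be something like "Contact us". A minimal sketch that reads the address from the href attribute instead, assuming made-up markup and a hypothetical helper name `emails_from_mailto`:

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
html = '''
<a href="mailto:sales@example.com?subject=Hello">Contact sales</a>
<a href="mailto:info@example.com"><img src="mail.png"></a>
<a href="/about">About</a>
'''
soup = BeautifulSoup(html, "html.parser")

def emails_from_mailto(soup):
    """Return addresses taken from href="mailto:..." attributes."""
    found = []
    for tag in soup.find_all('a', href=True):
        href = tag['href']
        if href.lower().startswith('mailto:'):
            # Drop the scheme and any ?subject=... query part.
            address = href[len('mailto:'):].split('?', 1)[0]
            found.append(address)
    return found

print(emails_from_mailto(soup))
```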


From the documentation:

Any argument that’s not recognized will be turned into a filter on one of a tag’s attributes. If you pass in a value for an argument called id, Beautiful Soup will filter against each tag's 'id' attribute:

soup.find_all(id='link2')
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

If you pass in a value for href, Beautiful Soup will filter against each tag's 'href' attribute:

soup.find_all(href=re.compile("elsie"))
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

See the documentation for more info: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#find-all


If you'd like to find email addresses anywhere in a document, not only in mailto links, a regex is a good choice.

For example:

import re
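# Simple pattern: local part, "@", then a dotted domain. It will not cover every valid address.
re.findall(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+', text)  # remember to replace `text` with your own string, e.g. r.text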
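As a rough, self-contained sketch of the regex approach: the pattern below is a deliberately simple assumption, not a full address validator, and `find_emails` is a hypothetical helper name. It can also match non-addresses such as `icon@2x.png` when run over raw HTML, so treat the results as candidates.

```python
import re

# A deliberately simple pattern; it will miss some valid addresses
# and is only meant as a starting point.
EMAIL_RE = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')

def find_emails(text):
    """Return the unique email-like strings in text, in order of appearance."""
    seen = []
    for match in EMAIL_RE.findall(text):
        if match not in seen:
            seen.append(match)
    return seen

sample = "Mail info@example.com or sales@example.co.in. Again: info@example.com"
print(find_emails(sample))
```

To run it against a fetched page, pass in `r.text` or `soup.get_text()` from the earlier snippet in place of `sample`.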

And if you'd like to find the links in a page that contain a keyword, use .get to read each tag's href, like this:

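import re
import requests
from bs4 import BeautifulSoup

def get_link_by_keyword(soup, base, keyword):
    # Collect the unique href values where "http" or "/" is followed by the keyword.
    links = set()
    pattern = re.compile(r"(?:http|/).*" + re.escape(str(keyword)))
    for tag in soup.find_all(href=pattern):
        links.add(tag.get('href'))

    for href in links:
        if href.startswith('h'):
            yield href            # already an absolute URL
        elif href.startswith('/'):
            yield base + href     # site-relative path: prepend the base link
        # anything else (mailto:, #fragment, ...) is skipped

# Python 3; on Python 2 use raw_input() instead of input().
link = input('Please enter a link: ').rstrip('/')

r = requests.get(link, verify=True)
soup = BeautifulSoup(r.text, "html.parser")

for url in get_link_by_keyword(soup, link, input('Enter a keyword: ')):
    print(url)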
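For the third step in the question (searching all pages of the site), one possible sketch combines the ideas above: follow same-host links and run the regex on every page. The names `fetch`, `crawl_emails`, `fetch_page` and `max_pages` are hypothetical, the pattern is the same simple assumption as before, and the page limit is an arbitrary safety cap.

```python
import re
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

EMAIL_RE = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')

def fetch(url):
    """Download a page and return its HTML (network access required)."""
    return requests.get(url, timeout=10).text

def crawl_emails(start_url, fetch_page=fetch, max_pages=20):
    """Visit same-host pages reachable from start_url and collect email-like strings."""
    host = urlparse(start_url).netloc
    to_visit, seen, emails = [start_url], set(), set()
    while to_visit and len(seen) < max_pages:
        url = to_visit.pop(0)
        if url in seen:
            continue
        seen.add(url)
        html = fetch_page(url)
        emails.update(EMAIL_RE.findall(html))
        soup = BeautifulSoup(html, "html.parser")
        for tag in soup.find_all('a', href=True):
            # Resolve relative links and drop #fragments.
            link = urljoin(url, tag['href']).split('#', 1)[0]
            if urlparse(link).netloc == host:
                to_visit.append(link)
    return sorted(emails)
```

Calling `crawl_emails('http://www.digitalseo.in/')` would use the real network. Passing a different `fetch_page` callable, for example a dict lookup over saved pages, lets you try the function offline.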

39 Comments

It's working. Is it possible to find the email address based on the @ symbol?
No, this searches for href=mailto. If you check the HTML, you will see something like <a href="mailto:[email protected]">.
Is there any way to find an email address based on the @ symbol? In some situations it is listed without an <a href.
Yes. If you'd like to search for email addresses in a document or a string, use a regex instead of BeautifulSoup. Let me edit my answer and add it.
Okay sir. Thank you. It was so helpful. Thanks a lot.