5

How can I get the email address from HTML code with Nokogiri? I'm thinking of using a regex, but I don't know if that's the best solution.

Example code:

<html>
<title>Example</title>
<body>
This is an example text.
<a href="mailto:[email protected]">Mail to me</a>
</body>
</html>

Does Nokogiri have a method to get the email address when it is not enclosed between tags?

2
  • To use nokogiri you would want to know the class/id of the e-mail field. Commented Feb 29, 2012 at 1:14
  • 3
    You need to show a sample of your HTML, plus code you've tried. Without the HTML any suggestion we make is pretty worthless. And the code lets us know what you've tried and helps us fit the answers back into your code. Commented Feb 29, 2012 at 1:51

4 Answers

12

You can extract the email addresses using xpath.

The selector //a will select any a tags on the page, and you can specify the href attribute using @ syntax, so //a/@href will give you the hrefs of all a tags on the page.

If the page contains a mix of a tags with different URL types (e.g. http:// URLs), you can use xpath functions to further narrow down the selected nodes. The selector

//a[starts-with(@href, "mailto:")]/@href

will give you the href nodes of all a tags that have an href attribute that starts with "mailto:".

Putting this all together, and adding a little extra code to strip out the "mailto:" from the start of the attribute value:

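require 'nokogiri'

# Select the href attribute of every a tag whose href starts with "mailto:".
selector = "//a[starts-with(@href, \"mailto:\")]/@href"

doc = Nokogiri::HTML.parse File.read 'my_file.html'

nodes = doc.xpath selector

# Drop the 7-character "mailto:" prefix from each attribute value.
addresses = nodes.collect {|n| n.value[7..-1]}

puts addresses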

With a test file that looks like this:

<html>
<title>Example</title>
<body>
This is an example text.
<a href="mailto:[email protected]">Mail to me</a>
<a href="http://example.com">A Web link</a>
<a>An empty anchor.</a>
</body>
</html>

this code outputs the desired [email protected]. The addresses variable is an array of all the email addresses found in mailto links in the document.
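
The fixed slice n.value[7..-1] assumes each href is exactly "mailto:" followed by a bare address. Links such as mailto:...?subject=Hello would leave the query string attached. As a rough sketch only, a small helper can strip the scheme and cut off anything after the first "?". The helper name address_from_mailto and the sample addresses are made up for illustration and are not part of Nokogiri or the answer above:

```ruby
# Hypothetical helper (not part of Nokogiri): return the bare address
# from a mailto: href, ignoring any "?subject=..." style query string.
def address_from_mailto(href)
  href.sub(/\Amailto:/i, '').split('?', 2).first.to_s.strip
end

# Sample hrefs are assumptions for illustration.
hrefs = [
  'mailto:someone@example.com',
  'mailto:someone@example.com?subject=Hello'
]

addresses = hrefs.map { |href| address_from_mailto(href) }
puts addresses
```

In the code above it would be called as nodes.collect {|n| address_from_mailto(n.value)}. It does not decode percent-escapes or split multiple comma-separated recipients.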

1 Comment

You just made me paper 💰
0

I'll preface this by saying that I know nothing about Nokogiri. But I just went to their website and looked at the documentation and it looks pretty cool.

If you add an email_field class (or whatever you want to call it) to your email link, you can modify their example code to do what you are looking for.

require 'nokogiri'
require 'open-uri'

# Get a Nokogiri::HTML::Document for the page we’re interested in...

doc = Nokogiri::HTML(URI.open('http://www.yoursite.com/your_page.html'))

# Do funky things with it using Nokogiri::XML::Node methods...

####
# Search for nodes by css
doc.css('.email_field').each do |email|
  # assuming you have more than one, do something with all your email fields here
end

If I were you, I would just look at their documentation and experiment with some of their examples.

Here's the site: http://nokogiri.org/

0

CSS selectors can now (finally) match text at the beginning of an attribute value:

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<a href="http://example.com">blah</a>
<a href="mailto:[email protected]">blah</a>
EOT

doc.at('a[href^="mailto:"]')
  .to_html # => "<a href=\"mailto:[email protected]\">blah</a>"

Nokogiri tries to track the jQuery extensions. I used to have a link to a change-notice or message from one of the maintainers talking about it but my mileage has varied.

See "CSS Attribute Selectors" for more information.

-1

Try fetching the whole HTML page and using regular expressions on it.

1 Comment

While patterns are powerful, they're seldom robust or flexible enough to understand and handle the myriad ways that HTML can be misused or corrupted. Code built using them will be fragile and sensitive to changes in the HTML. Patterns capable of handling those situations rapidly become unwieldy.
