Grabbing the text from HTML source code of URL using Ruby

Question

I've read a couple of articles and posts on Stack Overflow about this topic, so I apologize if this repeats an existing question. Is there a way to iterate through the HTML source code of a given URL and return the text of a header tag?

Example:

<h2 class='title'>
<a href="/blog/step-by-step-guide-to-building-your-first-ruby-gem">Step-by-Step Guide to Building Your First Ruby Gem</a>
</h2>

The code would look for the <h2> tag and return Step-by-Step Guide to Building Your First Ruby Gem. I know the Nokogiri gem can search for nodes with XPath:

doc.xpath('//h3/a').each do |link|
  puts link.content
end

Is there a method that would let me do something like this?

doc.html('h1').each do |tag|
  puts tag.content
end

I hope this makes sense. Any insight, or a pointer to a relevant resource, would be much appreciated.

Amadan · Accepted Answer · 2014-06-05 02:05:14Z

1

Nokogiri has both XPath and CSS accessors, so you can do

doc.css('h1 > a').each do |tag|
  puts tag.content
end

if you don't like XPath. (Or just 'h1'; I am not 100% sure whether you want the text of the links inside the headers or the text of the headers themselves.)


1 Comment

ShaunK Over a year ago

Nice! Thanks for responding so fast, Amadan. I think I want the text of the headers themselves.
