1

I've read a couple of articles and posts on stackoverflow surrounding this topic. I apologize if I am repeating someone else's post on stack. Is there a way to iterate through the HTML source code of a given URL and return the text of a header tag?

Example:

<h2 class='title'>
<a href="/blog/step-by-step-guide-to-building-your-first-ruby-gem">Step-by-Step Guide to Building Your First Ruby Gem</a>
</h2>

The code looks for the

tag and returns Step-by-Step Guide to Building Your First Ruby Gem. I know there's the Nokogiri gem that searches for nodes in a xpath:

doc.xpath('//h3/a').each do |link|
puts link.content
end

Is there one where I could potentially do

doc.html('h1').each do |tag| puts link.content end

I hope it makes sense...any insight of direction to a resource will be much appreciated.

1 Answer 1

1

Nokogiri has both XPath and CSS accessors, so you can do

doc.css('h1 > a').each do |tag| puts link.content end

if you don't like XPath. (Or just 'h1' - I am not 100% sure if you want the text of links in headers, or headers themselves).

Sign up to request clarification or add additional context in comments.

1 Comment

Nice! thanks for responding this fast Amadan. I guess I would want the text of the headers themselves.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.