Get HTML structure using Nokogiri

Question

My task is to get the HTML structure of the document without data. From:

<html>
  <head>
    <title>Hello!</title>
  </head>
  <body id="uniq">
    <h1>Hello World!</h1>
  </body>
</html>

I want to get:

<html>
  <head>
    <title></title>
  </head>
  <body id="uniq">
    <h1></h1>
  </body>
</html>

There are a number of ways to extract data with Nokogiri, but I couldn't find a way to perform the reverse task.

UPDATE: The solution found is the combination of two answers I received:

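require 'nokogiri'

doc = Nokogiri::HTML(File.read("test.html"))
# Visit every node under <html> and drop the text nodes,
# leaving only the element structure.
doc.at_css("html").traverse do |node|
  node.remove if node.text?
end
puts doc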

The output is exactly the one I want.

possible duplicate of How do I create an outline of the HTML tag structure on the page using Nokogiri?
– Phrogz, Commented Nov 21, 2011 at 14:26

pguardiario · Accepted Answer · 2011-11-21 06:27:05Z

4

It sounds like you want to remove all the text nodes. You can do this like so:

doc.xpath('//text()').remove
puts doc

edited Nov 21, 2011 at 6:27

answered Nov 21, 2011 at 4:41

pguardiario

55.2k, 21 gold badges, 130 silver badges, 169 bronze badges


1 Comment

Yulia Over a year ago

doc = Nokogiri::HTML(open("trial.html")); puts doc.xpath('//text()').remove gives the following result: Hello! Hello world! It is the opposite of what I want.

Larry K · Accepted Answer · 2011-11-21 03:52:09Z

1

Traverse the document. For each node, delete what you don't want. Then write out the document.

Remember that Nokogiri can modify the document in place. See the docs.

answered Nov 21, 2011 at 3:52

Larry K

49.3k, 15 gold badges, 92 silver badges, 148 bronze badges

2 Comments

Yulia Over a year ago

Thanks, Larry. I read the page from a URL. Would you suggest writing the whole page source to a file and manipulating it from there?

Larry K Over a year ago

You mean for loading the doc at the start? You can load directly from a URL into Nokogiri. See the docs.
