How can I remove certain HTML tags by name in Ruby?

For example:

string = "<!DOCTYPE html><html><body><h1>My First Heading</h1><p>My first paragraph.</p></body></html>"

string.magic_method("h1") #=> "<!DOCTYPE html><html><body><p>My first paragraph.</p></body></html>"

I wrote some regex to do this but wondered if there was a library or native method that could do the same thing.

2 Answers

Using Nokogiri:

require 'nokogiri'

doc = Nokogiri::HTML <<-_HTML_
<!DOCTYPE html><html><body><h1>My First Heading</h1><p>My first paragraph.</p></body></html>
_HTML_

doc.at('h1')
# => #(Element:0x4d2f006 {
#      name = "h1",
#      children = [ #(Text "My First Heading")]
#      })

# `at` returns only the first match; `unlink` detaches that node from the document.
doc.at('h1').unlink
puts doc.to_html
# >> <!DOCTYPE html>
# >> <html><body><p>My first paragraph.</p></body></html>

Use the Nokogiri gem. It has convenient methods for manipulating HTML and XML, including one that removes nodes, as described in: How do I remove a node with Nokogiri?

GitHub: https://github.com/sparklemotion/nokogiri
