search and replace with ruby regex

Question

I have a text blob field in a MySQL column that contains HTML. I have to change some of the markup, so I figured I would do it in a Ruby script. The language is not essential here, but it would be nice to see an answer in Ruby. The markup looks like the following:

<h5>foo</h5>
  <table>
    <tbody>
    </tbody>
  </table>

<h5>bar</h5>
  <table>
    <tbody>
    </tbody>
  </table>

<h5>meow</h5>
  <table>
    <tbody>
    </tbody>
  </table>

I need to change just the first <h5>foo</h5> block of each text to <h2>something_else</h2> while leaving the rest of the string alone.

Can't seem to get the proper PCRE regex, using Ruby.

I implore you to consider using an HTML parser instead of regex for HTML. As has been said many, many, many times before, regular expressions are incapable of accurately parsing HTML.
– Travis Kaufman, Commented Apr 18, 2013 at 21:16
Specifically, I recommend using Nokogiri to load your HTML, manipulate it, and then emit the result.
– Phrogz, Commented Sep 26, 2014 at 19:32

Phrogz · Accepted Answer · 2011-01-16 01:59:22Z

# The regex literal syntax using %r{...} allows / in your regex without escaping
new_str = my_str.sub( %r{<h5>[^<]+</h5>}, '<h2>something_else</h2>' )

Using String#sub instead of String#gsub causes only the first replacement to occur. If you need to dynamically choose what 'foo' is, you can use string interpolation in regex literals:

new_str = my_str.sub( %r{<h5>#{searchstr}</h5>}, "<h2>#{replacestr}</h2>" )
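
One caveat with interpolation: if the search text contains regex metacharacters such as `.` or `(`, they are treated as pattern syntax rather than literal characters. A minimal sketch of one way to guard against that, assuming a hypothetical helper named `replace_first_heading` (not part of the original answer) and using `Regexp.escape`:

```ruby
# Hypothetical helper for illustration: replaces the first <h5> whose text
# equals search_text, treating search_text as literal text, not as a pattern.
def replace_first_heading(html, search_text, replacement_text)
  pattern = %r{<h5>#{Regexp.escape(search_text)}</h5>}
  # The block form keeps backslashes in replacement_text from being
  # interpreted as backreferences.
  html.sub(pattern) { "<h2>#{replacement_text}</h2>" }
end

html = "<h5>axb</h5><h5>a.b</h5>"
# Without escaping, the pattern a.b would match "axb" first.
puts replace_first_heading(html, "a.b", "new")
```

With the escape in place, only the heading whose text is literally `a.b` is replaced, and `<h5>axb</h5>` is left alone.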

Then again, if you know what 'foo' is, you don't need a regex:

new_str = my_str.sub( "<h5>searchstr</h5>", "<h2>#{replacestr}</h2>" )

or even:

my_str[ "<h5>searchstr</h5>" ] = "<h2>#{replacestr}</h2>"
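
The two forms differ in behavior: `sub` returns a new string and leaves the input untouched when nothing matches, whereas the `[]=` form modifies the string in place and raises `IndexError` when the substring is absent. A small sketch of handling that, assuming a hypothetical helper named `swap_heading!`:

```ruby
# Hypothetical helper for illustration: replaces the first occurrence of
# old_heading in place. Returns true on success, false if it is absent.
def swap_heading!(html, old_heading, new_heading)
  html[old_heading] = new_heading # raises IndexError when old_heading is missing
  true
rescue IndexError
  false
end

# dup gives a mutable copy, in case string literals are frozen.
doc = "<h5>foo</h5><h5>foo</h5>".dup
swap_heading!(doc, "<h5>foo</h5>", "<h2>something_else</h2>")
puts doc
```

Only the first occurrence is replaced, which matches the behavior of `sub`.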

If you need to run code to figure out the replacement, you can use the block form of sub:

new_str = my_str.sub %r{<h5>([^<]+)</h5>} do |full_match|
  # The expression returned from this block will be used as the replacement string
  # $1 will be the matched content between the h5 tags.
  "<h2>#{replacestr}</h2>"
end
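
As an illustration of the block form actually using the capture, here is a sketch that derives the new heading text from the old one. The helper name `promote_first_heading` and the choice to capitalize the text are assumptions made for the example, not part of the original answer:

```ruby
# Hypothetical helper for illustration: turns the first <h5> into an <h2>,
# building the new text from the captured heading content.
def promote_first_heading(html)
  html.sub(%r{<h5>([^<]+)</h5>}) do
    # $1 holds the text captured between the h5 tags.
    "<h2>#{$1.strip.capitalize}</h2>"
  end
end

puts promote_first_heading("<h5>foo</h5>\n<h5>bar</h5>")
```

Because this uses `sub`, the second heading is left as an `<h5>`.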

the Tin Man · Accepted Answer · 2011-01-16 02:12:08Z

Whenever I have to parse or modify HTML or XML I reach for a parser. I almost never bother with regex or substring matching unless the task is an absolute no-brainer.

Here's how to do it using Nokogiri, without any regex:

text = <<EOT
<h5>foo</h5>
  <table>
    <tbody>
    </tbody>
  </table>

<h5>bar</h5>
  <table>
    <tbody>
    </tbody>
  </table>

<h5>meow</h5>
  <table>
    <tbody>
    </tbody>
  </table>
EOT

require 'nokogiri'

fragment = Nokogiri::HTML::DocumentFragment.parse(text)
print fragment.to_html

fragment.css('h5').select{ |n| n.text == 'foo' }.each do |n|
  n.name = 'h2'
  n.content = 'something_else'
end

print fragment.to_html

After parsing, this is what Nokogiri has returned from the fragment:

# >> <h5>foo</h5>
# >>   <table><tbody></tbody></table><h5>bar</h5>
# >>   <table><tbody></tbody></table><h5>meow</h5>
# >>   <table><tbody></tbody></table>

This is the output after running the replacement:

# >> <h2>something_else</h2>
# >>   <table><tbody></tbody></table><h5>bar</h5>
# >>   <table><tbody></tbody></table><h5>meow</h5>
# >>   <table><tbody></tbody></table>

Ross Attrill · Accepted Answer · 2014-08-12 01:34:51Z

Use String#gsub with the regular expression <h5>[^<]+<\/h5>:

>> current = "<h5>foo</h5>\n  <table>\n    <tbody>\n    </tbody>\n  </table>"
>> updated = current.gsub(/<h5>[^<]+<\/h5>/){"<h2>something_else</h2>"}
=> "<h2>something_else</h2>\n  <table>\n    <tbody>\n    </tbody>\n  </table>"
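
The sample string above contains a single heading, so it does not show that `gsub` replaces every match. When the text holds several `<h5>` headings and only the first should change, `sub` is the closer fit. A short comparison sketch; the constant and method names are hypothetical, chosen for the example:

```ruby
# Names below are illustrative, not from the original answer.
HEADING = %r{<h5>[^<]+</h5>}
NEW_HEADING = "<h2>something_else</h2>"

# Replaces only the first <h5> heading.
def replace_first(html)
  html.sub(HEADING, NEW_HEADING)
end

# Replaces every <h5> heading.
def replace_all(html)
  html.gsub(HEADING, NEW_HEADING)
end

sample = "<h5>foo</h5>\n<h5>bar</h5>\n<h5>meow</h5>"
puts replace_first(sample)
puts replace_all(sample)
```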

Note that you can test Ruby regular expressions comfortably in your browser.

edited Aug 12, 2014 at 1:34 by Ross Attrill
answered Jan 16, 2011 at 1:54 by miku
