Ruby split with regex - regex isn't doing what i want

Question

I have this string:

string = "<p>para1</p><p>para2</p><p>para3</p>"

I want to split on the para2 text, so that I get this:

["<p>para1</p>", "<p>para3</p>"]

The catch is that sometimes para2 might not be wrapped in p tags (and there might be optional spaces outside the p tags and inside them). I thought that this would do it:

string.split(/\s*(<p>)?\s*para2\s*(<\/p>)?\s*/)

But I get this:

["<p>para1</p>", "<p>", "</p>", "<p>para3</p>"]

It's not pulling the start and end p tags into the match; they should be eliminated as part of the split. Ruby's regex quantifiers are greedy by default, so I thought the tags would get pulled in. And this seems to be confirmed if I do a gsub instead of a split:

string.gsub(/\s*(<p>)?\s*para2\s*(<\/p>)?\s*/, "XXX")
=> "<p>para1</p>XXX<p>para3</p>"

The tags are pulled in and removed here, but not by the split. Any ideas, anyone?

thanks, max

Remember, you can never truly parse HTML with regex. If this string is in any way dependent on outside input, use an HTML parser like Hpricot or Nokogiri.
– Matchu, Commented Jan 29, 2010 at 18:40

Gumbo · Accepted Answer · 2010-01-29 18:39:27Z

8

Replace your capturing groups (…) with non-capturing groups (?:…):

/\s*(?:<p>)?\s*para2\s*(?:<\/p>)?\s*/
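To see the difference side by side, here is a minimal sketch using the string from the question. The variable names (`capturing`, `non_capturing`, `unwrapped`) are only illustrative, and the unwrapped input is an assumed example of the "para2 without p tags" case the question mentions.

```ruby
string = "<p>para1</p><p>para2</p><p>para3</p>"

# Original pattern: the optional tags sit in capturing groups.
capturing = /\s*(<p>)?\s*para2\s*(<\/p>)?\s*/
# Same pattern, but the groups are non-capturing.
non_capturing = /\s*(?:<p>)?\s*para2\s*(?:<\/p>)?\s*/

# String#split adds the text of each matched capture group to the result.
with_captures = string.split(capturing)
# => ["<p>para1</p>", "<p>", "</p>", "<p>para3</p>"]

# Without capture groups, only the pieces between matches are returned.
without_captures = string.split(non_capturing)
# => ["<p>para1</p>", "<p>para3</p>"]

# Assumed example input: para2 not wrapped in p tags, with spaces around it.
unwrapped = "<p>para1</p> para2 <p>para3</p>".split(non_capturing)
# => ["<p>para1</p>", "<p>para3</p>"]

# gsub replaces the whole match either way, so the groups make no difference.
replaced = string.gsub(non_capturing, "XXX")
# => "<p>para1</p>XXX<p>para3</p>"
```

In both cases the regex matches the same text; the only change is whether split echoes the groups back into the array.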


3 Comments

mckeed Over a year ago

This answer is correct. When you split by a regex with capturing groups, it puts the captures into the array, so you can do more complex scanning/splitting operations.
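As a small illustration of when keeping the captures is actually useful, the sketch below splits an arithmetic string while keeping the operators. The `expression` string and the variable names are made up for this example and are not from the question.

```ruby
# Hypothetical input, used only to illustrate the behaviour.
expression = "1+2-3"

# With a capturing group, the delimiters are kept in the result.
tokens = expression.split(/([+-])/)
# => ["1", "+", "2", "-", "3"]

# With a non-capturing group, the delimiters are dropped.
numbers = expression.split(/(?:[+-])/)
# => ["1", "2", "3"]
```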

btelles Over a year ago

Nifty...didn't know we had that in Ruby!

Max Williams Over a year ago

Thanks Gumbo, that does the trick. I'd never even heard of non-capturing groups before, that's a really useful bit of knowledge.
