PHP: Regex replace while ignoring content between html tags

Question

I'm looking for a regular expression that finds a word (or a sub-pattern) only where it is NOT between HTML tags.

Say I want to replace (alpha|beta) in: the first two letters in the greek alphabet are alpha and <b>beta</b>

I only want it to replace alpha, because beta sits between an opening and a closing tag. In other words, the replacement should ignore anything matching (<(.*?)>(.*?)<\/(.*?)>)

:)

It's ok :) it's just if you try to use tags it might not work without the code wrapper.
– Calum, Commented Apr 16, 2011 at 18:40
@sln, I mean on one line. Limited between \r\n at the beginning and end.
– Brian, Commented Apr 17, 2011 at 0:15

Marcus Pope · Accepted Answer · 2011-04-17 20:35:05Z

I didn't test the logic used on this page (http://www.phpro.org/examples/Get-Text-Between-Tags.html), but I can confirm the point made at the top of that page in big bold letters: you shouldn't do what you're trying to do with regex.

HTML is not uniform, and edge cases will always bite you in the rear if you use regular expressions to handle the content of those tags in any real-world situation. So unless your markup is extremely simple, uniform, 100% accurate, and contains only HTML (no CSS, JavaScript or garbage), your best bet is a DOM parser library.

And really, many DOM parser libraries have problems too, but you'll be miles ahead of their regex counterparts. The best way to get the text content of tags is to render the HTML in a browser and access the innerText property of the given DOM node (or have a human copy and paste the contents out manually), but that isn't always an option :D
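For what it's worth, here is a rough sketch of the DOM route using PHP's bundled DOMDocument. It is not from the linked page: the function name `replaceInTopLevelText` and the `wrap` id are made up for illustration. It assumes the fragment is reasonably well balanced and is ASCII or entity-encoded, and it reads "not between tags" as "text that is not inside any element".

```php
<?php
// Hypothetical helper: replace $pattern only in text that is not inside any element.
function replaceInTopLevelText(string $html, string $pattern, string $replacement): string
{
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc = new DOMDocument();
    // Wrap the fragment in a known container so its top-level nodes are easy to find.
    $doc->loadHTML('<div id="wrap">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);
    if ($wrap === null) {
        return $html; // parsing failed: leave the input untouched
    }

    $out = '';
    foreach ($wrap->childNodes as $node) {
        if ($node instanceof DOMText) {
            // Text directly under the wrapper: re-escape it, then replace.
            $text = htmlspecialchars($node->nodeValue, ENT_NOQUOTES);
            $out .= preg_replace($pattern, $replacement, $text);
        } else {
            // Elements, comments, etc. are serialized back unchanged.
            $out .= $doc->saveHTML($node);
        }
    }
    return $out;
}

echo replaceInTopLevelText(
    'the first two letters in the greek alphabet are alpha and <b>beta</b>',
    '~\b(alpha|beta)\b~',
    '<i>$1</i>'
);
```

The `\b` word boundaries are an addition to the pattern from the question: without them, (alpha|beta) would also hit the "alpha" in "alphabet". Because the regex runs on re-escaped text, a pattern that can match inside an entity such as `&amp;` would still need extra care.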

bobs12 · Accepted Answer · 2012-03-16 10:21:13Z

It may be the 'wrong' way, but it works: when I need to do something similar, I first run preg_replace_callback to find what I don't want to match and encode it with something like base64.

Then I can happily run an ordinary preg_replace on the result, knowing that it is very unlikely to match the strings I want to ignore. Finally I unscramble with preg_replace_callback again, this time sending the matches to be base64-decoded. (The encoded text has to be findable again, for example by wrapping it in marker characters.)

I often do this when automatically adding keyword or glossary links or tooltips to a text - I scramble the HTML tags themselves so that I don't try to create a link or a tooltip within the title of an anchor tag or somewhere equally ridiculous, for example.
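The three steps could look roughly like this. It is only a sketch of the idea: the function name `replaceOutsideTags`, the `\x01`/`\x02` marker bytes and the "protect" pattern are assumptions for illustration, and the protect pattern only copes with simple, non-nested markup.

```php
<?php
// Hypothetical helper: mask protected spans, replace, then unmask.
function replaceOutsideTags(string $html, string $pattern, string $replacement): string
{
    // What to protect: a simple element with its content (<b>beta</b>),
    // or failing that any single tag. Assumes no nested same-name elements.
    $protect = '~<([a-z][a-z0-9]*)\b[^>]*>.*?</\1>|<[^>]+>~is';

    // 1. Encode each protected span and wrap it in control-character markers.
    $masked = preg_replace_callback($protect, function (array $m): string {
        return "\x01" . base64_encode($m[0]) . "\x02";
    }, $html);
    if ($masked === null) {
        return $html; // regex engine error: leave the input untouched
    }

    // 2. Ordinary replacement on the masked string.
    $replaced = preg_replace($pattern, $replacement, $masked);
    if ($replaced === null) {
        return $html;
    }

    // 3. Find the markers again and decode whatever sits between them.
    $restored = preg_replace_callback(
        '~\x01([A-Za-z0-9+/=]*)\x02~',
        function (array $m): string {
            return (string) base64_decode($m[1]);
        },
        $replaced
    );
    return $restored ?? $html;
}

echo replaceOutsideTags(
    'the first two letters in the greek alphabet are alpha and <b>beta</b>',
    '~\b(alpha|beta)\b~',
    '<i>$1</i>'
);
```

One caveat: base64 output is made of letters, digits, `+`, `/` and `=`, so a short search word can occasionally turn up inside an encoded token by coincidence. Swapping the base64 string for a numbered placeholder that indexes into an array avoids that. The `\b` boundaries in the usage line are also an addition, so that "alphabet" is left alone.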
