To translate a website, I need to find the text that sits between HTML tags.
My first approach was to use a regex, but it is not flexible enough. The closest I was able to get with a regex was: http://regex101.com/r/qB6xU5/1
but it fails on the last test, where it matches the two p tags as a single match instead of two.
I considered using a DOM parser library, but (after only a brief search) I was not able to find one that fulfills my needs.
Note also that the HTML may contain errors and Smarty templating tags.
Here are some example cases and the results they should produce:
    <div>test</div> => test
    <div><br />test</div> => <br />test
    <div>te<br />st</div> => te<br />st
    <div>test<br /></div> => test<br />
    <div><span>my</span>test</div> => <span>my</span>test
    <div>test<span>my</span></div> => test<span>my</span>
    <div>test<span>my</span>test</div> => test<span>my</span>test
    <div><span>my</span>test<span>my</span></div> => <span>my</span>test<span>my</span>
In short, it can be rephrased like this: find the content of any HTML tag that contains at least one string not enclosed in a child tag.
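
To make the rule above concrete, here is a minimal sketch of one possible approach. It is written in Python with the standard library's html.parser only for illustration; the language, the class name TextBlockFinder and the function find_text_blocks are assumptions of this sketch, not part of the original attempt. It uses a tolerant tokenizer instead of a regex, tracks the open elements on a stack, and reports the raw inner source of the outermost element that directly contains non-whitespace text.

```python
from html.parser import HTMLParser

# Elements that never have content, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextBlockFinder(HTMLParser):
    """Hypothetical helper: collects source ranges of elements with direct text."""

    def __init__(self, source):
        super().__init__()
        # Absolute offset of the start of each line, used to convert getpos().
        self.line_starts = [0]
        for i, ch in enumerate(source):
            if ch == "\n":
                self.line_starts.append(i + 1)
        self.stack = []   # open elements: [tag, content_start, has_direct_text]
        self.ranges = []  # (start, end) offsets of matched inner content

    def _offset(self):
        line, col = self.getpos()  # position of the token being handled
        return self.line_starts[line - 1] + col

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            start = self._offset() + len(self.get_starttag_text())
            self.stack.append([tag, start, False])

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tag such as <br />: it has no content of its own

    def handle_data(self, data):
        if self.stack and data.strip():
            self.stack[-1][2] = True  # text directly inside the innermost element

    def handle_endtag(self, tag):
        if not any(frame[0] == tag for frame in self.stack):
            return  # stray closing tag in broken markup: ignore it
        end = self._offset()
        while self.stack:
            name, start, has_text = self.stack.pop()
            if has_text:
                # Drop matches nested inside this element; keep the outermost.
                while self.ranges and self.ranges[-1][0] >= start:
                    self.ranges.pop()
                self.ranges.append((start, end))
            if name == tag:
                break


def find_text_blocks(html):
    finder = TextBlockFinder(html)
    finder.feed(html)
    finder.close()
    return [html[start:end] for start, end in finder.ranges]
```

Called on one of the cases above, find_text_blocks('<div><span>my</span>test</div>') returns ['<span>my</span>test'], and two sibling p tags give two separate matches. Two caveats of this sketch: Smarty tags such as {$title} are seen as ordinary text, so they count as translatable strings unless they are filtered out first, and the contents of script and style elements are also treated as text.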