I am not sure under what conditions you select the string to capture: why is your 1. string not captured, but your 2. string is? As long as you do not explain that, I can only guess, so here is an expression:
/<\w+(?:\s+\w+=(?:(?:"[^"]*")|(?:'[^']*')))*\s*>([^<]+)<\/\w+>/g
It will match every HTML element that contains only a text node. Elements with mixed content, such as <p>text<br /></p>, are not matched.
So in <p>text</p><br>text2</br> both elements will be matched, and for each match the text will be in capture group 1. Note that the name of the closing tag is not checked against the opening tag.
<\w+(?:\s+\w+=(?:(?:"[^"]*")|(?:'[^']*')))*\s*> matches an opening tag, including any attributes whose values are in double or single quotes
([^<]+) matches one or more characters other than < and puts them in capture group 1
<\/\w+> finally matches the closing tag (the / is escaped as \/ so that it does not end the expression literal)
the g is the global flag, so that the expression can find multiple matches...
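To show how the pieces above fit together, here is a minimal sketch in JavaScript. It assumes the expression is used as a JavaScript regex literal, which the question does not confirm, and the helper name extractTextNodes is made up for this illustration.

```javascript
// Sketch only: extractTextNodes is a hypothetical helper, not part of any library.
// The closing-tag slash is escaped (<\/) because / delimits a regex literal.
const TAG_WITH_TEXT = /<\w+(?:\s+\w+=(?:(?:"[^"]*")|(?:'[^']*')))*\s*>([^<]+)<\/\w+>/g;

function extractTextNodes(markup) {
  // matchAll needs the g flag and yields one match object per hit;
  // index 1 of each match object is capture group 1 (the text between the tags).
  return Array.from(markup.matchAll(TAG_WITH_TEXT), (m) => m[1]);
}

// Both elements from the example contain only text, so both are found.
console.log(extractTextNodes("<p>text</p><br>text2</br>"));

// Mixed content is skipped: the text is followed by <br />, not a closing tag.
console.log(extractTextNodes("<p>text<br /></p>"));
```

The first call yields the two texts "text" and "text2", the second yields an empty array. Attributes in the opening tag are tolerated as long as their values are quoted, so <a href="x">link</a> yields "link".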
Good luck with this. If you need something different, please be a little more precise...