I am currently trying to write a regular expression in PHP that matches a specific pattern containing itself, nested indefinitely. I know that by default regular expressions are not capable of doing that, but PHP's Recursive Patterns (http://php.net/manual/de/regexp.reference.recursive.php) should make it possible.
I have nested structures like this:
<a=5>
<a=3>
Foo
<b>Bar</b>
</a>
Baz
</a>
Now I want to match the content of the outermost tag. In order to correctly pair the first opening tag with the last closing tag, I need PHP's recursion item (?R).
I tried a pattern like this:
/<a=5>((?R)|[^<]|<\/?[^a]|<\/?a[a-zA-Z0-9-])*<\/a>/s
This basically means <a=5>, followed by as many as possible of the following, followed by </a>:
- another tag (recursively)
- any character that is not "<"
- a "<", followed by an optional slash, followed by a character other than "a"
- the same as the previous case, but WITH an "a" that is followed by at least 1 more tag-name character (so the tag name only starts with "a")
The last 2 cases could be just one case [tag not named "a"], but I have heard this should be avoided in regular expressions, because it needs lookarounds and would perform badly.
I see no mistake in my RegEx, yet it does not match the given string. I want the following match:
<a=3>
Foo
<b>Bar</b>
</a>
Baz
Here's a link to play around with the RegEx: https://www.regex101.com/r/lO1wA6/1
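For anyone who prefers a local script over the link, here is a minimal sketch that reproduces the problem. It is the same pattern as above, only written with the x flag so each alternative carries a comment. The ~ delimiter (chosen so the slashes need no escaping) and the variable names $pattern, $subject and $result are my own picks for this sketch.

```php
<?php
// The pattern from the question, unchanged in meaning, in extended (x) mode.
$pattern = '~
    <a=5>                     # literal opening tag
    (
        (?R)                  # the WHOLE pattern again, including the literal <a=5>
      | [^<]                  # any character other than "<"
      | </?[^a]               # "<", optional slash, then a character other than "a"
      | </?a[a-zA-Z0-9-]      # "<", optional slash, "a", then one more name character
    )*
    </a>                      # literal closing tag
~sx';

// The nested sample structure from the question.
$subject = "<a=5>\n<a=3>\nFoo\n<b>Bar</b>\n</a>\nBaz\n</a>";

$result = preg_match($pattern, $subject, $m);
var_dump($result); // int(0): no match
```

None of the four alternatives can consume the inner <a=3>, so the repetition stops right before it and the closing </a> cannot match there.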
Comments:
- [...] marker, then it might indeed suffice. One note about the (?R): it doesn't recurse into the first group, but into the whole pattern. Use (?1) instead. But still, try /x for readability and inline comments, and also give a more basic example where matching succeeded.
- [...] a=5, but the inner ones to match a.*?. [...] <a=5>. (Neither would a proper SGML toolkit make sense.)
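A rough sketch of what the note about (?R) could look like in practice, not a definitive fix. It rests on two assumptions that are not in the question: nested tags always have the form <a=NUMBER>, and [^a] is tightened to [^a/] so that the "tag not starting with a" alternative cannot swallow the "</" of a closing </a>. The names $pattern, $subject and $content are made up for this example.

```php
<?php
$pattern = '~
    <a=5>                       # outermost tag, matched literally only once
    (                           # group 1: the content we want to capture
        (?:
            <a=[0-9]+> (?1) </a>    # nested a element: recurse into group 1 only (assumed tag form)
          | [^<]                    # any character other than "<"
          | </?[^a/]                # "<", optional slash, then a character that is neither "a" nor "/"
          | </?a[a-zA-Z0-9-]        # tag whose name merely starts with "a"
        )*
    )
    </a>                        # closing tag of the outermost element
~x';

// The nested sample structure from the question.
$subject = "<a=5>\n<a=3>\nFoo\n<b>Bar</b>\n</a>\nBaz\n</a>";

$content = null;
if (preg_match($pattern, $subject, $m)) {
    // Group 1 holds everything between the outermost <a=5> and its </a>.
    $content = trim($m[1]);
    echo $content, "\n";
}
```

On the sample input this should print the five lines listed as the desired match, from <a=3> down to Baz. Because (?1) re-enters only the content group, the literal <a=5> is required just once, at the outermost level.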