I'm trying to parse strings in the following format (EBNF, I hope this is right) in PHP:
<exp> ::= <base>[{<modifier>["!"]"("<exp>")"}]
<base> ::= <role>[{<modifier><role>}]
<modifier> ::= "&" | "|"
<role> ::= ["!"]<str>[","<str>]
Where <str> is any string that would pass [a-zA-Z0-9\-]+
The following are examples of patterns that would have to be parsed:
token1
token1&token2
token1|(token2&!token3)
(token1&token2)|(token3&(token4|(!token5,12&token6)))
!(token1&token2|(token3&!token4))|token5,12
I am trying to write a RegEx pattern that would always give me four groups:
- The left-most <expression>. From the above examples this would be: token1; token1; token1; token1&token2; token1&token2|(token3&!token4)
- Whether ["!"] was present, i.e.: null; null; null; null; !
- The <modifier> for the next <expression> (if any). This would be: null; &; |; |; |
- The remainder of the pattern: null; token2; token2&!token3; token3&(token4|(!token5,12&token6)); token5,12
I can parse this provided that the first expression doesn't contain any <modifier>s:
^\(?(!?)([a-zA-Z0-9\-]+)\)?([&|]?)(.*)$
I am stuck at this point. I have tried using lookarounds, however I can't figure out how to ensure that the group is captured when all brackets are balanced. Is this achievable with RegEx or do I need to write code using loops etc. to do this?
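For comparison, the loop-based route can be fairly short. The following is only a minimal sketch, not a definitive implementation: the function names splitExpression and matchingParen are made up for illustration, and it assumes that one pair of brackets wrapping the whole remainder should be stripped, as in the expected results listed above. It counts bracket depth to find where the left-most expression ends:

```php
<?php
declare(strict_types=1);

/** Return the index of the ")" that closes the "(" at offset $open. */
function matchingParen(string $s, int $open): int
{
    $depth = 0;
    for ($i = $open, $n = strlen($s); $i < $n; $i++) {
        if ($s[$i] === '(') {
            $depth++;
        } elseif ($s[$i] === ')' && --$depth === 0) {
            return $i;
        }
    }
    throw new InvalidArgumentException("Unbalanced brackets in: $s");
}

/** Split into [expression, negation, modifier, rest]; absent parts are null. */
function splitExpression(string $input): array
{
    $i = 0;
    $n = strlen($input);

    // Optional leading "!".
    $negation = null;
    if ($i < $n && $input[$i] === '!') {
        $negation = '!';
        $i++;
    }

    if ($i < $n && $input[$i] === '(') {
        // Bracketed group: take everything up to the balancing ")".
        $close = matchingParen($input, $i);
        $expression = substr($input, $i + 1, $close - $i - 1);
        $i = $close + 1;
    } elseif (preg_match('/[a-zA-Z0-9,\-]+/A', $input, $m, 0, $i) === 1) {
        // Plain role such as "token1" or "token5,12" (anchored at offset $i).
        $expression = $m[0];
        $i += strlen($expression);
    } else {
        throw new InvalidArgumentException("Expected a role or \"(\" at offset $i");
    }

    if ($i === $n) {
        return [$expression, $negation, null, null];
    }
    if ($input[$i] !== '&' && $input[$i] !== '|') {
        throw new InvalidArgumentException("Expected a modifier at offset $i");
    }

    $modifier = $input[$i];
    $rest = substr($input, $i + 1);

    // Assumption: drop one pair of brackets that wraps the entire remainder.
    if ($rest !== '' && $rest[0] === '(' && matchingParen($rest, 0) === strlen($rest) - 1) {
        $rest = substr($rest, 1, -1);
    }

    return [$expression, $negation, $modifier, $rest];
}
```

Calling splitExpression() again on the first and fourth elements of the result walks the whole tree, so no single pattern has to balance the brackets.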
?(DEFINE) block. However, PCRE only allows matching, not parsing. You won't get the split-up result list you want. (Unless you also use a recursive preg_replace_callback to collect all tokens.)
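To illustrate the matching-only part, here is a rough sketch of a recursive pattern built with a (?(DEFINE)...) block. The group names (str, role, term, exp), the constant and the function name are assumptions made up for this example, and the grammar is adapted to the sample inputs in the question (it allows an expression to start with a bracketed group) rather than being a literal transcription of the posted EBNF. It only answers whether a string is well-formed; it does not return the four groups:

```php
<?php
declare(strict_types=1);

// Named sub-patterns are declared once in DEFINE and called with (?&name).
// The /x flag lets the pattern be laid out over several lines.
const EXPRESSION_PATTERN = <<<'REGEX'
~
(?(DEFINE)
  (?<str>  [a-zA-Z0-9\-]+ )
  (?<role> !? (?&str) (?: , (?&str) )? )
  (?<term> (?&role) | !? \( (?&exp) \) )
  (?<exp>  (?&term) (?: [&|] (?&term) )* )
)
\A (?&exp) \z
~x
REGEX;

/** True if $input is a well-formed expression with balanced brackets. */
function isValidExpression(string $input): bool
{
    return preg_match(EXPRESSION_PATTERN, $input) === 1;
}
```

Because groups called through (?&name) do not expose their captures to the caller, this kind of pattern works as a validator; splitting the input still needs a callback or a loop.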