This question is supplementary to: Recursive processing of markup using Regular Expression and DOMDocument

The code supplied by the selected answer has been a great help in understanding how to build a basic syntax tree. However, I am now having trouble tightening the regular expressions: the open-tag pattern currently matches { followed by any character other than whitespace or another {. Ideally it would match only my own syntax, which is:

{<anchor>}
{!image!}
{*strong*}
{/emphasis/}
{|code|}
{-strikethrough-}
{>small<}

Two tags, a and small, also require a closing character that differs from the opening one. I have tried modifying $re_closetag from the original code sample to reflect this, but it still matches too much as text.

For example:

http://www.google.com/>} bang 
smäll<} boom 

My test string is:

tëstïng {{ 汉字/漢字 }} testing {<http://www.google.com/>} bang {>smäll<} boom {* strông{/ ëmphäsïs {- strïkë {| côdë |} -} /} *} {*wôw*} 1, 2, 3

1 Answer

You can either control this in the RE itself or after a match.

To control in the RE which characters may open a tag, modify this part of $re_next:

(?:\{(?P<opentag>[^{\s]))  # match an open tag
      #which is "{" followed by anything other than whitespace or another "{"

Currently it looks for any character which is not { or whitespace. Simply change it to this:

(?:\{(?P<opentag>[<!*/|>-]))

Now it looks for only your specific open tags.
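
As a quick sanity check of the tightened character class, here is a minimal stand-alone sketch that runs only the open-tag part against the test string from the question. The variable name $re_open and the ~ delimiter are assumptions made for this illustration; the real $re_next contains more alternatives than this.

```php
<?php
// Hypothetical stand-alone pattern: only the open-tag alternative of $re_next.
// "~" is used as the delimiter so that "/" needs no escaping inside the class.
$re_open = '~\{(?P<opentag>[<!*/|>-])~u';

// The test string from the question.
$s = 'tëstïng {{ 汉字/漢字 }} testing {<http://www.google.com/>} bang '
   . '{>smäll<} boom {* strông{/ ëmphäsïs {- strïkë {| côdë |} -} /} *} '
   . '{*wôw*} 1, 2, 3';

// Collect every character that would be accepted as an open tag.
preg_match_all($re_open, $s, $m);
print_r($m['opentag']);
```

Run on its own, this should list the seven open-tag characters <, >, *, /, -, |, * in that order. The {{ near the start produces no match, because neither { nor a space is in the character class.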

The close-tag portion matches only a single character at a time, chosen according to which tag is open in the current context. (This is what the $opentag argument is for.) So to support a tag whose closing character differs from its opening character, change the value passed as $opentag in the recursive call. E.g.:

        if (isset($m['opentag']) && $m['opentag'][1] !== -1) {
            list($newopen, $_) = $m['opentag'];

            // pick the close character to look for in the new context;
            // only "<" and ">" close with a different character
            $newclose = $newopen;
            if ($newopen === '<') $newclose = '>';
            else if ($newopen === '>') $newclose = '<';

            // recurse with the close character, but label the node with the open tag
            list($subast, $offset) = str_to_ast($s, $offset, array(), $newclose);
            $ast[] = array($newopen, $subast);
        } else if (isset($m['text']) && $m['text'][1] !== -1) {
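
The open-to-close mapping in the snippet above could also live in a small lookup table, which keeps the parser branch short if more asymmetric tags are added later. The following is only a sketch: close_char_for() and close_pattern_for() are hypothetical helpers, not part of the original code, and the ~ delimiter is an assumption.

```php
<?php
// Hypothetical helper: return the character that closes a given open tag.
function close_char_for(string $open): string
{
    // Only the anchor and small tags close with a different character.
    static $pairs = ['<' => '>', '>' => '<'];
    return $pairs[$open] ?? $open;
}

// Hypothetical helper: build a pattern matching the close sequence, e.g. ">}".
function close_pattern_for(string $open): string
{
    // preg_quote() escapes characters such as "*", "|" and "-" for us.
    return '~' . preg_quote(close_char_for($open), '~') . '\}~u';
}

// An anchor opened with "{<" must be closed with ">}".
preg_match(close_pattern_for('<'), 'http://www.google.com/>} bang', $hit);
print_r($hit);
```

With a table like this, the recursive call would receive close_char_for($newopen) as its last argument, while the node pushed onto $ast can keep the open character as its label.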

Alternatively, you can keep the RE as-is and decide what to do with the match after the fact. For example, if you match an @ character but {@ is not an allowed open tag, you can raise a parse error, treat it as a text node (appending array('#text', '{@') to the AST), or do anything in between.
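
A minimal sketch of that second approach, assuming the permissive RE is kept and the matched character is checked afterwards. ALLOWED_OPEN, classify_open() and the array('tag', ...) return shape are hypothetical names chosen for this example; only the array('#text', ...) shape comes from the text above.

```php
<?php
// Hypothetical whitelist of characters that may follow "{" to open a tag.
const ALLOWED_OPEN = ['<', '!', '*', '/', '|', '-', '>'];

// Hypothetical helper: decide what a matched "{x" becomes after the fact.
function classify_open(string $char): array
{
    if (in_array($char, ALLOWED_OPEN, true)) {
        // A real tag: the caller would recurse into a new context here.
        return ['tag', $char];
    }
    // Not one of our tags: keep both characters as literal text.
    // Throwing an exception here instead would give a strict parser.
    return ['#text', '{' . $char];
}

print_r(classify_open('@')); // falls back to a text node
print_r(classify_open('*')); // recognised as an open tag
```

The choice between the two approaches is mostly about error reporting: filtering in the RE silently treats unknown sequences as text, whereas checking after the match gives you a place to raise a parse error.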
