I have the following regex:

/\{\s?joomla-tag\s+(.*<+.+>+.*)\s?\}/is

and the following code:

$regex = "/\{\s?joomla-tag\s+(.*<+.+>+.*)\s?\}/is";
$replace = '<div class="someclass">$1</div>';
$text = preg_replace( $regex, $replace, $text );

Unfortunately, it fails to match the following code, although it should:

.... many html lines .......
<p>123{joomla-tag Lore<strong>m</strong> ip</p>
<p>sum dolor sit amet}</p>
.... many html lines .......

See the real sample: http://pastebin.com/WSQyrmxd

What's wrong: is it the regular expression or something else? Could you suggest a correct version? In RegExr everything works smoothly, but not in PHP.

On a local server, I simply get NULL back from preg_replace.


EDIT: I finally found a solution (thanks, sg3s, for the idea): http://www.pelagodesign.com/blog/2008/01/25/wtf-preg_replace-returns-null/

  • Must be the regular expression. Commented Mar 17, 2012 at 20:15
  • I don't think you need to use \{ Commented Mar 17, 2012 at 20:17
  • They are necessary; I need to match {joomla-tag ..... } Commented Mar 17, 2012 at 20:18
  • I just did a copy and paste and it worked fine for me. Commented Mar 17, 2012 at 20:22
  • I might note that this kind of regex on this large a string is very resource demanding; my test app for this regex literally said it wasn't very happy with it. Commented Mar 17, 2012 at 21:03

3 Answers

5

You say you solved the problem, but if your solution was to increase the backtrack_limit setting, that's not a solution. In fact, you're probably setting yourself up for bigger problems later on. You need to find out why it's doing so much backtracking.

After \{\s?joomla-tag\s+ locates the beginning of the tag, the first .* initially gobbles up the remainder of the document. Then it starts backing off, trying to let the rest of the regex match. When it reaches a point where <+ can match, the .+ again consumes the rest of the document, and another wave of backtracking begins. And with yet another .* after that, you're making it do a ridiculous amount of unnecessary work.
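The gobbling described above is easy to observe on a small input. The following is only a sketch: the two-tag sample text is made up for illustration, and the pattern is the one from the question. Because each dot run first grabs everything up to the end of the input and only then backs off, the single match stretches from the first tag to the last closing brace.

```php
<?php
// Original pattern from the question: greedy dots in DOTALL (/s) mode.
$greedy = '/\{\s?joomla-tag\s+(.*<+.+>+.*)\s?\}/is';

// Hypothetical document containing two separate tags.
$text = "<p>{joomla-tag a<b>x</b>}</p>\n"
      . "<p>unrelated</p>\n"
      . "<p>{joomla-tag c<i>y</i>}</p>";

$count = preg_match_all($greedy, $text, $matches);

// The pattern finds one match, not two: group 1 runs from the first tag's
// content through the "unrelated" paragraph into the second tag.
echo $count, "\n";
echo $matches[1][0], "\n";
```

On a real page with thousands of lines, the engine has to walk back over all of that text several times before it settles on a match, which is where the backtracking cost comes from.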

All this backtracking is the reason for the rule of thumb:

Don't use the dot metacharacter (especially .* or .+) if you can use something more specific. If you do use the dot, don't use it in single-line or DOTALL mode (i.e., the /s modifier or its inline, (?s) form).

In this case, you know the match should end at the next closing brace (}), so don't let it match any braces before that:

\{\s?joomla-tag\s+([^}]*)\}
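As a rough sketch of how that pattern could be dropped into the code from the question (the replacement string and sample text are taken from the question; the delimiters and the /i flag are my assumption). Note that [^}] matches newlines on its own, so the /s modifier is no longer needed, and that this simplified pattern no longer insists on an HTML tag inside the braces.

```php
<?php
// Negated character class: the capture can never run past the next "}".
$regex   = '/\{\s?joomla-tag\s+([^}]*)\}/i';
$replace = '<div class="someclass">$1</div>';

// Sample from the question, with the tag spanning two lines.
$text = "<p>123{joomla-tag Lore<strong>m</strong> ip</p>\n"
      . "<p>sum dolor sit amet}</p>";

$result = preg_replace($regex, $replace, $text);

if ($result === null) {
    // NULL means PCRE reported an error rather than "no match".
    echo 'PCRE error code: ', preg_last_error(), "\n";
} else {
    echo $result, "\n";
}
```

If the content between the braces must contain at least one tag, as the original pattern required, that condition would have to be added back, for example by checking the captured group afterwards.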

3 Comments

Yes! The original expression has much room for improvement, backtracking-wise. +1
+1. One question though: Does using the non-greedy .*? help there, given that it wouldn’t eat up the rest of the document?
If the closing } is present, .*? will definitely be more efficient than .*. If it's missing, .*? will keep going until the end of the document before it gives up.
4

Sounds like this may be a pcre.recursion_limit error, caused by the PCRE regex engine running out of stack. I've seen this before (though the symptoms are typically more severe, i.e. the web server crashes completely). Note that this class of problem frequently shows up on a local server but not on a remote one, particularly if the local system is running Apache under Windows (the Win32 build of httpd.exe has only 256KB of stack space).

preg_replace() returns NULL when it encounters an error in the PCRE library. You can use the preg_last_error() function to get the last error and print out a message like so:

    $pcre_err = preg_last_error();  // PHP 5.2 and above.
    if ($pcre_err === PREG_NO_ERROR) {
        $msg = 'Successful non-match.';
    } else {
        // The last preg_* call failed with a PCRE error.
        switch ($pcre_err) {
            case PREG_INTERNAL_ERROR:
                $msg = 'PREG_INTERNAL_ERROR';
                break;
            case PREG_BACKTRACK_LIMIT_ERROR:
                $msg = 'PREG_BACKTRACK_LIMIT_ERROR';
                break;
            case PREG_RECURSION_LIMIT_ERROR:
                $msg = 'PREG_RECURSION_LIMIT_ERROR';
                break;
            case PREG_BAD_UTF8_ERROR:
                $msg = 'PREG_BAD_UTF8_ERROR';
                break;
            case PREG_BAD_UTF8_OFFSET_ERROR:
                $msg = 'PREG_BAD_UTF8_OFFSET_ERROR';
                break;
            default:
                $msg = 'Unrecognized PREG error';
                break;
        }
    }
    echo($msg);
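To avoid the NULL slipping through unnoticed, the check can be wrapped around the call itself. This is only a sketch: safe_preg_replace() is a hypothetical helper name, not a PHP function, and it handles string subjects only. preg_last_error_msg() is available from PHP 8.0, so the sketch falls back to the numeric code on older versions.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): runs preg_replace() and throws
 * instead of silently returning NULL when PCRE reports an error.
 */
function safe_preg_replace(string $pattern, string $replacement, string $subject): string
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        $code = preg_last_error();
        // preg_last_error_msg() exists only in PHP 8.0 and later.
        $detail = function_exists('preg_last_error_msg')
            ? preg_last_error_msg()
            : 'PCRE error code ' . $code;
        throw new RuntimeException('preg_replace failed: ' . $detail, $code);
    }
    return $result;
}

// Invalid UTF-8 combined with the /u modifier is an easy way to provoke
// a PCRE error for demonstration purposes.
try {
    safe_preg_replace('/./u', 'x', "\xff");
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

The exception code carries the PREG_* constant, so callers can still distinguish a backtrack limit error from a recursion limit error.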

I've explained this error in detail with answers to related questions. See:

RegExp in preg_match function returning browser error

PHP regex: is there anything wrong with this code?

Minifying final HTML output using regular expressions with CodeIgniter

Good luck!

2

It works for me.

Note that from an HTML standpoint, your replacement does not create a valid structure.

Using the full text

It still works for me, even with the full HTML example you provided. So there has to be something wrong with your other code; you might want to enable full error output to see if there's some other issue.

4 Comments

Yes, I know, but that will be handled later. For now I need to get this working.
Strange, really... Can it be caused by encoding?
On a local server, I simply get NULL back from preg_replace.
The demo text you provided works as well with this regex. codepad.org/EXskRfl1 search for "Cool and nice" -> the problem lies somewhere else.
