
I want a PHP regex that can find PHP errors on a page, so that when I crawl a site I can list the errors that appear on it.

Currently I have the following code:

preg_match('/<b>.+<\/b>:.+ in <b>\/.+<\/b> on line <b>[0-9]+<\/b><br( \/)?>/msi',$html,$errors);

It can tell whether errors occurred, but it doesn't list them! Instead, $errors[0] contains the full HTML page.

Could anybody help?

EDIT: For example, I have a page with the following HTML source, from which I want to extract the PHP errors:

<b>Warning</b>:  session_start() [<a href='function.session-start'>function.session-start</a>]: The session id contains invalid characters, valid characters are only a-z, A-Z and 0-9 in <b>/home/.../public_html/articlescript/init.php</b> on line <b>127</b><br />
<br />
<b>Warning</b>:  session_start() [<a href='function.session-start'>function.session-start</a>]: Cannot send session cache limiter - headers already sent (output started at /home/.../public_html/articlescript/init.php:127) in <b>/home/.../public_html/articlescript/init.php</b> on line <b>127</b><br />
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

<head>
    <title>...
  • Could you please provide more information about the entire scenario? Commented Oct 8, 2010 at 16:23
  • I'm not sure what you plan on using this for, but you should be aware that PHP can be (and often is) configured to display errors (when they are even displayed) in different ways. You can't rely on client-side methods to detect server-side errors. Commented Oct 8, 2010 at 16:35
  • In most cases they are displayed this way, and I'm aware that they can be turned off. I just want to check CLIENT-SIDE whether a page contains errors like these. I couldn't find a regex anywhere that works for this case! Commented Oct 8, 2010 at 16:38
  • Why is normal error handling not an option? php.net/manual/en/intro.errorfunc.php Commented Oct 8, 2010 at 16:54
  • Because it's an external site. Commented Oct 8, 2010 at 16:59

5 Answers


Since (well, you know) you shouldn't use regular expressions to parse HTML, try PHP's DOM library instead:

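libxml_use_internal_errors(true); // don't report warnings about malformed HTML
$doc = new DOMDocument();
$doc->loadHTML($str); // $str holds the page source
$messages = array();
foreach ($doc->getElementsByTagName('b') as $elem) {
    // PHP error messages start with a bold label such as <b>Warning</b>
    if (in_array($elem->textContent, array('Error', 'Warning', 'Notice'))) {
        $buffer = $elem->textContent;
        $node = $elem;
        // append the text of each following sibling until the next <br>;
        // text nodes have no element name, so check the node type first
        while ($node->nextSibling !== null
            && !($node->nextSibling instanceof DOMElement && strtolower($node->nextSibling->nodeName) === 'br')) {
            $node = $node->nextSibling;
            $buffer .= $node->textContent;
        }
        $messages[] = $buffer;
    }
}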

This searches for B elements whose content is one of "Error", "Warning", or "Notice" and collects the text from there up to the next BR element. The initial call to libxml_use_internal_errors prevents HTML parsing errors from being reported.
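A sketch of the same DOM approach packaged as a function, assuming PHP 7+ with the DOM extension enabled. The helper name `extract_php_errors()` is hypothetical, and extending the label list with "Fatal error" and "Parse error" (the labels PHP prints for those error types) is an assumption beyond the answer above; whitespace is collapsed only for readability.

```php
<?php
// Sketch only: extract_php_errors() is a hypothetical helper, not a PHP built-in.
function extract_php_errors(string $html): array
{
    $previous = libxml_use_internal_errors(true); // silence malformed-HTML warnings
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the caller's setting

    // Assumed label list; PHP prints e.g. <b>Fatal error</b> for fatal errors.
    $labels = array('Fatal error', 'Parse error', 'Warning', 'Notice', 'Error');
    $messages = array();
    foreach ($doc->getElementsByTagName('b') as $b) {
        if (!in_array(trim($b->textContent), $labels, true)) {
            continue;
        }
        $buffer = $b->textContent;
        // Walk the following siblings (text nodes, <a>, <b>) until the next <br>.
        for ($node = $b->nextSibling; $node !== null; $node = $node->nextSibling) {
            if ($node instanceof DOMElement && strtolower($node->nodeName) === 'br') {
                break;
            }
            $buffer .= $node->textContent;
        }
        // Collapse the double spaces PHP puts after the colon.
        $messages[] = trim(preg_replace('/\s+/', ' ', $buffer));
    }
    return $messages;
}
```

Usage: `$messages = extract_php_errors(file_get_contents($url));`, where `$url` is a placeholder for the page being crawled (fetching a URL this way requires `allow_url_fopen`).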


3 Comments

This doesn't work entirely. How can I make it work the same as ideone.com/utL3K?
@Kevin: OK, I have to admit that this might fail if the document is actually invalid HTML and is fragmented in such a way that parsing fails.
No, it just doesn't list the errors correctly. The while loop doesn't work: if I delete it, the errors are listed, but without their text.

Remember to escape your \ in strings.

preg_match_all('#<b>(.+?)</b>:(.+?) in <b>(.+?)</b> on line <b>([0-9]+)</b><br(?: /)?>#is',$string,$errors);

This code on ideone
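A minimal sketch of turning each match of the pattern above into one record with `PREG_SET_ORDER`, instead of four parallel arrays. The sample input and the array keys (`type`, `message`, `file`, `line`) are illustrative assumptions; `strip_tags()` removes the manual link PHP inserts into some messages.

```php
<?php
// Sample input (made up); in practice $html would be the fetched page source.
$html = "<b>Warning</b>:  session_start() [<a href='function.session-start'>function.session-start</a>]: "
      . "Cannot send session cache limiter in <b>/var/www/init.php</b> on line <b>127</b><br />\n"
      . "<b>Notice</b>:  Undefined variable: x in <b>/var/www/index.php</b> on line <b>9</b><br />\n"
      . "<p>Normal page content</p>";

$pattern = '#<b>(.+?)</b>:(.+?) in <b>(.+?)</b> on line <b>([0-9]+)</b><br(?: /)?>#is';
$errors = array();
// PREG_SET_ORDER gives one array per match: [full, type, message, file, line].
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $errors[] = array(
            'type'    => $m[1],
            'message' => trim(strip_tags($m[2])), // drop the <a> link to the manual
            'file'    => $m[3],
            'line'    => (int) $m[4],
        );
    }
}
foreach ($errors as $e) {
    printf("%s in %s:%d: %s\n", $e['type'], $e['file'], $e['line'], $e['message']);
}
```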



Forgive my language, but it's quite foolish to attempt to parse HTML with regular expressions, especially potentially malformed HTML. Use an HTML parsing library instead.

For HTML parsing and validation in PHP, I would refer to this answer; also check out the tidy extension.

4 Comments

Well, in this case the HTML isn't really XML compliant, and moreover you can't really know where this error will show up, so an XML parser (or an HTML one, for what it's worth) won't help.
@Colin: there are HTML parsers that will identify errors, which is precisely what the OP wants to do. HTML is not regular, so using a regular expression will not be fruitful.
That comment must be one of the most-linked ones here.
@Kevin: I've edited my answer with the best links I could find.

Put parentheses () around the parts of the regex that you want captured in $errors.
You'll also want to use preg_match_all() rather than preg_match().
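To illustrate both points, here is a sketch on made-up sample input contrasting the question's greedy pattern under `preg_match()` with a lazy, grouped pattern under `preg_match_all()`. With the `s` modifier, `.+` also matches newlines, so the single greedy match runs from the first error all the way to the last one.

```php
<?php
// Made-up sample containing two PHP error messages.
$html = "<b>Warning</b>:  first problem in <b>/a.php</b> on line <b>1</b><br />\n"
      . "<b>Notice</b>:  second problem in <b>/b.php</b> on line <b>2</b><br />\n";

// Question's pattern: greedy .+ (and /s lets it cross newlines), so the one
// match preg_match() returns spans both errors.
preg_match('/<b>.+<\/b>:.+ in <b>\/.+<\/b> on line <b>[0-9]+<\/b><br( \/)?>/msi', $html, $one);

// Parenthesised groups capture the parts, lazy .+? stops at the nearest
// delimiter, and preg_match_all() collects every match, not just the first.
preg_match_all('#<b>(.+?)</b>:\s*(.+?) in <b>(.+?)</b> on line <b>([0-9]+)</b><br(?: /)?>#is', $html, $all);

echo strlen($one[0]), " characters in the single greedy match\n";
print_r($all[1]); // error types
print_r($all[4]); // line numbers
```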



If this is your own website, you can either set the log levels and parse your log files (easier), or check your scripts from the command line with php -l (note that -l only checks syntax; it does not run the script).
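A sketch of the log-parsing route, assuming `log_errors` is enabled and the log uses PHP's usual `[timestamp] PHP Type:  message in file on line N` layout. The exact timestamp format varies between PHP versions, `parse_php_error_log()` is a hypothetical helper, and paths containing spaces are not handled by this pattern.

```php
<?php
// Sketch only: parse_php_error_log() is a hypothetical helper. Assumed line format:
// "[08-Oct-2010 16:23:00 UTC] PHP Warning:  foo() in /path/file.php on line 12"
function parse_php_error_log(array $lines): array
{
    // Greedy (.*) backtracks to the LAST " in <file> on line <n>", so messages
    // that themselves contain " in " are kept whole.
    $pattern = '/^\[([^\]]+)\] PHP ([A-Za-z ]+?):\s+(.*) in (\S+) on line (\d+)$/';
    $entries = array();
    foreach ($lines as $line) {
        if (preg_match($pattern, rtrim($line, "\r\n"), $m)) {
            $entries[] = array(
                'time'    => $m[1],
                'type'    => $m[2],
                'message' => $m[3],
                'file'    => $m[4],
                'line'    => (int) $m[5],
            );
        }
    }
    return $entries;
}

// Usage (log path is a placeholder): $entries = parse_php_error_log(file('/path/to/php_errors.log'));
```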

1 Comment

The problem is, it is the site of a client, so I can't use that method.
