65

I need to detect whether a string contains HTML tags.

if(!preg_match('(?<=<)\w+(?=[^<]*?>)', $string)){ 
    return $string;
}

The above regex gives me an error:

preg_match() [function.preg-match]: Unknown modifier '\'

I'm not well up on regex so not sure what the problem was. I tried escaping the \ and it didn't do anything.

Is there a better solution than regex? If not, what would be the correct regex to work with the preg_match?

1
  • 3
    add / to the beginning and end of your regex string Commented Apr 20, 2011 at 15:25

8 Answers 8

230

A simple solution is:

if($string != strip_tags($string)) {
    // contains HTML
}

The benefit of this over a regex is that it's easier to understand. However, I can't comment on the speed of execution of either solution.
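The comparison above can be wrapped in a small helper. This is only a sketch: the name contains_html is made up for illustration and is not a PHP built-in. Note that anything shaped like a tag counts, whether or not it is real HTML.

```php
<?php
// Hypothetical helper name; it wraps the strip_tags() comparison shown above.
function contains_html(string $string): bool
{
    // strip_tags() removes anything that looks like a tag, so the result
    // differs from the input only when something tag-like was present.
    return $string !== strip_tags($string);
}

var_dump(contains_html('Hello <b>world</b>')); // bool(true)
var_dump(contains_html('Hello world'));        // bool(false)
var_dump(contains_html('<asdfasdfasdf>'));     // bool(true), not a real tag
```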


9 Comments

+1 That's the easiest way to detect the presence of tags. You don´t even need strlen.
Nice answer! Much simpler, although I think regular expressions are generally extremely fast.
This will also treat <asdfasdfasdf> as an HTML tag.
The above code will return a false positive if the string contains any control characters like \n or \r...
@R1CHY_RICH: Can you provide a sample case for the false positive you describe? The following emits "no html" for me: $s = "hello\r\nworld"; if (strip_tags($s) != $s) { echo 'contains html'; } else { echo 'no html'; }
12

You need to delimit the regex with some character, placed at both the start and the end of the pattern. Try this:

if(!preg_match('#(?<=<)\w+(?=[^<]*?>)#', $string)){ 
    return $string;
}
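
As far as I can tell, the original error comes from PHP reading the leading ( as the opening delimiter and the first ) as the closing one, so everything after it, starting with \, is parsed as modifiers. A quick check of the delimited pattern, with sample strings made up for illustration:

```php
<?php
// '#' serves as the delimiter here; '/' or '~' would work just as well,
// as long as the same character opens and closes the pattern.
$pattern = '#(?<=<)\w+(?=[^<]*?>)#';

var_dump(preg_match($pattern, '<b>bold</b>')); // int(1)
var_dump(preg_match($pattern, 'plain text'));  // int(0)
```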

2 Comments

Test with this sentence: 'the weight of a cherry < apple but > raspberry. It is <2g.'
(?<=<)\/?[a-zA-Z]+\s*(?=[^<]*?>) works better for me
6

If you just want to detect or replace certain tags: this function searches for specific HTML tags and wraps them in brackets. That is pretty senseless in itself, so just modify it to do whatever you want with the tags.

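$html = preg_replace_callback(
    '|\</?([a-zA-Z]+[1-6]?)(\s[^>]*)?(\s?/)?\>|',
    function ($found) {
        // $found[0] is the whole tag, $found[1] is the tag name.
        if(isset($found[1]) && in_array(
            $found[1], 
            array('div','p','span','b','a','strong','center','br','h1','h2','h3','h4','h5','h6','hr'))
        ) {
            return '[' . $found[0] . ']';
        }
        // Leave other tags unchanged; returning nothing would delete them.
        return $found[0];
    },
    $html  
);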

Explanation of the regex:

\< ... \>   //start and ends with tag brackets
\</?        //can start with a slash for closing tags
([a-zA-Z]+[1-6]?)    //the tag itself (for example "h1")
(\s[^>]*)? //anything such as class=... style=... etc.
(\s?/)?     //allow self-closing tags such as <br />
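
To see what the first capture group picks up, the same pattern can be run through preg_match_all. This is only a sketch; the sample markup is invented for illustration.

```php
<?php
// Same pattern as in the answer; capture group 1 holds the tag name.
$pattern = '|\</?([a-zA-Z]+[1-6]?)(\s[^>]*)?(\s?/)?\>|';
$sample  = '<div class="a">Hi<br /></div>';

preg_match_all($pattern, $sample, $matches);

// $matches[1] lists the tag names in order of appearance.
print_r($matches[1]); // div, br, div
```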

Comments

4

If the purpose is just to check whether a string contains an HTML tag, regardless of whether the tags are valid, you can try this.

function is_html($string) {
  // Check if string contains any html tags.
  return preg_match('/<\s?[^\>]*\/?\s?>/i', $string);
}

This works for all valid or invalid HTML tags. You can confirm this here: https://regex101.com/r/2g7Fx4/3

2 Comments

Why the case-insensitive pattern modifier? Why is > escaped in the negated character class?
Well, the following text is then considered HTML: The case where some number <2 and another >8 is considered to contain HTML
3

I would recommend allowing only defined tags! You don't want the user to type the <script> tag, which could cause an XSS vulnerability.

Try it with:

$string = '<strong>hello</strong>';
$pattern = "/<(p|span|b|strong|i|u) ?.*>(.*)<\/(p|span|b|strong|i|u)>/"; // Allowed tags are: <p>, <span>, <b>, <strong>, <i> and <u>
preg_match($pattern, $string, $matches);

if (!empty($matches)) {
    echo 'Good, you have used a HTML tag.';
}
else {
    echo 'You didn\'t use a HTML tag or it is not allowed.';
}
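
A different technique from the regex above, if the goal is to drop disallowed tags rather than just detect allowed ones, is the second argument of strip_tags(), which takes a list of tags to keep. This is a sketch with a made-up input. Note that strip_tags() keeps the text inside removed tags and leaves attributes on allowed tags untouched, so on its own it is not a complete XSS defence.

```php
<?php
$input   = '<strong>hello</strong><script>alert(1)</script>';
// Tags to keep, written as one string of opening tags.
$allowed = '<p><span><b><strong><i><u>';

// Tags not in $allowed are removed; their inner text is left in place.
$clean = strip_tags($input, $allowed);

echo $clean, PHP_EOL; // <strong>hello</strong>alert(1)
```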

1 Comment

Shouldn't you be using a backreference \1 to ensure balanced tags?
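
A possible variant along those lines, offered only as a sketch: \1 forces the closing tag to repeat whatever name the first group matched, and [^>]* replaces the greedy .* so the opening tag cannot run past its own >.

```php
<?php
// \1 refers back to the tag name captured by the first group.
$pattern = '/<(p|span|b|strong|i|u)\b[^>]*>(.*?)<\/\1>/s';

var_dump(preg_match($pattern, '<strong>hello</strong>')); // int(1)
var_dump(preg_match($pattern, '<b>hello</i>'));           // int(0), mismatched pair
```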
2

I would use strlen() because if you don't, then a character-by-character comparison is done and that can be slow, though I would expect the comparison to quit as soon as it found a difference.

Comments

0

If you're not good at regular expressions (like me), there are lots of regex libraries out there that usually help me accomplish my task.

Here is a little tutorial that will explain what you're trying to do in PHP.

Here is one of those libraries I was referring to.

Comments

0

Parsing HTML in general is a hard problem, there is some good material here:

But regarding your question (a 'better' solution): can you be more specific about what you are trying to achieve, and what tools are available to you?

Comments
