Detect HTML tags in a string

Question

I need to detect whether a string contains HTML tags.

if(!preg_match('(?<=<)\w+(?=[^<]*?>)', $string)){ 
    return $string;
}

The above regex gives me an error:

preg_match() [function.preg-match]: Unknown modifier '\'

I'm not well up on regex, so I'm not sure what the problem is. I tried escaping the \ and it didn't do anything.

Is there a better solution than regex? If not, what would be the correct regex to work with the preg_match?

Add / to the beginning and end of your regex string.

– Kevin Peno, Commented Apr 20, 2011 at 15:25

Real Dreams · Accepted Answer · 2016-02-16 13:10:19Z

230

A simple solution is:

if($string != strip_tags($string)) {
    // contains HTML
}

The benefit of this over a regex is that it's easier to understand; however, I can't comment on the relative execution speed of the two solutions.
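A minimal sketch of the same check wrapped in a reusable function. The name `contains_html()` is hypothetical, and the strict `!==` comparison is a small change from the answer's `!=` that sidesteps PHP's loose string comparison rules.

```php
<?php
// Hypothetical helper wrapping the strip_tags() comparison from the answer.
function contains_html(string $string): bool
{
    // strip_tags() only removes tag-like sequences, so any change to the
    // string means at least one such sequence was present.
    return $string !== strip_tags($string);
}

var_dump(contains_html('<b>bold</b> text')); // bool(true)
var_dump(contains_html("hello\r\nworld"));   // bool(false)
```

As the comments below point out, anything strip_tags() treats as a tag counts, including made-up tags such as `<asdfasdfasdf>`.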

edited Feb 16, 2016 at 13:10

Real Dreams

answered Apr 20, 2011 at 15:27

Diarmaid

9 Comments

jeroen Over a year ago

+1 That's the easiest way to detect the presence of tags. You don't even need strlen.

clamchoda Over a year ago

Nice answer! Much simpler, although I think regular expressions are generally extremely fast.

ESP32 Over a year ago

This will also treat <asdfasdfasdf> as an HTML tag.

R1CHY_RICH Over a year ago

The above code will return a false positive if the string contains any control characters like \n or \r...

Alex Beynenson Over a year ago

@R1CHY_RICH: Can you provide a sample case for the false positive you describe? The following emits "no html" for me: $s = "hello\r\nworld"; if (strip_tags($s) != $s) { echo 'contains html'; } else { echo 'no html'; }

simon · Answer · 2011-04-20 15:25:33Z

12

You need to delimit the regex with a delimiter character of your choice (for example #). Try this:

if(!preg_match('#(?<=<)\w+(?=[^<]*?>)#', $string)){ 
    return $string;
}
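Why the original pattern fails: PHP takes the first character of the pattern as its delimiter. `(` is a bracket-style delimiter, so the pattern ends at the matching `)`, leaving only `?<=<`, and everything after it (starting with `\w`) is parsed as modifiers, hence the "Unknown modifier '\'" warning. A small script illustrating this (the sample string is made up):

```php
<?php
$string = 'Some <b>bold</b> text';

// '(' is taken as the delimiter, so the pattern ends at the first ')' and
// "\w+(?=[^<]*?>)" is read as modifiers: preg_match() warns and returns false.
$broken = @preg_match('(?<=<)\w+(?=[^<]*?>)', $string);
var_dump($broken); // bool(false)

// With explicit '#' delimiters the whole expression is the pattern.
$fixed = preg_match('#(?<=<)\w+(?=[^<]*?>)#', $string);
var_dump($fixed); // int(1)

// '/' works too, as the comment on the question suggests; this pattern
// contains no '/', so nothing inside it needs escaping.
var_dump(preg_match('/(?<=<)\w+(?=[^<]*?>)/', $string)); // int(1)
```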

answered Apr 20, 2011 at 15:25

simon

2 Comments

ESP32 Over a year ago

Test with this sentence: 'the weight of a cherry < apple but > raspberry. It is <2g.'

ESP32 Over a year ago

(?<=<)\/?[a-zA-Z]+\s*(?=[^<]*?>) works better for me
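A minimal sketch wrapping that pattern in a helper; the function name `looks_like_tag()` is hypothetical. The pattern is written with `#` delimiters so the `/` needs no escaping, and with an `a-zA-Z` range (a range written `A-z` would also admit the characters between `Z` and `a`: `[`, `\`, `]`, `^`, `_` and the backtick).

```php
<?php
// Hypothetical helper based on the pattern from the comment above.
function looks_like_tag(string $string): bool
{
    // '<' must be followed directly by an optional '/' and a letter, so text
    // such as "a < b" or "<2g" is not treated as a tag.
    return preg_match('#(?<=<)/?[a-zA-Z]+\s*(?=[^<]*?>)#', $string) === 1;
}

var_dump(looks_like_tag('<b>bold</b>'));  // bool(true)
var_dump(looks_like_tag('the weight of a cherry < apple but > raspberry. It is <2g.')); // bool(false)
```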

ESP32 · Answer · 2020-03-07 01:58:02Z

If you only want to detect or replace specific tags, this function searches for certain HTML tags and wraps them in square brackets. The bracketing itself is just a placeholder; change the callback to do whatever you need with the tags.

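$html = preg_replace_callback(
    '|\</?([a-zA-Z]+[1-6]?)(\s[^>]*)?(\s?/)?\>|',
    function ($found) {
        $wanted = array('div','p','span','b','a','strong','center','br','h1','h2','h3','h4','h5','h6','hr');
        // Wrap the wanted tags (compared case-insensitively) in brackets
        if (in_array(strtolower($found[1]), $wanted, true)) {
            return '[' . $found[0] . ']';
        }
        // Return every other tag unchanged; returning nothing would delete it
        return $found[0];
    },
    $html
);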

Explanation of the regex:

\< ... \>   //starts and ends with tag brackets
\</?        //can start with a slash for closing tags
([a-zA-Z]+[1-6]?)    //the tag itself (for example "h1")
(\s[^>]*)? //anything such as class=... style=... etc.
(\s?/)?     //allow self-closing tags such as <br />
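If detection rather than replacement is the goal, the same pattern can be used with preg_match_all() to list the wanted tags that occur. A sketch under that assumption; `find_tags()` and its `$wanted` parameter are hypothetical names:

```php
<?php
// Hypothetical detection variant: return the names of the wanted tags that
// occur in $html (opening and closing tags are both counted).
function find_tags(string $html, array $wanted): array
{
    // Same pattern as the answer; group 1 captures the tag name.
    preg_match_all('|\</?([a-zA-Z]+[1-6]?)(\s[^>]*)?(\s?/)?\>|', $html, $matches);

    $found = [];
    foreach ($matches[1] as $name) {
        // Compare case-insensitively so <DIV> and <div> are treated alike.
        if (in_array(strtolower($name), $wanted, true)) {
            $found[] = strtolower($name);
        }
    }
    return $found;
}

print_r(find_tags('<div class="x">Hi<br /><em>there</em></div>', ['div', 'br']));
```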

MutantMahesh · Answer · 2017-07-25 00:07:48Z

4

If the purpose is only to check whether a string contains an HTML tag, regardless of whether the tags are valid, you can try this:

function is_html($string) {
  // Check if string contains any html tags.
  return preg_match('/<\s?[^\>]*\/?\s?>/i', $string);
}

This matches both valid and invalid HTML tags. You can confirm it here: https://regex101.com/r/2g7Fx4/3

answered Jul 25, 2017 at 0:07

MutantMahesh

2 Comments

mickmackusa Over a year ago

Why the case-insensitive pattern modifier? Why is > escaped in the negated character class?

Heiko Vogel Over a year ago

Well, then the following text is also considered HTML: "The case where some number <2 and another >8 is considered to contain HTML"
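A quick way to reproduce that false positive is to wrap the answer's pattern so it returns a bool (the `=== 1` comparison is an addition; preg_match() itself returns 1, 0 or false):

```php
<?php
// The answer's pattern, unchanged, inside a bool-returning helper.
function is_html(string $string): bool
{
    return preg_match('/<\s?[^\>]*\/?\s?>/i', $string) === 1;
}

var_dump(is_html('<p>Hello</p>')); // bool(true)
// '<' ... '>' with plain text in between is matched as well:
var_dump(is_html('The case where some number <2 and another >8')); // bool(true)
var_dump(is_html('plain text')); // bool(false)
```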

Reza Saadati · Answer · 2018-08-13 11:38:14Z

3

I would recommend allowing only specific tags! You don't want a user to be able to type a <script> tag, which could cause an XSS vulnerability.

Try it with:

$string = '<strong>hello</strong>';
$pattern = "/<(p|span|b|strong|i|u) ?.*>(.*)<\/(p|span|b|strong|i|u)>/"; // Allowed tags are: <p>, <span>, <b>, <strong>, <i> and <u>
preg_match($pattern, $string, $matches);

if (!empty($matches)) {
    echo 'Good, you have used an HTML tag.';
}
else {
    echo 'You didn\'t use an HTML tag or it is not allowed.';
}

answered Aug 13, 2018 at 11:38

Reza Saadati

1 Comment

mickmackusa Over a year ago

Shouldn't you be using a backreference \1 to ensure balanced tags?
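A sketch of the backreference idea, assuming the same allowlist as the answer. The function name is hypothetical, the `(?:\s[^>]*)?` group allows attributes, and the `s` modifier lets the tag content span lines. This only confirms that at least one allowed, properly closed tag is present; it does not reject other tags such as <script> elsewhere in the string, so it is not an XSS filter by itself.

```php
<?php
// Hypothetical allowlist check: \1 forces the closing tag to have the same
// name as the opening one.
function uses_allowed_tag(string $string): bool
{
    $pattern = '#<(p|span|b|strong|i|u)(?:\s[^>]*)?>.*?</\1>#s';
    return preg_match($pattern, $string) === 1;
}

var_dump(uses_allowed_tag('<strong>hello</strong>')); // bool(true)
var_dump(uses_allowed_tag('<strong>hello</b>'));      // bool(false)
```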

Alfred · Answer · 2012-08-18 10:55:12Z

2

I would compare the lengths with strlen() first, because otherwise a character-by-character comparison is done, which can be slow, although I would expect the comparison to stop as soon as it finds a difference.
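A sketch of the length-based variant, with a hypothetical function name. Because strip_tags() only removes characters, its result differs from the input exactly when its length differs.

```php
<?php
// Hypothetical length-based version of the strip_tags() check.
function contains_html_by_length(string $string): bool
{
    // Any removed tag makes the stripped string strictly shorter.
    return strlen(strip_tags($string)) !== strlen($string);
}

var_dump(contains_html_by_length('<i>x</i>'));  // bool(true)
var_dump(contains_html_by_length('no markup')); // bool(false)
```

Either way strip_tags() still has to scan the whole input, so the saving is limited to the final comparison.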

edited Aug 18, 2012 at 10:55

Alfred

answered May 10, 2011 at 22:05

slsdoug

Comments

clamchoda · Answer · 2011-04-20 15:28:24Z

0

If you're not good at regular expressions (like me), there are lots of regex libraries out there that can usually help you accomplish your task.

Here is a little tutorial that explains what you're trying to do in PHP.

Here is one of those libraries I was referring to.

answered Apr 20, 2011 at 15:28

clamchoda

Comments

ssi-anik · Answer · 2024-10-01 04:27:26Z

0

Parsing HTML in general is a hard problem; there is some good material here:

But regarding your question (a 'better' solution): can you be more specific about what you are trying to achieve and what tools are available to you?

edited Oct 1, 2024 at 4:27

ssi-anik

answered Apr 20, 2011 at 15:27

Addys
