6

I'm filtering a string (pulled from a text file) and removing all <script> and </script> tags using preg_replace. For some reason, it removes the actual text "script" but leaves the <> and </>. I've tried subbing in /< (to try to treat it as a literal), but that just generates errors. How do I get it to remove the brackets as well? The input is <script>Text</script>. Here's the code:

$file = file_get_contents($directory . "original-" . $name);
$file = htmlentities($file);
$file = preg_replace('<script>', '', $file);
$file = preg_replace('<\script>', '', $file);

And here is the output:

  <>TEXT</>
  • You are missing delimiters and escapes, and htmlentities is changing your string, so it might not contain what you expect. Commented Jan 31, 2015 at 19:56
  • Could you show me how it should look? Really new to regular expressions. Commented Jan 31, 2015 at 19:58
  • @mattegener Just place the htmlentities line after the replacement of script tags. Also, it's a forward slash (/), not a backslash (\). Commented Jan 31, 2015 at 20:01
  • @RahilWazir I tried that, but same result as my previous attempts. Commented Jan 31, 2015 at 20:04

3 Answers

8

The answer is

$html = preg_replace('#<script(.*?)>(.*?)</script>#is', '', $html);

But you might want to have a look at the strip_tags function
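
To make the difference between the two suggestions concrete, here is a minimal sketch. The sample string is invented for illustration; it is not the asker's file.

```php
<?php
// Hypothetical input, made up for this example.
$html = '<p>Hello</p><SCRIPT type="text/javascript">alert(1);</SCRIPT><p>World</p>';

// '#' is the delimiter; the i modifier ignores case and
// the s modifier lets '.' match newlines inside the script body.
$withoutScripts = preg_replace('#<script(.*?)>(.*?)</script>#is', '', $html);

// strip_tags() removes every tag but keeps the text between tags,
// so the JavaScript source itself stays in the result.
$stripped = strip_tags($html);

echo $withoutScripts, "\n"; // <p>Hello</p><p>World</p>
echo $stripped, "\n";
```

So the two are not interchangeable: the regexp drops the script element together with its contents, while strip_tags drops all markup and leaves the script text behind.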

3 Comments

Although this is the reason why, you simply state "The answer is <this>". You don't explain why it is the answer.
Because (1) this has been explained a million times on this website, and this question comes up every other day; and (2) this most likely should not be solved with a regexp, and strip_tags should be used instead.
A single line saying "PHP supports multiple delimiters, and the characters < > are one supported pair of delimiters" would be enough. You won't die, and you will have a better answer.
4

The pattern you use in your preg_* functions has to have some kind of a delimiter before and after that. PHP allows many different delimiters, so it's treating your angle brackets as the regexp delimiter, and not part of the pattern. I ordinarily use { and } as delimiters, many other people use slashes, hash signs, square brackets, parentheses. Angle brackets are also permitted as delimiters, that's why your pattern fails.

You can solve this by adding some delimiters around your patterns, e.g.:

$file = preg_replace('/<script>/', '', $file);
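
As a rough illustration of the delimiter point, the sketch below runs the question's sample input through both versions of the pattern. Using # as the delimiter is just one choice here; it avoids having to escape the / in the closing tag.

```php
<?php
// Sample input taken from the question.
$input = '<script>Text</script>';

// No explicit delimiters: PHP reads '<' and '>' as the delimiter pair,
// so the pattern that actually runs is just the word "script".
$broken = preg_replace('<script>', '', $input);

// Explicit '#' delimiters: the angle brackets are now part of the pattern.
$fixed = preg_replace('#<script>#', '', $input);
$fixed = preg_replace('#</script>#', '', $fixed);

echo $broken, "\n"; // <>Text</>
echo $fixed, "\n";  // Text
```

Note that if htmlentities() runs first, the brackets have already been turned into &lt; and &gt;, so the delimited patterns would find nothing to replace. As the comments on the question suggest, do the replacement before encoding.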

Also, note that PHP regular expressions are case sensitive, so your pattern is foiled by a tag that says <SCRIPT> or <Script>. The i modifier after the pattern (after the closing delimiter) makes it case insensitive (/<script>/i). Also, there are many different ways to write HTML tags that are still interpreted by the browser, e.g.:

<script type="text/javascript">...</script>
<script src="..." />
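
Here is a small sketch of what the i modifier buys you, plus one possible way to tolerate attributes. The list of variants and the [^>]* part are my own additions for illustration, not something from the question.

```php
<?php
// Hypothetical spellings of the opening tag, made up for this example.
$variants = [
    '<script>',
    '<SCRIPT>',
    '<Script type="text/javascript">',
    '<script src="..." />',
];

// \b keeps the match from also accepting a name such as <scripts>;
// [^>]* skips over any attributes; the trailing i ignores case.
$pattern = '/<script\b[^>]*>/i';

$results = [];
foreach ($variants as $tag) {
    $results[] = preg_match($pattern, $tag);
    echo $tag, ' => ', end($results) ? 'match' : 'no match', "\n";
}

// Without the i modifier, the uppercase spelling is not matched.
$caseSensitive = preg_match('/<script>/', '<SCRIPT>');
var_dump($caseSensitive); // int(0)
```

Even this pattern is easy to defeat: an attribute value that itself contains > ends the match early, which is part of why the regexp approach is fragile.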

On a sidenote, and maybe I'm reading too much into your question, you should not, I repeat, not use regexps to parse HTML, and especially to sanitize it.
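
If you want to try a parser instead, a minimal sketch with PHP's bundled DOM extension could look like the following. The input string is made up, and I am assuming the dom extension is enabled and that the fragment has a single root element (which the LIBXML_HTML_NOIMPLIED flag works best with).

```php
<?php
// Hypothetical input, made up for this example.
$html = '<div><p>Keep me</p><SCRIPT type="text/javascript">alert(1);</SCRIPT></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
// The two flags ask libxml not to add a doctype or <html>/<body> wrappers.
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

// Copy the live node list first; removing nodes while iterating it skips elements.
$scripts = iterator_to_array($doc->getElementsByTagName('script'));
foreach ($scripts as $script) {
    $script->parentNode->removeChild($script);
}

$clean = $doc->saveHTML();
echo $clean;
```

This only removes script elements. It is not a complete sanitizer: inline event handlers such as onclick attributes, for example, are left untouched.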

1 Comment

Without code examples, your answer is quite incomplete. But I agree with the last line. Completely!
0

$html = preg_replace('#<script(.*?)>(.*?)</script>#is', '', $html);
