-2

I'm trying to remove script tags from the source code using a regular expression.

/<\s*script[^>]*[^\/]>(.*?)<\s*\/\s*script\s*>/is

But I run into a problem when the code I need to remove is nested inside other code.

Please see this screenshot

I tested it at https://regex101.com/r/R6XaUT/1

How do I write the regular expression correctly so that it covers all of the code?

2 Answers

1

Sample text:

$text = '<b>sample</b> text with <div>tags</div>'; 

Result for strip_tags($text):

Output: sample text with tags 

Result for strip_tags_content($text):

Output: text with 

Result for strip_tags_content($text, '<b>'):

Output: <b>sample</b> text with 

Result for strip_tags_content($text, '<b>', TRUE):

Output: text with <div>tags</div> 
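The outputs above are shown without the function body. As a minimal sketch, here is one way strip_tags_content could be written so that it behaves as listed. The implementation is an assumption reconstructed from those outputs (not necessarily the code behind the source link), and it assumes the second argument is a tag list such as '<b>'.

```php
<?php
// Sketch only: an assumed implementation of strip_tags_content().
// $tags   : tag list such as '<b><div>'
// $invert : false = keep listed tags, remove all other elements with content
//           true  = remove only the listed elements with content
function strip_tags_content(string $text, string $tags = '', bool $invert = false): string
{
    // Pull bare tag names out of a list like '<b><div>'
    preg_match_all('/<\s*(\w+)[^>]*>/', $tags, $found);
    $names = array_unique($found[1]);

    if (count($names) > 0) {
        $alt = implode('|', $names);
        $pattern = $invert
            ? '@<(' . $alt . ')\b[^>]*>.*?</\1\s*>@si'
            : '@<(?!(?:' . $alt . ')\b)(\w+)\b[^>]*>.*?</\1\s*>@si';
    } elseif (!$invert) {
        // No list given: remove every element together with its content
        $pattern = '@<(\w+)\b[^>]*>.*?</\1\s*>@si';
    } else {
        return $text; // nothing listed, nothing to remove
    }

    // preg_replace() returns null on failure; fall back to the input
    return preg_replace($pattern, '', $text) ?? $text;
}

$text = '<b>sample</b> text with <div>tags</div>';
var_dump(strip_tags_content($text));              // all elements removed
var_dump(strip_tags_content($text, '<b>'));       // <b> kept, <div> removed
var_dump(strip_tags_content($text, '<b>', true)); // only <b> removed

// Applied to the question: drop <script> elements and their bodies
$html = '<p>Hi</p><script type="text/javascript">alert(1);</script><p>Bye</p>';
echo strip_tags_content($html, '<script>', true), PHP_EOL; // <p>Hi</p><p>Bye</p>
```

Because this is regex-based, it pairs each opening tag with the nearest matching closing tag, so elements nested inside an element of the same name are not handled correctly.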

I hope someone finds this useful :) source link

0

Simply use the PHP function strip_tags. See

http://php.net/manual/de/function.strip-tags.php

$string = "<div>hello</div>";
echo strip_tags($string);

This will output

hello

You can also provide a list of tags to keep.
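As a short illustration of the keep list, the second argument to strip_tags is a single string of opening tags. The sample markup here is made up for the example.

```php
<?php
$string = '<div><b>hello</b> <i>world</i></div>';

// Second argument: the tags to keep, written as one string of opening tags.
echo strip_tags($string, '<b>'), PHP_EOL;    // <b>hello</b> world
echo strip_tags($string, '<b><i>'), PHP_EOL; // <b>hello</b> <i>world</i>
```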

==

Another approach is this:

// Load a file into $html
$html = file_get_contents('scratch.html');
$matches = [];
// Capture every tag name, including tags that carry attributes
preg_match_all('/<\/?([a-z][a-z0-9-]*)[^>]*>/i', $html, $matches);

// Keep each tag name only once (lowercased so "SCRIPT" and "script" compare equal)
$tags = array_unique(array_map('strtolower', $matches[1]));

// Find the script entry and remove it from the list
$scriptTagIndex = array_search("script", $tags);
if ($scriptTagIndex !== false) unset($tags[$scriptTagIndex]);

// The allowed-tags list must be a string of the form <tagname1><tagname2>...
$allowedTags = array_map(function ($s) { return "<$s>"; }, $tags);

// Strip the HTML, keeping every tag except the removed one (script)
$noScript = strip_tags($html, join("", $allowedTags));

echo $noScript;
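To see what this approach produces, here is a small self-contained sketch that runs the same steps on an inline string instead of scratch.html. The helper name strip_script_tags_only and the sample markup are made up for illustration, and the arrow functions assume PHP 7.4 or later.

```php
<?php
// Hypothetical helper: allow every tag found in the input except <script>.
function strip_script_tags_only(string $html): string
{
    // Collect the names of all opening and closing tags
    preg_match_all('/<\/?([a-z][a-z0-9-]*)[^>]*>/i', $html, $matches);
    $tags = array_unique(array_map('strtolower', $matches[1]));

    // Drop "script" and build the '<tag1><tag2>' string strip_tags() expects
    $allowed = array_filter($tags, fn($t) => $t !== 'script');
    $allowedList = implode('', array_map(fn($t) => "<$t>", $allowed));

    return strip_tags($html, $allowedList);
}

$html = '<div><p>Hi</p><script>alert(1);</script></div>';
echo strip_script_tags_only($html), PHP_EOL; // <div><p>Hi</p>alert(1);</div>
```

The script tags are gone, but strip_tags leaves the text between them in place, so the script body remains in the output as plain text.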

6 Comments

Thanks, but I need to clear only the script tags, the exclude list will be very large.
Do you also want the contents in between the script tags to be removed?
@AlexKovalev Why only the script tags? If it is security you are concerned about, you need to realize that JavaScript can also run from HTML tag attributes like onLoad, so you gain nothing by removing just the script tags.
@jeroen This isn't for security. The script code trips up the DOM parser: the DOMDocument class can't parse the document unless the scripts are removed.
@AlexKovalev Really? Maybe here you can find a better parser: stackoverflow.com/questions/4029341/…. And you might want to add that to your question, perhaps it will attract answers in a different direction.