I need a way:

  1. to let users use ONLY <strong> and <p> tags.

  2. to prevent users from attaching CSS to these tags (for example, this must NOT work: <p style="margin:1000px;"> hello </p>).

  3. to prevent XSS.

    • htmlspecialchars is not sufficient because it converts all tags into HTML entities.

    • strip_tags is not sufficient because it leaves attributes such as style on the allowed tags.

So which PHP functions can I use to do this?

I don't want to use an external library like HTML Purifier.
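To make the two objections above concrete, here is a small illustrative sketch (the sample input string is made up for this example). It shows that strip_tags keeps the attributes of any tag it is told to allow, while htmlspecialchars escapes every tag, including the ones that should stay.

```php
<?php
// Hypothetical sample input: an allowed tag carrying unwanted attributes,
// followed by a tag that is not allowed at all.
$input = '<p style="margin:1000px;" onclick="alert(1)">hello</p><script>alert(2)</script>';

// strip_tags() drops the <script> tags but keeps the text between them,
// and it leaves the style and onclick attributes on <p> untouched.
$stripped = strip_tags($input, '<p><strong>');
echo $stripped, "\n";

// htmlspecialchars() escapes everything, so even <p> is shown as literal text.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
echo $escaped, "\n";
```

Neither function alone gives a whitelist of bare tags, which is what the question asks for.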

  • How does your user input the HTML? Through a WYSIWYG editor like TinyMCE? Commented May 7, 2013 at 11:33
  • Yes, the user can use a WYSIWYG editor or insert tags manually. Commented May 7, 2013 at 11:35
  • TinyMCE can be configured with a list of allowed HTML tags (tinymce.com/wiki.php/Configuration:valid_elements), which filters out junk. That doesn't help with manual input, though. Commented May 7, 2013 at 11:38
  • @shivan: you still need to do server-side cleanup, otherwise someone will always try to slip a tag in through the back door. Commented May 7, 2013 at 18:45

3 Answers

The best idea I can think of (within the boundaries you require) is to use custom placeholder strings for <p> and <strong> and then str_replace them with the HTML tags on output. This way users can't inject anything dodgy.

You see this on a lot of forum websites when writing a post: users click paragraph and bold icons and the editor inserts [p][/p] instead of <p></p>. On output, str_replace [p] with <p> and [/p] with </p>. If they add any custom CSS or scripts inside a placeholder, the str_replace will not match it, so no HTML that the browser would render is produced from it.
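A minimal sketch of the placeholder idea, under a few assumptions not stated in the answer: the function name render_placeholders and the [strong] placeholder are hypothetical, and the htmlspecialchars call is added here so that raw HTML typed alongside the placeholders is escaped before the replacement runs.

```php
<?php
// Hypothetical helper: escape all user input, then expand a fixed set of placeholders.
function render_placeholders(string $input): string
{
    // Escape everything first, so no user-supplied HTML survives.
    $safe = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // Square brackets are not touched by htmlspecialchars(), so the
    // placeholders are still intact and can be swapped for the real tags.
    return str_replace(
        ['[p]', '[/p]', '[strong]', '[/strong]'],
        ['<p>', '</p>', '<strong>', '</strong>'],
        $safe
    );
}

echo render_placeholders('[p]Hello [strong]world[/strong] <script>alert(1)</script>[/p]'), "\n";
// A placeholder with extra content such as [p style="..."] matches nothing and stays plain text.
echo render_placeholders('[p style="margin:1000px;"]hello[/p]'), "\n";
```

This does not check that the placeholders are balanced, so a lone [/p] still produces a stray </p> in the output.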

1 Comment

I agree with that. Another option is to use Markdown syntax like Stack Overflow does: http://daringfireball.net/projects/markdown/syntax

You could write your own little lexer and parser for this very limited subset of HTML:

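$input = '…';
// Split on bare <p>, </p>, <strong> and </strong>; tags carrying attributes do not match.
$tokens = preg_split('~(</?(?:p|strong)\s*>)~', $input, -1, PREG_SPLIT_DELIM_CAPTURE);

foreach ($tokens as $i => &$token) {
    if ($i % 2 === 0) {
        // text: escape everything, including any tag that was not whitelisted
        $token = htmlspecialchars($token, ENT_QUOTES, 'UTF-8');
    }
    // odd indexes hold the whitelisted tags captured by the pattern; leave them as-is
}
unset($token); // break the reference left over from the foreach
$output = implode('', $tokens);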
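As a usage sketch, the same tokenising idea can be wrapped in a function and tried on hostile input. The function name sanitize_limited_html, the case-insensitive flag, and the tag normalisation step are assumptions added for illustration and are not part of the answer above.

```php
<?php
// Hypothetical wrapper around the split-and-escape approach.
function sanitize_limited_html(string $input): string
{
    // Odd indexes of the result are the captured bare tags, even indexes are text.
    $tokens = preg_split('~(</?(?:p|strong)\s*>)~i', $input, -1, PREG_SPLIT_DELIM_CAPTURE);

    $output = '';
    foreach ($tokens as $i => $token) {
        if ($i % 2 === 0) {
            // Text, including every tag the pattern did not recognise: escape it.
            $output .= htmlspecialchars($token, ENT_QUOTES, 'UTF-8');
        } else {
            // Whitelisted tag: drop inner whitespace and lowercase it, e.g. "<STRONG >" becomes "<strong>".
            $output .= strtolower(preg_replace('~\s+~', '', $token));
        }
    }
    return $output;
}

echo sanitize_limited_html('<p style="margin:1000px;">hello</p> <STRONG >bold</strong><script>x</script>'), "\n";
```

Because <p style="..."> does not match the pattern, it is escaped as text while its closing </p> is kept, so the result can contain unbalanced tags; a stricter version would also track open tags.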

1 Comment

Very smart implementation, but if the input HTML is huge, it will consume a lot of RAM (huge array).

The Web adopted solutions like Markdown for exactly this purpose.

Maybe you should implement a Markdown editor on the client side and a Markdown decoder on the server side. This lets your users format their text while blocking XSS and CSS injection, as long as the decoder escapes raw HTML in the input.

http://daringfireball.net/projects/markdown/

K.
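Since the question rules out external libraries, here is a rough sketch of a hand-written decoder for a tiny Markdown-like subset. It is not a Markdown implementation: the function name render_mini_markdown is hypothetical, and it only supports **bold** and paragraphs separated by blank lines.

```php
<?php
// Hypothetical decoder for a tiny Markdown-like subset: **bold** and blank-line paragraphs.
function render_mini_markdown(string $input): string
{
    // Escape all raw HTML up front; only tags generated below reach the output.
    $safe = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // Split into paragraphs on one or more blank lines.
    $blocks = preg_split('~\R{2,}~', trim($safe));

    $html = '';
    foreach ($blocks as $block) {
        if ($block === '') {
            continue; // empty input yields no paragraphs
        }
        // **text** on a single line becomes <strong>text</strong>.
        $block = preg_replace('~\*\*(.+?)\*\*~', '<strong>$1</strong>', $block);
        $html .= '<p>' . $block . "</p>\n";
    }
    return $html;
}

echo render_mini_markdown("Hello **world**\n\n<script>alert(1)</script>");
```

Bold is applied per paragraph so that a <strong> element can never straddle two <p> elements.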

1 Comment

OP does not want external libraries.
