I'm writing a Python web application and want a text area where users can enter text in a lightweight markup language. The text is inserted into an HTML template and displayed on the page. Currently I create the textarea with the following code, which lets users enter any text, including raw HTML:

my_text = cgidata.getvalue('my_text', 'default_text')
ftable.AddRow([Label(_('Enter your text')),
               TextArea('my_text', my_text, rows=8, cols=60).Format()])

How can I change this so that only some safe (possibly lightweight) markup is allowed? All suggestions, including sanitizers, are welcome, as long as they integrate easily with Python.

3 Answers

Use the Python Markdown implementation:

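import markdown

# safe_mode exists only in Python-Markdown before 3.0 (deprecated in 2.6, removed in 3.0).
text = "Some *Markdown* text entered by the user"
mode = "remove"  # or "replace" or "escape"
md = markdown.Markdown(safe_mode=mode)
html = md.convert(text)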

It is very flexible: you can use various extensions or write your own.
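To illustrate what escaping raw HTML means in practice, here is a minimal sketch that uses only the Python 3 standard library. It is not Python-Markdown's implementation, and the helper name `escape_raw_html` is made up for this example. It only shows how markup characters can be turned into entities so that a browser displays them as text instead of interpreting them.

```python
from html import escape


def escape_raw_html(text):
    """Replace &, <, > and quotes with HTML entities (hypothetical helper)."""
    return escape(text, quote=True)


if __name__ == "__main__":
    user_input = '<script>alert("hi")</script> and *emphasis*'
    # The script tag becomes inert text; the Markdown emphasis is untouched.
    print(escape_raw_html(user_input))
```

Escaping everything is the bluntest option: nothing the user types can become a tag, so any formatting has to come from the markup language itself.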

3 Comments

I tried it in IPython, with text set to some HTML including a script tag. The output was strange: text was unchanged and html = '[HTML_REMOVED]'. What else do I need to do to get this to remove the dangerous tags? I tried all three modes with the same result.
After running a few tests I realized that I can't enter HTML tags, only Markdown syntax, and that doing so produces safe output. Thanks, it worked!
From the docs: "To replace HTML, set safe_mode="replace" (safe_mode=True still works for backward compatibility with older versions). The HTML will be replaced with the text defined in markdown.HTML_REMOVED_TEXT which defaults to [HTML_REMOVED]. To replace the HTML with something else: markdown.HTML_REMOVED_TEXT = "--RAW HTML IS NOT ALLOWED--""

You could use reStructuredText. I'm not sure if it has a sanitizing option, but it's well supported in Python, and it can generate many output formats.

This simple sanitizing function uses a whitelist and is roughly the same as the solution in "python-html-sanitizer-scrubber-filter", but it also lets you restrict which attributes are allowed (since you probably don't want anyone to use the style attribute, among others):

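# Requires BeautifulSoup 3 (Python 2); bs4 exposes tag.attrs as a dict instead.
from BeautifulSoup import BeautifulSoup

def sanitize_html(value):
    valid_tags = 'p i b strong a pre br'.split()
    valid_attrs = 'href src'.split()
    soup = BeautifulSoup(value)
    for tag in soup.findAll(True):
        if tag.name not in valid_tags:
            # Hide the disallowed tag itself but keep rendering its contents.
            tag.hidden = True
        # Keep only whitelisted attributes.
        tag.attrs = [(attr, val) for attr, val in tag.attrs if attr in valid_attrs]
    # Caveat: this replace is case-sensitive and runs only once, so input such as
    # 'JavaScript:' or 'javajavascript:script:' still gets through.
    return soup.renderContents().decode('utf8').replace('javascript:', '')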
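For Python 3 without third-party packages, the same whitelist idea can be sketched with the standard library's `html.parser`. This is an illustration under stated assumptions, not a vetted sanitizer: the class name `WhitelistSanitizer`, the helper `is_safe_url`, and the set of allowed URL schemes are choices made for this example. Instead of deleting the string 'javascript:' from the output, it checks the scheme of each URL attribute, because a plain string replacement can be bypassed with input such as 'javajavascript:script:'.

```python
import re
from html import escape
from html.parser import HTMLParser

VALID_TAGS = {"p", "i", "b", "strong", "a", "pre", "br"}
VALID_ATTRS = {"href", "src"}
SAFE_SCHEMES = {"http", "https", "mailto"}  # assumption: adjust as needed
VOID_TAGS = {"br"}  # elements that take no closing tag
SCHEME_RE = re.compile(r"^([a-z][a-z0-9+.\-]*):", re.IGNORECASE)


def is_safe_url(value):
    """Accept relative URLs and URLs whose scheme is whitelisted."""
    # Drop whitespace and control characters first, so that something
    # like "java\tscript:" cannot hide its scheme from the check.
    compact = "".join(ch for ch in value if ch > " ")
    match = SCHEME_RE.match(compact)
    return match is None or match.group(1).lower() in SAFE_SCHEMES


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in VALID_TAGS:
            return  # drop the tag itself; its text content is still emitted
        kept = []
        for name, value in attrs:
            if name in VALID_ATTRS and value is not None and is_safe_url(value):
                kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in VALID_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is re-escaped so it can never turn into markup.
        self.out.append(escape(data, quote=False))


def sanitize_html(value):
    parser = WhitelistSanitizer()
    parser.feed(value)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


if __name__ == "__main__":
    print(sanitize_html('<p onclick="x()">Hi <a href="javascript:alert(1)">there</a></p>'))
```

Comments, doctype declarations, and processing instructions are dropped because the parser's default handlers ignore them. For production use, a maintained sanitizer library is usually a safer choice than hand-rolled code like this.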
