Lightweight markup language for Python

Question

While programming a Python web application, I want to create a text area where users can enter text in a lightweight markup language. The text will be inserted into an HTML template and displayed on the page. Currently I use this code to create the text area, which allows users to enter any text, including HTML:

my_text = cgidata.getvalue('my_text', 'default_text')
ftable.AddRow([Label(_('Enter your text')),
               TextArea('my_text', my_text, rows=8, cols=60).Format()])

How can I change this so that only some safe (possibly lightweight) markup is allowed? All suggestions, including sanitizers, are welcome, as long as they integrate easily with Python.

molicule · Accepted Answer · 2009-08-03 18:49:43Z

8

Use the Python Markdown implementation:

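import markdown

# safe_mode controls how raw HTML in the input is handled.
# Note: safe_mode was deprecated in Python-Markdown 2.6 and removed in 3.0,
# so this call needs an older release.
mode = "remove"  # or "replace" or "escape"
md = markdown.Markdown(safe_mode=mode)
html = md.convert(text)  # text holds the user's Markdown input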

It is very flexible: you can use various extensions, write your own, and so on.


3 Comments

Anna SdG Over a year ago

I tried it in IPython, defining text as some HTML that included a script tag. I got strange output: text was unchanged and html = '[HTML_REMOVED]'. What else do I need to do to get this to remove the dangerous tags? I tried all three modes with the same result.

Anna SdG Over a year ago

After running a few tests, I realized that I'm not supposed to enter any HTML tags, only Markdown syntax, and when I do that I get safe output. Thanks, it worked!

molicule Over a year ago

From the docs: "To replace HTML, set safe_mode="replace" (safe_mode=True still works for backward compatibility with older versions). The HTML will be replaced with the text defined in markdown.HTML_REMOVED_TEXT, which defaults to [HTML_REMOVED]. To replace the HTML with something else: markdown.HTML_REMOVED_TEXT = "--RAW HTML IS NOT ALLOWED--""

Christopher · Accepted Answer · 2009-08-03 18:08:18Z

2

You could use reStructuredText. I'm not sure whether it has a sanitizing option, but it is well supported in Python and can generate many output formats.

Community · Accepted Answer · 2017-05-23 12:19:07Z

1

This simple sanitizing function uses a whitelist and is roughly the same as the solution in python-html-sanitizer-scrubber-filter, but it also lets you restrict which attributes are allowed (you probably don't want anyone to use the style attribute, among others):

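# Note: this import is BeautifulSoup 3, which runs on Python 2 only.
from BeautifulSoup import BeautifulSoup

def sanitize_html(value):
    valid_tags = 'p i b strong a pre br'.split()
    valid_attrs = 'href src'.split()
    soup = BeautifulSoup(value)
    for tag in soup.findAll(True):  # True matches every tag
        if tag.name not in valid_tags:
            tag.hidden = True  # drop the tag itself but keep its contents
        # keep only whitelisted attributes (this drops style, onclick, etc.)
        tag.attrs = [(attr, val) for attr, val in tag.attrs if attr in valid_attrs]
    # Caution: a single replace() can be bypassed, e.g. 'javajavascript:script:'
    # collapses back to 'javascript:' after one pass.
    return soup.renderContents().decode('utf8').replace('javascript:', '')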
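
For readers on Python 3, the same whitelist idea can be sketched with the standard library alone. This is a minimal sketch, not the answer's own method: the names AllowlistSanitizer, is_safe_url and SAFE_SCHEMES are made up for illustration, and the list of allowed URL schemes is an assumption. It checks the scheme of each kept attribute value rather than deleting the string 'javascript:', since a single string replacement can be defeated by input such as 'javajavascript:script:'.

```python
import html
from html.parser import HTMLParser
from urllib.parse import urlsplit

VALID_TAGS = {"p", "i", "b", "strong", "a", "pre", "br"}  # same tags as the answer
VALID_ATTRS = {"href", "src"}                              # same attributes as the answer
SAFE_SCHEMES = {"", "http", "https", "mailto"}             # assumed; "" means a relative URL


def is_safe_url(url):
    """Return True if the URL is relative or uses an allowlisted scheme."""
    # Drop whitespace and control characters before looking at the scheme.
    cleaned = "".join(ch for ch in url if ch > " ")
    try:
        return urlsplit(cleaned).scheme.lower() in SAFE_SCHEMES
    except ValueError:  # malformed URL
        return False


class AllowlistSanitizer(HTMLParser):
    """Re-emit only allowlisted tags and attributes; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in VALID_TAGS:
            return  # drop the tag itself but keep its text content
        kept = [
            ' {}="{}"'.format(name, html.escape(value, quote=True))
            for name, value in attrs
            if name in VALID_ATTRS and value is not None and is_safe_url(value)
        ]
        self.out.append("<{}{}>".format(tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in VALID_TAGS and tag != "br":
            self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        # Text is always escaped, so leftover script bodies become inert text.
        self.out.append(html.escape(data, quote=False))


def sanitize_html(value):
    parser = AllowlistSanitizer()
    parser.feed(value)
    parser.close()  # flush any buffered text
    return "".join(parser.out)
```

Calling sanitize_html(my_text) before the text reaches the template would strip disallowed tags and attributes. The sketch does not balance unclosed tags, so a maintained sanitizing library is likely the better choice for production use.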

answered Aug 3, 2009 at 20:42 by Gerald Senarclens de Grancy; edited May 23, 2017 at 12:19 by Community Bot
