Python does not properly process text input from an HTML textarea

Question

I have a standard form on an HTML page with the usual input types: text, select, submit. Using Python (the Pyramid framework) to process these forms has been straightforward and without issue.

In this particular form, though, I need a textarea to accept longer, multi-line input. To process the user input in Python, I've used the following code:

try:
    some_input = request.params['form_element'].decode('utf-8')
except:
    some_input = None

This works for text input but not for textarea input. When the textarea input contains a non-ASCII (Unicode) character, it is not processed and the following error is raised:

(<type 'exceptions.UnicodeEncodeError'>, UnicodeEncodeError('ascii', u'some text then a unicode character \u2013 and some more text', 14, 15, 'ordinal not in range(128)'), <traceback object at 0x10265ca70>)

Is there a reason for this? It looks as though the textarea input is being treated as ASCII instead of UTF-8, but I'm not sure how to change this.

More information: the page from which the form is being submitted is an HTML5 page with the charset set to UTF-8.

EDIT: Wladimir Palant suggested that the input has already been decoded, so I checked this:

print isinstance(request.params['form_element'], str) returns False

print isinstance(request.params['form_element'], unicode) returns True
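For readers on Python 3, where Python 2's unicode type corresponds to str and byte strings are bytes, the failure can be reproduced by spelling out what Python 2 did implicitly: calling .decode() on a unicode object first encodes it with the default ASCII codec, and that hidden encode step is what raises UnicodeEncodeError('ascii', ...). The helper name py2_style_decode is hypothetical; this is a sketch of the mechanism, not Pyramid code.

```python
def py2_style_decode(value: str, encoding: str = "utf-8") -> str:
    """Hypothetical helper mimicking Python 2's unicode.decode()."""
    # Python 2 first converted the unicode object to a byte string with the
    # default ASCII codec, and only then decoded those bytes with `encoding`.
    # The ASCII step fails on any character above U+007F.
    return value.encode("ascii").decode(encoding)


form_value = "some text then a unicode character \u2013 and some more text"

error = None
try:
    py2_style_decode(form_value)
except UnicodeEncodeError as exc:
    # Keep a reference; Python 3 unbinds `exc` when the except block ends.
    error = exc

if error is not None:
    print(error.encoding, error.reason, repr(form_value[error.start:error.end]))
```

The printed codec name and reason match the tuple in the question: the ASCII codec chokes on the en dash, even though the call asked for UTF-8.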

Sounds like your parameter has already been decoded and you are trying to call .decode("utf-8") on Unicode data. Can you check isinstance(request.params['form_element'], str) and isinstance(request.params['form_element'], unicode)?
– user785541, commented Jul 1, 2011 at 8:55
You're right, Wladimir. If you want to post it as an answer, I'll accept it. It turns out that I was using decode when I needed to use encode. I also tested my other form inputs and found that they too failed with UTF-8 (non-ASCII) characters instead of plain ASCII ones. I thought I had tested this earlier and it worked, but apparently I hadn't.
– johneth, commented Jul 1, 2011 at 9:17
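A defensive variant of the question's lookup, sketched in Python 3: decode only when the value really is a byte string, and encode only at the point where some other API actually needs bytes. The function name ensure_text is hypothetical; with Pyramid's request.params the text branch is the one that runs, because the values are already text.

```python
from typing import Optional, Union


def ensure_text(value: Union[str, bytes, None], encoding: str = "utf-8") -> Optional[str]:
    """Hypothetical helper: return `value` as text, decoding only byte strings."""
    if value is None or isinstance(value, str):
        # Already text (or missing): decoding it again is the bug in the question.
        return value
    # Only genuine byte strings need decoding.
    return value.decode(encoding)


text = ensure_text("a \u2013 b")
# Encoding goes the other way: text -> bytes, e.g. for an API that wants UTF-8 bytes.
utf8_bytes = text.encode("utf-8")
```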

Antoine Leclair · Accepted Answer · 2011-07-01 15:04:11Z


There is no difference between an input[type=text] and a textarea when the data is submitted: both are sent as ordinary name/value pairs. The problem you describe should therefore occur with both.
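To illustrate why the widget makes no difference, here is a rough Python 3 sketch using only the standard library: urllib.parse.urlencode builds a body in the default application/x-www-form-urlencoded format, and parse_qs decodes it with the same charset, much as a framework would. The field names title and comment are made up, and a real browser's encoding details may differ slightly, but in this format neither field carries any marker of whether it came from an input or a textarea.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical fields: "title" stands in for an <input type="text">,
# "comment" for a multi-line <textarea>.
fields = {
    "title": "en dash \u2013 here",
    "comment": "line one\r\nline two \u2013 with a dash",
}

# Approximately what a UTF-8 page submits with the default enctype:
# every field, whatever its widget, becomes name=percent-encoded-UTF-8.
body = urlencode(fields, encoding="utf-8")

# Decoding with the same charset gives text (str) back for both fields.
decoded = {name: values[0] for name, values in parse_qs(body, encoding="utf-8").items()}
print(body)
```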

Correct me if I'm wrong, but WebOb, which Pyramid uses for its request and response objects, does the decoding for you. The values in request.params are already Unicode, so there is no need to decode or encode anything. Likewise, you can use Unicode for the response body, and it will be encoded automatically. You rarely need encode or decode in Pyramid applications.
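A minimal sketch of the handler without any decode call, assuming request.params behaves like a mapping of text values (which the isinstance check in the question confirms). It is written against a plain Mapping so it can run without Pyramid; read_form_element is a hypothetical name, and in a view you would pass request.params.

```python
from typing import Mapping, Optional


def read_form_element(params: Mapping[str, str], name: str = "form_element") -> Optional[str]:
    """Hypothetical helper: fetch a submitted field as text, or None if absent."""
    # No .decode() or .encode(): the value is already text, whether it came
    # from an <input type="text"> or a <textarea>.
    return params.get(name)


# Stand-in for request.params in a Pyramid view.
submitted = {"form_element": "some text then a unicode character \u2013 and more"}
some_input = read_form_element(submitted)
```

Using .get() also replaces the bare except in the question, which silently turned any error, including the UnicodeEncodeError itself, into None.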
