I have a standard form on an HTML page with the usual input types: text, select, submit. Using Python (the Pyramid framework) to process these forms has been straightforward and without issue.
In this particular form, though, I have needed to use a textarea to accept longer, multi-line input. When processing the user input in Python, I've used the following code:
try:
    some_input = request.params['form_element'].decode('utf-8')
except:
    some_input = None
This works for text input, but not for textarea input. When the textarea input contains a non-ASCII character, it is not processed and the following error is thrown:
(<type 'exceptions.UnicodeEncodeError'>, UnicodeEncodeError('ascii', u'some text then a unicode character \u2013 and some more text', 14, 15, 'ordinal not in range(128)'), <traceback object at 0x10265ca70>)
Is there any reason for this? It looks like the textarea input is being treated as ASCII instead of UTF-8, but I'm not sure how to change this.
More information: the page from which the form is being submitted is an HTML5 page with the charset set to UTF-8.
EDIT: Wladimir Palant suggested that the value has already been decoded and that I check this:
print isinstance(request.params['form_element'], str) returns False
print isinstance(request.params['form_element'], unicode) returns True
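A minimal sketch of why a value that is already unicode fails on .decode('utf-8'), assuming Python 2 semantics: there, decoding a unicode object first encodes it implicitly with the default ASCII codec, and that hidden step is what raises UnicodeEncodeError. The helper name py2_style_decode is hypothetical and only spells the implicit step out explicitly so that it can be run on any Python version; the sample text is taken from the traceback above.

```python
# -*- coding: utf-8 -*-

# Sample value taken from the traceback; \u2013 is an en dash (non-ASCII).
text = u'some text then a unicode character \u2013 and some more text'


def py2_style_decode(value):
    # Hypothetical helper: makes explicit what Python 2 does implicitly when
    # .decode('utf-8') is called on a unicode object. The text is first
    # encoded with the ASCII codec, and only then decoded as UTF-8.
    return value.encode('ascii').decode('utf-8')


# ASCII-only input survives the round trip, so the problem stays hidden.
ascii_result = py2_style_decode(u'plain ascii text')

# Any non-ASCII character makes the implicit ASCII encode fail.
try:
    py2_style_decode(text)
    error = None
except UnicodeEncodeError as exc:
    error = exc
```

This would also explain why ASCII-only test input appears to work: the implicit ASCII encode only fails once a character outside the ASCII range shows up.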
Wladimir Palant's comment: you are calling .decode("utf-8") on Unicode data. Want to check isinstance(request.params['form_element'], str) and isinstance(request.params['form_element'], unicode)?
Resolution: I was using decode when I needed to use encode. Additionally, I tested my other form inputs and found that they too failed when given non-ASCII characters rather than only ASCII characters (I thought I'd tested this earlier and it worked, but apparently I hadn't).
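One way the handling could look once the value is known to arrive as unicode is sketched below. The helper get_text is hypothetical, and a plain dict stands in for request.params, on the assumption that the real object offers a dict-like .get(). The value is left as text and is encoded to UTF-8 bytes only at the point where bytes are actually needed.

```python
# -*- coding: utf-8 -*-


def get_text(params, key):
    """Return the form value as text, or None if the field is missing."""
    value = params.get(key)
    if value is None:
        return None
    if isinstance(value, bytes):
        # Only raw byte strings need decoding; unicode text is left alone.
        return value.decode('utf-8')
    return value


# A plain dict standing in for request.params (an assumption for this sketch).
params = {'form_element': u'some text \u2013 and some more text'}

some_input = get_text(params, 'form_element')

# Encode only where bytes are required, e.g. before writing to a byte stream.
as_bytes = some_input.encode('utf-8')
```

Compared with the bare except in the original snippet, a missing field is handled explicitly here, so a genuine encoding error would surface instead of silently turning into None.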