
I have a standard form on an HTML page with the usual input types: text, select, submit. Using Python (the Pyramid framework) to process these forms has been straightforward and without issue.

In this particular form, though, I have needed to use a textarea to accept longer, multi-line input. When processing the user input in Python, I've used the following code:

try:
    some_input = request.params['form_element'].decode('utf-8')
except:
    some_input = None

This works for text input but not for textarea input. When the textarea input includes a Unicode character, it is not processed and the following error is raised:

(<type 'exceptions.UnicodeEncodeError'>, UnicodeEncodeError('ascii', u'some text then a unicode character \u2013 and some more text', 14, 15, 'ordinal not in range(128)'), <traceback object at 0x10265ca70>)

Is there any reason for this? It looks like the textarea input is being treated as ASCII instead of UTF-8, but I'm not sure how to change this.

More information: the page from which the form is being submitted is an HTML5 page with the charset set to UTF-8.

EDIT: Wladimir Palant suggested that the value has already been decoded and that I check this:

print isinstance(request.params['form_element'], str) returns False

print isinstance(request.params['form_element'], unicode) returns True

  • Should we know which framework you are using? Commented Jul 1, 2011 at 8:44
  • It's Pyramid. I'll clarify it in the question. Commented Jul 1, 2011 at 8:46
  • Sounds like your parameter has been decoded already and you are trying to use .decode("utf-8") on Unicode data. Want to check isinstance(request.params['form_element'], str) and isinstance(request.params['form_element'], unicode)? Commented Jul 1, 2011 at 8:55
  • You're right, Wladimir. If you want to put it as the answer I'll accept it. It turns out that I was using decode when I needed to use encode. Additionally, I tested my other form inputs and found that they too failed when using utf-8 characters instead of just ASCII characters (I thought I'd tested it earlier and it worked, but apparently I hadn't). Commented Jul 1, 2011 at 9:17

1 Answer


There is no difference between an input[type=text] and a textarea when the data is submitted, so the problem you describe should happen with both.

Correct me if I'm wrong, but WebOb, which is used in Pyramid, does the decoding for you. You get Unicode already, so there is no need to decode or encode anything. Also, you can use unicode for the response, and it will be encoded automatically. You rarely have to use encode or decode in Pyramid applications.
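To illustrate why the error is an encode error even though the code calls decode: on Python 2, calling .decode('utf-8') on a value that is already unicode first converts it to a byte string with the default ASCII codec, and that implicit step is what fails on a character such as \u2013. Below is a minimal sketch of a guard that decodes only when the value is actually a byte string. The names to_text and implicit_ascii_step_fails are hypothetical helpers made up for this example, not part of Pyramid or WebOb, and the sketch is written to run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Hypothetical helpers for illustration; not part of Pyramid or WebOb.

# The text type: `unicode` on Python 2, `str` on Python 3.
text_type = type(u"")


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding only if it is a byte string."""
    if value is None:
        return None
    if isinstance(value, text_type):
        # Already decoded (as the isinstance check in the question shows),
        # so return it untouched.
        return value
    # A byte string: decode it explicitly with the given encoding.
    return value.decode(encoding)


def implicit_ascii_step_fails(text):
    """Mimic the implicit ASCII encode that Python 2 performs before
    unicode.decode(); return True if it raises UnicodeEncodeError."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return True
    return False


sample = u"some text \u2013 and some more text"
already_text = to_text(sample)                # returned unchanged
from_bytes = to_text(sample.encode("utf-8"))  # decoded from UTF-8 bytes
```

In a view, assuming the form field name from the question, this would be used as to_text(request.params.get('form_element')). If the framework always hands back unicode, as described above, the helper simply passes the value through and could be dropped.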
