How to encode a unicode string (ones from JSON) to 'utf-8' in python?

Question

I am creating a REST API using Flask-Python. One of the urls (/uploads) takes in (a POST HTTP request) and a JSON '{"src":"void", "settings":"my settings"}'. I can individually extract each object and encode to a byte string which can then be hashed using hashlib in python. However, my goal is to take the whole string and then encode so it looks like...myfile.encode('utf-8'). Printing myfile displays as follows >> {u'src':u'void', u'settings':u'my settings'}, is there anyway I can take the above unicoded string then encode to utf-8 to a sequence of bytes for hashlib.sha1(mayflies.encode('uff-8'). Do let me know for more clarification. Thanks in advance.

fileSRC = request.json['src']
fileSettings = request.json['settings']

myfile = request.json
print myfile

#hash the filename using sha1 from hashlib library
guid_object = hashlib.sha1(fileSRC.encode('utf-8')) // this works however I want myfile to be encoded not fileSRC
guid = guid_object.hexdigest() //this works 
print guid

Clarification: are you trying to make the json a string and hash that? — bwarren2
– bwarren2, Commented Jul 27, 2015 at 16:45
Hi, thanks for your response. I got the answer from your question, it works now. Thank you so much. — divspec
– divspec, Commented Jul 27, 2015 at 16:58
I used ...jsonContent = json.dumps(request.json)..then guid_object = hashlib.sha1(jsonContent.encode('utf-8')). This works now. — divspec
– divspec, Commented Jul 27, 2015 at 21:13

Quentin Pradet · Accepted Answer · 2015-08-21 06:08:26Z

As you said in comments, you solved your issue using:

jsonContent = json.dumps(request.json)
guid_object = hashlib.sha1(jsonContent.encode('utf-8'))

But it's important to understand why this works. Flask sends you unicode() for non-ASCII, and str() for ASCII. Dumping the result using JSON will give you consistent results since it abstracts away the internal Python representation, just as if you only had unicode().

Python 2

In Python 2 (the Python version you're using), you don't need .encode('utf-8') because the default value of ensure_ascii of json.dumps() is True. When you send non-ASCII data to json.dumps(), it will use JSON escape sequences to actually dump ASCII: no need to encode to UTF-8. Also, since the Zen of Python says that "Explicit is better than implicit", even if ensure_ascii is already True, you could specify it:

jsonContent = json.dumps(request.json, ensure_ascii=True)
guid_object = hashlib.sha1(jsonContent)

Python 3

In Python 3 however, this would no longer work. Inded, json.dumps() returns unicode in Python 3, even if everything in the unicode string is ASCII. But hashlib.sha1 only works on bytes. You need to make the conversion explicit, even if the ASCII encoding is all you need:

jsonContent = json.dumps(request.json, ensure_ascii=True)
guid_object = hashlib.sha1(jsonContent.encode('ascii'))

This is why Python 3 is a better language: it forces you to be more explicit about the text you use, whether it is str (Unicode) or bytes. This avoids many, many problems down the road.

Collectives™ on Stack Overflow

How to encode a unicode string (ones from JSON) to 'utf-8' in python?

1 Answer 1

Python 2

Python 3

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Python 2

Python 3

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related