I am creating a REST API with Flask (Python). One of the URLs (/uploads) accepts a POST request with a JSON body: '{"src":"void", "settings":"my settings"}'. I can extract each value individually and encode it to a byte string, which can then be hashed using hashlib. However, my goal is to take the whole payload and encode it, something like myfile.encode('utf-8'). Printing myfile displays {u'src': u'void', u'settings': u'my settings'}. Is there any way I can take the whole object above and encode it to a sequence of UTF-8 bytes for hashlib.sha1(myfile.encode('utf-8'))? Let me know if you need more clarification. Thanks in advance.

fileSRC = request.json['src']
fileSettings = request.json['settings']

myfile = request.json
print myfile

# Hash the filename using sha1 from the hashlib library
guid_object = hashlib.sha1(fileSRC.encode('utf-8'))  # this works, but I want myfile to be encoded, not fileSRC
guid = guid_object.hexdigest()  # this works
print guid
  • Clarification: are you trying to make the json a string and hash that? Commented Jul 27, 2015 at 16:45
  • Hi, thanks for your response. I got the answer from your question, it works now. Thank you so much. Commented Jul 27, 2015 at 16:58
  • I used jsonContent = json.dumps(request.json), then guid_object = hashlib.sha1(jsonContent.encode('utf-8')). This works now. Commented Jul 27, 2015 at 21:13

1 Answer

As you said in comments, you solved your issue using:

jsonContent = json.dumps(request.json)
guid_object = hashlib.sha1(jsonContent.encode('utf-8'))

But it's important to understand why this works. In Python 2, Flask can hand you unicode objects for non-ASCII strings and str objects for ASCII-only ones. Serializing the parsed data with json.dumps() gives you consistent results because it abstracts away the internal Python representation, just as if every string were unicode.
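
To make the idea concrete, here is a minimal sketch of hashing the whole payload rather than a single field. It is written for Python 3 so it can be run as-is, and it takes a plain dict in place of request.json. The helper name payload_sha1 and the sort_keys=True argument are assumptions made for this sketch, not part of the solution quoted above; sort_keys only makes the serialized text independent of key order.

```python
import hashlib
import json


def payload_sha1(payload):
    """Return the SHA-1 hex digest of a JSON-serializable payload (hypothetical helper)."""
    # Serialize to text first; ensure_ascii=True (the default) escapes non-ASCII characters.
    # sort_keys=True is an assumption added here so key order does not change the hash.
    text = json.dumps(payload, ensure_ascii=True, sort_keys=True)
    # hashlib works on bytes, so encode the (pure ASCII) text explicitly.
    return hashlib.sha1(text.encode("ascii")).hexdigest()


# Stand-in for request.json
guid = payload_sha1({"src": "void", "settings": "my settings"})
print(guid)
```

Without sort_keys, two payloads with the same content but a different key order could serialize differently and therefore hash differently.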

Python 2

In Python 2 (the Python version you're using), you don't need .encode('utf-8') because the default value of ensure_ascii of json.dumps() is True. When you send non-ASCII data to json.dumps(), it will use JSON escape sequences to actually dump ASCII: no need to encode to UTF-8. Also, since the Zen of Python says that "Explicit is better than implicit", even if ensure_ascii is already True, you could specify it:

jsonContent = json.dumps(request.json, ensure_ascii=True)
guid_object = hashlib.sha1(jsonContent)
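
The effect of ensure_ascii can be checked with a small sketch. The payload below is made up and contains one non-ASCII character; the code is written for Python 3 so it runs on a current interpreter, but the escaping is the same behaviour described above.

```python
import json

# Hypothetical payload containing a non-ASCII character (e with acute accent).
payload = {"src": "caf\u00e9"}

# Default behaviour: non-ASCII characters become JSON \uXXXX escapes.
escaped = json.dumps(payload, ensure_ascii=True)

# With ensure_ascii=False the character is kept as-is in the output text.
raw = json.dumps(payload, ensure_ascii=False)

# The escaped form always survives an ASCII encode; the raw form does not.
escaped_bytes = escaped.encode("ascii")
try:
    raw.encode("ascii")
    raw_is_ascii = True
except UnicodeEncodeError:
    raw_is_ascii = False

print(escaped)
print(raw_is_ascii)
```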

Python 3

In Python 3, however, this would no longer work. Indeed, json.dumps() returns str (Unicode text) in Python 3, even if everything in the string is ASCII, but hashlib.sha1 only works on bytes. You need to make the conversion explicit, even if the ASCII encoding is all you need:

jsonContent = json.dumps(request.json, ensure_ascii=True)
guid_object = hashlib.sha1(jsonContent.encode('ascii'))

This is why Python 3 is a better language: it forces you to be more explicit about the text you use, whether it is str (Unicode) or bytes. This avoids many, many problems down the road.
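
A short sketch of that explicitness in practice, assuming Python 3 and a stand-in dict instead of request.json: passing text straight to hashlib.sha1 is rejected, while the encoded bytes are accepted.

```python
import hashlib
import json

# Stand-in for request.json; any JSON-serializable object would do.
text = json.dumps({"src": "void", "settings": "my settings"})

try:
    hashlib.sha1(text)  # str is not accepted in Python 3
    rejected = False
except TypeError:
    rejected = True

# Encoding first turns the text into bytes, which hashlib accepts.
guid = hashlib.sha1(text.encode("ascii")).hexdigest()
print(rejected, guid)
```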
