29

I have a use case where I have to generate an MD5 hash of a JSON object and compare the hashes on the server and in the browser.

The browser client generates a hash, asks the server for the hash of the same resource (which happens to be a JSON object), and compares the two hashes to decide what to do next.

On the server I am using Python, and the browser client is written in JavaScript.

The hashes generated on the two sides do not match. Here's my code:

Python:

>>> import hashlib
>>> import json

>>> a = {"candidate" : 5, "data": 1}
>>> a = json.dumps(a, sort_keys = True).encode("utf-8")
>>> hashlib.md5(a).hexdigest()
'12db79ee4a76db2f4fc48624140adc7e'

JS (I am using the md5 package for hashing in the browser):

> var md5 = require("md5")
> var data = {"candidate":5, "data":1}
> data = JSON.stringify(data)
> md5(data)
'92e99f0a99ad2a3b5e02f717a2fb83c2'

What am I doing wrong?

3 Comments

  • This library gives the correct hash (Python's). Commented Jul 16, 2018 at 10:55
  • Note, however, that comparing the stringified JSON is not the best approach: whitespace, formatting, and even the order of two keys can differ while the documents remain semantically the same. For JavaScript there is github.com/fraunhoferfokus/JSum. For Python, github.com/schollii/sandals/blob/master/json_sem_hash.py. Commented Jul 10, 2020 at 15:38
  • MD5 is not secure. Better to use something modern like SHA-384 or SHA-256. Commented Jul 13, 2023 at 19:30

3 Answers

48

You're assuming that both languages generate JSON that looks identical.

>>> json.dumps({"candidate" : 5, "data": 1}, sort_keys=True)
'{"candidate": 5, "data": 1}'

js> JSON.stringify({"candidate" : 5, "data": 1})
"{\"candidate\":5,\"data\":1}"

Fortunately, they can.

>>> a = json.dumps({"candidate" : 5, "data": 1}, sort_keys=True, indent=2)
>>> a
'{\n  "candidate": 5,\n  "data": 1\n}'

js> var a = JSON.stringify({"candidate" : 5, "data": 1}, null, 2)
js> a
"{\n  \"candidate\": 5,\n  \"data\": 1\n}"

And now the hashes are the same as well.

Python:

>>> hashlib.md5(a.encode("utf-8")).hexdigest()
'd77982d217ec5a9bcbad5be9bee93027'

JS:

js> md5(a)
"d77982d217ec5a9bcbad5be9bee93027"
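
The Python half of this approach can be sanity-checked with the sketch below, which also shows why key order matters. It uses only the standard library; the helper name `md5_of` is made up for illustration and is not part of any package mentioned here.

```python
import hashlib
import json


def md5_of(text: str) -> str:
    """Return the hex MD5 digest of a string encoded as UTF-8."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


obj = {"candidate": 5, "data": 1}
reordered = {"data": 1, "candidate": 5}  # same content, different insertion order

# For this simple object, indent=2 produces the layout shown above.
pretty = json.dumps(obj, sort_keys=True, indent=2)
print(repr(pretty))

# sort_keys=True makes the serialized text independent of insertion order...
same = md5_of(pretty) == md5_of(json.dumps(reordered, sort_keys=True, indent=2))
# ...while without it the two dicts serialize, and therefore hash, differently.
differs = md5_of(json.dumps(obj, indent=2)) != md5_of(json.dumps(reordered, indent=2))
print(same, differs)  # True True
```

JSON.stringify does not sort keys, so for the hashes to agree in general the JavaScript side would also need to emit keys in sorted order before hashing.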

2 Comments

I can't figure out how to hash from my JS REPL.
I believe this answer is invalid because key ordering is enforced in the Python snippet, while in the JavaScript version it is arbitrary. In fact, according to the docs, it is not guaranteed to follow any specific order: "Note: Properties of non-array objects are not guaranteed to be stringified in any particular order. Do not rely on ordering of properties within the same object within the stringification." The same example would fail with the following: {"data" : 5, "candidate": 1}
2

The difference is that json.dumps applies some minor pretty-printing by default (a space after each colon and comma), while JSON.stringify does not. That is why the hashes are not the same.
  Python:

 >>> import json
 >>> json.dumps({"candidate" : 5, "data": 1})
     '{"candidate": 5, "data": 1}'

  JavaScript:

 > JSON.stringify({"candidate" : 5, "data": 1})
   '{"candidate":5,"data":1}'

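With a small modification, though, both sides can generate the same hash. There are two ways to do it:

  1. Modify the JavaScript JSON string to make it equivalent to the Python JSON string. (Note that the replaceAll calls also alter any colons or commas inside string values, so this only works for simple data.)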
    Python:
    >>> import json,hashlib
    >>> a = json.dumps({"candidate" : 5, "data": 1}, sort_keys=True)
    >>> hashlib.md5(a.encode("utf-8")).hexdigest()
        '12db79ee4a76db2f4fc48624140adc7e'
    
    JavaScript:
    > const Crypto = require("crypto-js")
      undefined
    > const a = JSON.stringify({"candidate" : 5, "data": 1}).replaceAll(":", ": ").replaceAll(",", ", ")
      undefined
    > Crypto.MD5(a).toString(Crypto.enc.Hex)
      '12db79ee4a76db2f4fc48624140adc7e'
    
  2. Modify the Python JSON string to make it equivalent to the JavaScript JSON string.
    Python:
    >>> import json,hashlib
    >>> a = json.dumps({"candidate" : 5, "data": 1}, separators=(',', ':'))
    >>> hashlib.md5(a.encode("utf-8")).hexdigest()
        '92e99f0a99ad2a3b5e02f717a2fb83c2'
    
    JavaScript:
    > const Crypto = require("crypto-js")
      undefined
    > const a = JSON.stringify({"candidate" : 5, "data": 1})
      undefined
    > Crypto.MD5(a).toString(Crypto.enc.Hex)
      '92e99f0a99ad2a3b5e02f717a2fb83c2'
    

    Note: To run the JavaScript code, the crypto-js npm package must be installed in the directory where you start the node shell.
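
Option 2 can be wrapped into a small helper on the Python side. The following is only a sketch under some assumptions: the names `canonical_json` and `json_digest` are hypothetical, `sort_keys=True` is added so the result does not depend on insertion order, and `ensure_ascii=False` is added because json.dumps escapes non-ASCII characters by default while JSON.stringify leaves them as-is.

```python
import hashlib
import json


def canonical_json(obj) -> str:
    """Serialize obj compactly with sorted keys.

    separators=(",", ":") drops the spaces json.dumps adds by default, and
    ensure_ascii=False keeps non-ASCII characters unescaped.
    """
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def json_digest(obj, algorithm: str = "md5") -> str:
    """Hash the canonical JSON text of obj with the named hashlib algorithm."""
    data = canonical_json(obj).encode("utf-8")
    return hashlib.new(algorithm, data).hexdigest()


print(canonical_json({"data": 1, "candidate": 5}))  # {"candidate":5,"data":1}
print(json_digest({"candidate": 5, "data": 1}))
print(json_digest({"candidate": 5, "data": 1}, "sha256"))
```

Two caveats apply: the JavaScript side has to emit its keys in sorted order as well, and floats can still differ between the two languages (Python serializes 1.0 as `1.0`, whereas JSON.stringify(1.0) gives `1`). The `algorithm` parameter is there so a stronger hash such as SHA-256 can be swapped in without changing the serialization.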

1 Comment

Super helpful for me, the crux of my issue was the separators option. Thank you!
1

I created a Python module, merkle-json, which generates the same hash regardless of the order of list items or of the keys inside a dict or JSON object. It also offers some additional configuration options, such as ignoring specific keys or null values, depending on your needs. Check the docs for more.

Use it like this:

from merkle_json import MerkleJson

mj = MerkleJson()

obj = {
    'keyC': [3,4],
    'keyA': 2,
    'keyB': 4,
    'keyD': 1,
}
mjHash = mj.hash(obj)
print(mjHash) # '7001bd2b415e6a624a23d7bc7c249b21'
