I'm trying to hash a message that contains characters outside the ASCII range.

Here is what I have so far in Node.js:

const CryptoJS = require('crypto-js');

let message = '\x1aSmartCash Signed Message:\n\xabCTxIn(COutPoint(7bb8ad134928a003752beb098471af5a66fc5475ff96b5ba4c2e1c4cbac3aa13, 0), scriptSig=)000000000002c5c2ef4afc588492773e6bbb18e4f2374b6dc159ef257667bd881667906410';
let hash = CryptoJS.SHA256(message).toString();

console.log('hash', hash); // 989c004534c6962293c95a9438bdb926c92c1d8b4dec0f4f1e535defa171e5fe

And here is the Python code:

import hashlib

def to_bytes(something, encoding='utf8'):
    """
    cast string to bytes() like object, but for python2 support it's bytearray copy
    """
    if isinstance(something, bytes):
        return something
    if isinstance(something, str):
        return something.encode(encoding)
    elif isinstance(something, bytearray):
        return bytes(something)
    else:
        raise TypeError("Not a string or bytes like object")

def sha256(x):
    x = to_bytes(x, 'utf8')
    return bytes(hashlib.sha256(x).digest())

def Hash_Sha256(x):
    x = to_bytes(x, 'utf8')
    out = bytes(sha256(x))
    return out

message = b'\x1aSmartCash Signed Message:\n\xabCTxIn(COutPoint(7bb8ad134928a003752beb098471af5a66fc5475ff96b5ba4c2e1c4cbac3aa13, 0), scriptSig=)000000000002c5c2ef4afc588492773e6bbb18e4f2374b6dc159ef257667bd881667906410'

print('hash', Hash_Sha256(message).hex()) # 947270b0b8041a92ba82ef37661b692a4a150532b88de59bf95e965ceb5c07f8

As you can see, the Python output differs from the Node.js output.

The desired output is the Python hash: 947270b0b8041a92ba82ef37661b692a4a150532b88de59bf95e965ceb5c07f8

The problem seems to be the escaped characters (\x1a, \xab): if I remove them, the two hashes are the same.

So how do I hash these characters in Node.js so that the result matches the Python hash?

1 Answer

In the Python code, message is a byte string, i.e. a sequence of bytes. In particular, \xab in the message corresponds to the single byte 0xab.

In the CryptoJS code, message is a string, which CryptoJS.SHA256(message) implicitly encodes as UTF-8. In UTF-8, every character beyond U+007F is represented by more than one byte; in particular, \xab (U+00AB) is encoded as the two bytes 0xc2 0xab. The CryptoJS and Python code therefore hash different byte sequences and produce different hashes.
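The difference in byte sequences can be checked directly. The following is a small sketch that uses Node's built-in Buffer (not CryptoJS) to show the bytes each encoding produces for the single character \xab; the variable names are only illustrative.

```javascript
// Encode the same one-character string under two different encodings.
const s = '\xab'; // the single character U+00AB

const utf8Bytes = Buffer.from(s, 'utf8');     // UTF-8: two bytes
const latin1Bytes = Buffer.from(s, 'latin1'); // Latin-1: one byte

console.log(utf8Bytes.toString('hex'));   // c2ab
console.log(latin1Bytes.toString('hex')); // ab
```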

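To get the same bytes as the Python code, the message must be parsed with the Latin1 encoder in the CryptoJS code. Latin1 maps each character from U+0000 to U+00FF to the single byte with the same value, so \xab becomes 0xab:

let hash = CryptoJS.SHA256(CryptoJS.enc.Latin1.parse(message)).toString();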

With this, the CryptoJS code returns the same hash as the Python code.
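If CryptoJS is not a requirement, the same idea can be sketched with Node's built-in crypto module. This is an alternative to the CryptoJS call, not a replacement for it, and sha256Latin1 is a hypothetical helper name introduced here for illustration.

```javascript
const crypto = require('crypto');

// Hypothetical helper: hash a string by treating every character as exactly
// one byte (Latin-1). Only meaningful when all characters are <= U+00FF.
function sha256Latin1(text) {
  const bytes = Buffer.from(text, 'latin1');
  return crypto.createHash('sha256').update(bytes).digest('hex');
}

const message = '\x1aSmartCash Signed Message:\n\xabCTxIn(COutPoint(7bb8ad134928a003752beb098471af5a66fc5475ff96b5ba4c2e1c4cbac3aa13, 0), scriptSig=)000000000002c5c2ef4afc588492773e6bbb18e4f2374b6dc159ef257667bd881667906410';

console.log('hash', sha256Latin1(message));
```

Note that the Latin-1 approach only reproduces the Python bytes because every character in the message is at most U+00FF; a character above that range has no single-byte Latin-1 representation.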
