
I'm using Python's hashlib.sha256 to compare the hashes of two inputs.

My hypothesis was that a null character and an empty string would give the same hash.

Here's my code.

from hashlib import sha256
print(sha256(b'\x00').hexdigest(),end='\n\n')
print(sha256(b'').hexdigest())

It gave this output:

6e340b9cffb37a989ca544e6bb780a2c78901d3fb33738768511a30617afa01d

e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

Why did they not give the same result?

Is this related to the C string format, in which a string always ends with a null byte? So when I hash a null byte, does it actually hash two null bytes?

  • Your hypothesis is wrong. An empty string has length 0. A zero byte has length 1. The padding and appended length in SHA prevent the two from producing the same result. It is not related to C strings, because you have binary data here. Commented Nov 10, 2021 at 10:59
  • "I call it my billion-dollar mistake" (Tony Hoare). You are confusing a null string with an empty string. Null means the pointer has address 0, while an empty string has nothing in it: its length is 0. NIST even publishes zero-length samples in its test vectors. Commented Nov 10, 2021 at 11:20
  • @kelalaka a Python string containing a single byte of value zero has naught to do with a pointer whose address is 0. Rather, it just encodes a byte array of length 1, containing a zero-byte. Commented Nov 10, 2021 at 11:24
  • @Morrolan Well, I was mostly talking about the common misconception. That is the point: why should a string with value 0 be null? Commented Nov 10, 2021 at 11:26

1 Answer

An empty string is (or, strictly speaking, "encodes to") a byte array of length zero, containing no bytes. You can observe this e.g. as follows, using Python:

>>> list(bytes("", 'ascii'))
[]

A string consisting of a single zero-byte on the other hand is a byte array of length one, containing a single byte of value zero:

>>> list(bytes("\x00", 'ascii'))
[0]

As such these two inputs are different, and will hash to different values.
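As one of the comments on the question points out, SHA padding also appends the message length, so the two inputs already differ before any compression happens. The following is a minimal sketch of that padding step, not hashlib's actual internals; `sha256_pad` is a hypothetical helper written only for illustration, following the standard rule (append a 0x80 byte, zero-fill, then append the bit length as a 64-bit big-endian integer):

```python
from hashlib import sha256


def sha256_pad(message: bytes) -> bytes:
    """Hypothetical helper: return the message with SHA-256 padding applied."""
    bit_length = len(message) * 8
    padded = message + b"\x80"
    # Zero-fill until the length is 56 mod 64, leaving 8 bytes for the length field.
    padded += b"\x00" * ((56 - len(padded)) % 64)
    return padded + bit_length.to_bytes(8, "big")


empty = b""
zero_byte = b"\x00"

# Lengths differ: 0 bytes versus 1 byte.
print(len(empty), len(zero_byte))

# Start and end of each padded 64-byte block.
print(sha256_pad(empty)[:4].hex(), sha256_pad(empty)[-8:].hex())
print(sha256_pad(zero_byte)[:4].hex(), sha256_pad(zero_byte)[-8:].hex())

# The digests differ as well.
print(sha256(empty).hexdigest() == sha256(zero_byte).hexdigest())
```

The padded block for the empty input starts with 80 and ends with a length field of 0, while the block for the zero byte starts with 00 80 and ends with a length field of 8 bits. The hash function therefore processes two different blocks.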

As was mentioned in comments above, there is no relation to how some languages such as C represent strings, using a zero-byte to indicate their end.
