5

Based on various recommendations such as What data type to use for hashed password field and what length?, I could store md5 as either CHAR(32) or BINARY(16). But when I do so using BINARY(16), the stored value is identical to the first 16 characters of the CHAR(32) stored results as well as the first 16 characters of the SELECT MD5() results. What are the significance of the later 16 characters, and is their lack of presence in the binary column resulting in lost data?

CREATE  TABLE test (id INT NOT NULL AUTO_INCREMENT, value VARCHAR(6), md5_char CHAR(32) NOT NULL, md5_binary BINARY(16) NOT NULL, PRIMARY KEY (id)) ENGINE = InnoDB;
INSERT INTO test(value,md5_char,md5_binary) VALUES("one!",md5("one!"),md5("one!"));
INSERT INTO test(value,md5_char,md5_binary) VALUES("two%",md5("two%"),md5("two%"));
INSERT INTO test(value,md5_char,md5_binary) VALUES("three~",md5("three~"),md5("three~"));
SELECT value,md5(value),md5_char,md5_binary FROM test;
DROP TABLE test;

+--------+----------------------------------+----------------------------------+------------------+
| value  | md5(value)                       | md5_char                         | md5_binary       |
+--------+----------------------------------+----------------------------------+------------------+
| one!   | 633c8403325f1cf963809e6eb224d77e | 633c8403325f1cf963809e6eb224d77e | 633c8403325f1cf9 |
| two%   | 48bbec047b4451a2018e0f652807b7d0 | 48bbec047b4451a2018e0f652807b7d0 | 48bbec047b4451a2 |
| three~ | fee453bb4eb68dcdfee07575e75c8cc5 | fee453bb4eb68dcdfee07575e75c8cc5 | fee453bb4eb68dcd |
+--------+----------------------------------+----------------------------------+------------------+

1 Answer 1

8

Currently you are losing half of the checksum when using BINARY(16). When you store an MD5 checksum in BINARY(16) you should store it as binary data, not encoded in hexadecimal. That is:

INSERT INTO test (md5_binary) VALUES(UNHEX(md5("one!")));

You can use the HEX function to encode it into hex again if you want to eye-ball compare it with another checksum:

SELECT HEX(md5_binary) FROM test;

The benefit of using BINARY to store the checksum instead of hexadecimal text is that half the storage is needed.

Sign up to request clarification or add additional context in comments.

9 Comments

Thanks Joni, Similar, Blowfish-based bcrypt requires CHAR(60) to store it. Could I use BINARY(30) if I use UNHEX()?
The storage benefit is unrelated to the data type. You could store hexadecimal in binary, and un-hexed data in a char column.
But will binary always require exactly half the digits of char?
Just realized your post states "half the storage is needed". Thanks
@Andomar, storing binary data in a CHAR column is not recommended: the database will then interpret the data as having a particular character encoding, and you may corrupt the data unless you know what you are doing (i.e. use a single-byte encoding and a binary collation)
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.