I am trying to encode a String as UTF-16 and compute its MD5 hash, but Java and C# return different results for the same input.

The following is the piece of code in Java:

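import java.security.MessageDigest;

public class Main {
    public static void main(String[] args) {
        String str = "preparar mantecado con coca cola";
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            digest.update(str.getBytes("UTF-16"));
            byte[] hash = digest.digest();
            // Format each byte as two lowercase hex digits.
            StringBuilder output = new StringBuilder();
            for (byte b : hash) {
                output.append(Integer.toString((b & 0xff) + 0x100, 16).substring(1));
            }
            System.out.println(output);
        } catch (Exception e) {
            // Report the failure instead of silently swallowing it.
            e.printStackTrace();
        }
    }
}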

The output for this is: 249ece65145dca34ed310445758e5504

The following is the piece of code in C#:

public static string GetMD5Hash()
{
    string input = "preparar mantecado con coca cola";
    System.Security.Cryptography.MD5CryptoServiceProvider x = new System.Security.Cryptography.MD5CryptoServiceProvider();
    byte[] bs = System.Text.Encoding.Unicode.GetBytes(input);
    bs = x.ComputeHash(bs);
    System.Text.StringBuilder s = new System.Text.StringBuilder();
    foreach (byte b in bs)
    {
        // "x2" already yields two lowercase hex digits per byte.
        s.Append(b.ToString("x2"));
    }
    string output = s.ToString();
    Console.WriteLine(output);
    return output;
}

The output for this is: c04d0f518ba2555977fa1ed7f93ae2b3

I am not sure why the outputs differ. How can I change the code above so that both versions return the same output?

  • Compare your byte arrays first. If they differ in even a single bit, the hashes will be completely different. There may be a BOM in the UTF-16 encoding, or the byte order may be little-endian in one case and big-endian in the other. Commented Jan 25, 2011 at 12:32
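
To illustrate the comment's suggestion, here is a minimal sketch that dumps the encoded bytes in hex so they can be compared by eye. The class name Utf16ByteDump, the helper toHex, and the short sample string "ab" are all assumptions made for illustration; they do not come from the question.

```java
import java.nio.charset.StandardCharsets;

class Utf16ByteDump {
    // Render bytes as space-separated lowercase hex pairs, e.g. "fe ff 00 61".
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical short input so the dump stays readable.
        String sample = "ab";
        // prints "UTF-16   : fe ff 00 61 00 62" (BOM, then big-endian code units)
        System.out.println("UTF-16   : " + toHex(sample.getBytes(StandardCharsets.UTF_16)));
        // prints "UTF-16BE : 00 61 00 62"
        System.out.println("UTF-16BE : " + toHex(sample.getBytes(StandardCharsets.UTF_16BE)));
        // prints "UTF-16LE : 61 00 62 00"
        System.out.println("UTF-16LE : " + toHex(sample.getBytes(StandardCharsets.UTF_16LE)));
    }
}
```

Running the same kind of dump on the C# side (for example over the result of Encoding.Unicode.GetBytes) would show which of the three layouts it produces.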

3 Answers

UTF-16 != UTF-16.

In Java, getBytes("UTF-16") returns a big-endian representation preceded by a byte-order mark (BOM, the bytes FE FF). C#'s System.Text.Encoding.Unicode.GetBytes returns a little-endian representation without a BOM. I can't check your code from here, but I think you'll need to specify the conversion precisely.

Try getBytes("UTF-16LE") in the Java version.
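
A minimal sketch of that change, hashing the question's string under both encodings side by side. The class name Utf16Md5Demo and the helper md5Hex are assumptions made for illustration. Assuming the diagnosis holds, the UTF-16LE line should equal the hash printed by the C# code in the question.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Utf16Md5Demo {
    // Return the MD5 digest of the given bytes as lowercase hex, two digits per byte.
    static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        StringBuilder sb = new StringBuilder();
        for (byte b : digest.digest(data)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        String str = "preparar mantecado con coca cola";

        // "UTF-16" encodes as big-endian with a leading BOM (2 extra bytes).
        byte[] withBom = str.getBytes(StandardCharsets.UTF_16);
        // "UTF-16LE" encodes as little-endian with no BOM.
        byte[] littleEndian = str.getBytes(StandardCharsets.UTF_16LE);

        System.out.println("UTF-16   (" + withBom.length + " bytes): " + md5Hex(withBom));
        System.out.println("UTF-16LE (" + littleEndian.length + " bytes): " + md5Hex(littleEndian));
    }
}
```

Using the StandardCharsets constants instead of a charset name string also avoids the checked UnsupportedEncodingException that getBytes(String) declares.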

2 Comments

It's worth noting that if you look at the output in Eclipse, it still doesn't match what Visual Studio shows you. But strangely, it does work...
2015, Java 8.0 and .NET 4.0.x: tests based on Polish-language text seem to be OK, as you write. The bytes in both languages are identical and have no BOM prefix. The next important area to test: Java arithmetic accepts overflow silently (good for hashing), while C# by default does not.

The first thing I can find, and this might not be the only problem, is that C#'s Encoding.Unicode.GetBytes() is little-endian, while Java's natural byte order is big-endian.

You could use System.Text.Encoding.Unicode.GetString(byte[]) to convert the bytes back to a string. That way you can be sure that everything happens in the Unicode encoding.
