I'm having trouble converting text to a Base64 string in Java (Android) and .NET (Visual Basic). Plain (readable) ASCII characters convert fine. But special characters (characters whose code is greater than 127) are causing trouble for me.

For example, I try converting the character whose ASCII code is 65 (the character "A").

My Java code is:

char a = 65;
String c = String.valueOf(a); 
byte bt[] = c.getBytes();               
String result = Base64.encodeToString(bt, Base64.DEFAULT);

And my .NET code is:

Dim c As String = Chr(65)
Dim result as String = Convert.ToBase64String(System.Text.Encoding.UTF8.GetBytes(c))

These both return the same result: "QQ==". This is fine. But when I try converting a special character, for example character code 153, the two platforms return different results.

char a = 153;
String c = String.valueOf(a);               
byte bt[] = c.getBytes();               
String result = Base64.encodeToString(bt, Base64.DEFAULT);

This returns "wpk="

And my same .NET code:

Dim c As String = Chr(153) 
Dim result as String = Convert.ToBase64String(System.Text.Encoding.UTF8.GetBytes(c))

This returns "4oSi"

This is so strange. What's wrong here? I'm using the native Base64 libraries on both platforms. Is something wrong with my code?

  • You're using UTF8 for the C# one, and Java uses unicode. Try using unicode in both. Commented Oct 22, 2012 at 18:51
  • BTW - what do you think the character 153 represents? It is a non-printable control character in Unicode (and in ISO-8859-1). It is the trademark sign (TM) in Windows-1252. Commented Oct 22, 2012 at 19:38
  • I tried using Unicode on both, and there are still differences. .NET now returns "IiE=" whereas Java still returns "wpk=". Yes, 153 might be the trademark sign. But my code is simply trying to perform an encryption by scrambling up character codes so they vary anywhere between 0 and 255. Then, to transmit the result safely over the internet, I need to convert it to Base64. Commented Oct 23, 2012 at 4:26

1 Answer

Since the data that you are encoding is encrypted data (random data where any byte can be from 0 to 255 and which, in its encrypted state, has no character or text meaning), you need to treat this information as what we'll call true binary data. Both Java and .NET have full support for true binary data via their respective byte array primitives.

As you know, base64 encoding is the process of converting true binary data (with a range of 0 to 255) into a larger array of binary data, 4 output bytes for every 3 input bytes, where each byte is guaranteed to have the same value as a printable ASCII character somewhere between 32 and 126. Let's call this encoded binary. The encoded binary can then safely be converted to text because virtually every known character set agrees on the printable ASCII character set (32 to 126).
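To make the claim about the output range concrete, here is a minimal sketch that encodes every possible byte value and checks the result. It assumes the standard `java.util.Base64` class (Java 8 and later) rather than Android's `android.util.Base64`; the class and method names (`Base64Alphabet`, `isPrintableAscii`, `allByteValues`) are made up for illustration.

```java
import java.util.Base64;

class Base64Alphabet {
    // Returns true if every character of s is printable ASCII (codes 32 to 126).
    static boolean isPrintableAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch < 32 || ch > 126) {
                return false;
            }
        }
        return true;
    }

    // Builds a byte array containing every possible byte value exactly once.
    static byte[] allByteValues() {
        byte[] data = new byte[256];
        for (int i = 0; i < 256; i++) {
            data[i] = (byte) i;
        }
        return data;
    }

    public static void main(String[] args) {
        // The basic encoder from java.util.Base64 emits no line breaks.
        String encoded = Base64.getEncoder().encodeToString(allByteValues());
        System.out.println(encoded.length());           // 344 characters for 256 bytes
        System.out.println(isPrintableAscii(encoded));  // true
    }
}
```

The 256 input bytes form 86 groups of up to 3 bytes, and each group becomes 4 characters, which is where the length of 344 comes from.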

So the main problem with both the Java and VB.NET snippets is that you are attempting to use text primitives (char and String in Java; String in VB.NET) to store true binary data. Once you do that, it's too late. There is no way to reliably convert the text back to byte arrays, because the text primitives are simply not designed to safely store and retrieve binary data. For more on why this is so, please read "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)".
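The results in the question can be reproduced with a small sketch. It assumes the standard `java.util.Base64` class instead of Android's, and the names `TextVersusBytes`, `encodeAsText` and `encodeAsByte` are hypothetical. The "4oSi" case assumes that VB.NET's Chr(153) was mapped through the Windows-1252 code page to the trademark sign U+2122, which matches the output reported in the question.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class TextVersusBytes {
    // Encodes the UTF-8 bytes of a one-character string holding the given code point.
    static String encodeAsText(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    // Encodes a single raw byte, with no text conversion involved.
    static String encodeAsByte(int value) {
        return Base64.getEncoder().encodeToString(new byte[] { (byte) value });
    }

    public static void main(String[] args) {
        System.out.println(encodeAsText(65));      // QQ==  (one byte, 41)
        System.out.println(encodeAsText(153));     // wpk=  (U+0099 becomes bytes C2 99)
        System.out.println(encodeAsText(0x2122));  // 4oSi  (U+2122 becomes bytes E2 84 A2)
        System.out.println(encodeAsByte(153));     // mQ==  (the single byte 99)
    }
}
```

The same number, 153, ends up as one, two or three bytes depending on which text interpretation it passes through; only the raw byte path is unambiguous.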

Fortunately the fix is simple. For Java, don't use char and String to store binary data. Put the data directly into a byte array. Try the following:

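  // Keep the value as a raw byte. 153 is outside the signed byte range,
  // so the cast is required; the bit pattern (0x99) is preserved.
  byte[] bt = new byte[1];
  bt[0] = (byte) 153;
  // android.util.Base64: Base64.DEFAULT may append a line break; Base64.NO_WRAP avoids it.
  String result = Base64.encodeToString(bt, Base64.DEFAULT);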

I get mQ==

The fix is conceptually the same in VB.NET. Don't use String. Use a byte array.

    Dim bytes() As Byte = New Byte() {153}
    Dim result As String = Convert.ToBase64String(bytes)

Again - the answer is mQ==

Finally, after the encoding, it's perfectly fine to use Strings. The encoded characters are all in the ASCII subset, so any conversion between String and byte array will not corrupt data, because all common character sets agree on the ASCII subset.
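As a quick check of that agreement, the following sketch compares the bytes produced for a string by three common charsets. The class and method names are invented for illustration, and the comparison is limited to US-ASCII, UTF-8 and ISO-8859-1, which are single-byte for the ASCII range.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiAgreement {
    // Returns true if US-ASCII, UTF-8 and ISO-8859-1 all produce identical bytes for s.
    static boolean sameBytesInCommonCharsets(String s) {
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        return Arrays.equals(ascii, utf8) && Arrays.equals(ascii, latin1);
    }

    public static void main(String[] args) {
        System.out.println(sameBytesInCommonCharsets("mQ=="));    // true: Base64 text is pure ASCII
        System.out.println(sameBytesInCommonCharsets("\u0099"));  // false: the charsets disagree
    }
}
```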

Remember that you will have the same issue going in the reverse direction, when decoding. You will be decoding to a byte array, at which point you will be back to true binary. From this point on, the data must never be stored as a string until you are finished with it, for example after decrypting it back to the original clear text.
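A minimal sketch of the decoding side, again assuming `java.util.Base64` rather than the Android class, with hypothetical names (`RoundTrip`, `roundTrip`, `throughString`). The second method is a deliberate counter-example showing what happens when the decoded bytes are pushed through a String.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class RoundTrip {
    // Safe path: bytes -> Base64 text -> bytes. The bytes never pass through a charset.
    static byte[] roundTrip(byte[] cipherBytes) {
        String wire = Base64.getEncoder().encodeToString(cipherBytes);
        return Base64.getDecoder().decode(wire);
    }

    // Counter-example: interpreting raw bytes as UTF-8 text and converting back.
    static byte[] throughString(byte[] cipherBytes) {
        String asText = new String(cipherBytes, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] original = { (byte) 153 };
        System.out.println(Arrays.equals(original, roundTrip(original)));      // true
        // 0x99 alone is not valid UTF-8, so it is replaced by U+FFFD and the byte is lost.
        System.out.println(Arrays.equals(original, throughString(original)));  // false
    }
}
```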

Hope this helps.

4 Comments

System.Text.Encoding.UTF8.GetBytes(c) returns a byte array in .NET. What you write about Java sounds correct.
@Guido your code seems to work. I'm not really good at Java, so I guess I messed up the conversion of string to byte to base64. What I'm trying to accomplish is this: I have an Android app which works on XML data, so it's all plain text data. I've also set up a web server with ASP.NET on it. My Android app encrypts the XML file data using my own encryption code, which is why characters like 153 come into play. It then converts the result to Base64 and transmits it. The .NET web server then receives it, decodes it from Base64 and decrypts it.
@FarazAzhar - thanks for the clarification. I have updated the answer based on a better understanding of the problem you are trying to solve.
@GuidoSimone That really clarifies it. I get it. I tried storing non-printable binary data into string variables, so they somehow got lost or corrupted in those string variables resulting in awkward B64 conversions. Thanks ! And thanks for the link as well! :o)
