
I'm trying to convert a Base64 encoded string to text. I'm using the following code:

public static string Base64Decode(string base64EncodedData)
{
    var base64EncodedBytes = System.Convert.FromBase64String(base64EncodedData);
    return System.Text.Encoding.UTF8.GetString(base64EncodedBytes);
}

It mostly works, but it puts a space after each character. It also adds an invalid character at the beginning of the converted string. The content of the Base64 string is XML, so once the extra spaces are inserted the XML becomes invalid. Is there an alternative to this?

Here's a sample of the output after conversion:

? < ? x m l  v e r s i o n = " 1 . 0 "  e n c o d i n g = " U T F - 1 6 "  s t a n d a l o n e = " n o " ? >   < I m p o r t >     < o p t i o n s >       < P r o c N a m e > E R P N u m b e r < / P r o c N a m e >       < J o b I D > A N L 0 0 1 8 5 0 < / J o b I D >     < / o p t i o n s >     < R o w >       < D o c I d  / >       < E R P N u m b e r  / >     < / R o w >   < / I m p o r t > 
  • Are you sure the string is UTF-8, not UTF-16? Please provide a minimal reproducible example of the Base64 string. Commented Oct 2, 2018 at 13:18
  • I'm assuming the incoming base64EncodedData is actually UTF-16 encoded. Try using System.Text.Encoding.Unicode.GetString instead (Encoding.Unicode is .NET's name for UTF-16 little-endian). Commented Oct 2, 2018 at 13:18
  • You need two things to interpret bytes as text: 1) the bytes, 2) the character encoding. Ask the sender or perhaps that has already been communicated to you via a specification, standard, convention, etc. Commented Oct 2, 2018 at 13:46
  • @TomBlodget - since it's actually XML (probably with a byte order mark at the beginning), OP could return a byte array, put it into a MemoryStream, then use XmlReader.Create(Stream) to parse the XML. I think (but have not checked) that the XmlReader will interpret the encoding correctly. Or if there is indeed a BOM, then OP can use new StreamReader(Stream, true) to detect it. Commented Oct 2, 2018 at 13:48
  • @dbc Yes, good point. Knowing that the bytes are an XML document could suffice because the XML standard has an algorithm to determine the character encoding. Commented Oct 2, 2018 at 13:50

2 Answers

6

It looks like the original binary data is a string that was converted to bytes using UTF-16, which matches the encoding="UTF-16" part of the text. You need to use the same encoding when converting the binary data back to a string:

return Encoding.Unicode.GetString(base64EncodedBytes);

That's assuming you can't change what's producing the data in the first place. If you can change that to use UTF-8 instead, you'll end up with half as much data if the text is all ASCII characters...
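
To make the size difference concrete, here is a minimal sketch. The class name, method name, and sample text are made up for illustration, and the "sender" is simulated with Encoding.Unicode.GetBytes, which is only an assumption about how the real data was produced.

```csharp
using System;
using System.Text;

public static class Base64Utf16Demo
{
    // Decodes Base64 whose payload is assumed to be UTF-16 little-endian text.
    public static string Base64DecodeUtf16(string base64EncodedData)
    {
        byte[] bytes = Convert.FromBase64String(base64EncodedData);
        return Encoding.Unicode.GetString(bytes);
    }

    public static void Main()
    {
        const string xml = "<JobID>ANL001850</JobID>";

        // Simulated sender: UTF-16 LE bytes (no byte order mark), then Base64.
        string base64 = Convert.ToBase64String(Encoding.Unicode.GetBytes(xml));

        // The text round-trips when the same encoding is used on both sides.
        Console.WriteLine(Base64DecodeUtf16(base64) == xml);

        // For ASCII-only text, UTF-16 needs two bytes per character and UTF-8 one.
        Console.WriteLine(Encoding.Unicode.GetByteCount(xml));
        Console.WriteLine(Encoding.UTF8.GetByteCount(xml));
    }
}
```

Note that Encoding.Unicode.GetString does not strip a leading byte order mark, so if the payload starts with one, the returned string begins with the character U+FEFF.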


1 Comment

Thank you. It was a silly mistake on my part; I didn't notice the encoding in the source. Changed it to UTF8 and it's working now.
1

As Jon Skeet explained in his answer, the string appears to be encoded in UTF-16, not UTF-8.

You also wrote

Furthermore, it adds an invalid character in the beginning of converted string.

This invalid character is almost certainly a byte order mark (BOM), a short sequence of bytes at the start of the stream that indicates which encoding was used. Given its presence, you can have a StreamReader detect the encoding from the BOM by using the new StreamReader(Stream, true) constructor:

// Requires: using System.IO;
public static string Base64Decode(string base64EncodedData)
{
    var base64EncodedBytes = System.Convert.FromBase64String(base64EncodedData);
    // 'true' enables encoding detection from the byte order mark.
    using (var reader = new StreamReader(new MemoryStream(base64EncodedBytes), true))
    {
        return reader.ReadToEnd();
    }
}

Note that the StreamReader will consume the byte order mark during processing so it is not included in the returned string.
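
As a rough check of that behavior, the sketch below builds a payload the way the sender is assumed to have done (UTF-16 LE preamble followed by the text) and compares the two decoding approaches. The class name and the sample XML are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDetectionDemo
{
    public static string Base64Decode(string base64EncodedData)
    {
        byte[] bytes = Convert.FromBase64String(base64EncodedData);
        // 'true' asks the reader to detect the encoding from the byte order mark.
        using (var reader = new StreamReader(new MemoryStream(bytes), true))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        const string xml = "<Import />";

        // Simulated sender: UTF-16 LE preamble (the BOM) followed by the text bytes.
        byte[] bom = Encoding.Unicode.GetPreamble();
        byte[] body = Encoding.Unicode.GetBytes(xml);
        byte[] payload = new byte[bom.Length + body.Length];
        Buffer.BlockCopy(bom, 0, payload, 0, bom.Length);
        Buffer.BlockCopy(body, 0, payload, bom.Length, body.Length);
        string base64 = Convert.ToBase64String(payload);

        // StreamReader detects UTF-16 from the BOM and drops it from the result.
        Console.WriteLine(Base64Decode(base64) == xml);

        // Decoding the same bytes directly keeps the BOM as a leading U+FEFF character.
        string direct = Encoding.Unicode.GetString(payload);
        Console.WriteLine(direct[0] == '\uFEFF');
    }
}
```

If the data has no byte order mark, this StreamReader constructor falls back to UTF-8, so this approach only helps when the BOM is actually present.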

Alternatively, since your base64 data is actually XML, and XML contains its own encoding declaration, you could extract the byte array and parse it directly using an XmlReader:

// Requires: using System.IO; using System.Xml;
public static XmlReader CreateXmlReaderFromBase64(string base64EncodedData, XmlReaderSettings settings = null)
{
    var base64EncodedBytes = System.Convert.FromBase64String(base64EncodedData);
    // The caller is responsible for disposing the returned reader.
    return XmlReader.Create(new MemoryStream(base64EncodedBytes), settings);
}

According to the docs, XmlReader.Create(Stream) will detect encoding as required:

The XmlReader scans the first bytes of the stream looking for a byte order mark or other sign of encoding. When encoding is determined, the encoding is used to continue reading the stream, and processing continues parsing the input as a stream of (Unicode) characters.
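
A small usage sketch of this approach follows. The helper name ReadJobId and the sample document are hypothetical, loosely modeled on the output in the question, and the payload is again simulated as UTF-16 LE with a byte order mark.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class XmlReaderDemo
{
    // Hypothetical helper: returns the text of the first <JobID> element, or null.
    public static string ReadJobId(string base64EncodedData)
    {
        byte[] bytes = Convert.FromBase64String(base64EncodedData);
        using (var stream = new MemoryStream(bytes))
        using (XmlReader reader = XmlReader.Create(stream))
        {
            // The reader works out the encoding from the bytes; no Encoding is passed in.
            return reader.ReadToFollowing("JobID")
                ? reader.ReadElementContentAsString()
                : null;
        }
    }

    public static void Main()
    {
        const string xml =
            "<?xml version=\"1.0\" encoding=\"UTF-16\"?>" +
            "<Import><options><JobID>ANL001850</JobID></options></Import>";

        // Simulated sender: BOM plus UTF-16 LE bytes, then Base64.
        using (var buffer = new MemoryStream())
        {
            byte[] bom = Encoding.Unicode.GetPreamble();
            byte[] body = Encoding.Unicode.GetBytes(xml);
            buffer.Write(bom, 0, bom.Length);
            buffer.Write(body, 0, body.Length);

            string base64 = Convert.ToBase64String(buffer.ToArray());
            Console.WriteLine(ReadJobId(base64));
        }
    }
}
```

Working from the bytes rather than from a decoded string avoids having to pick an Encoding at all, which is the main appeal of this variant.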
