base 64 encoding in javascript

Question

Below is a Base64 image-encoding function that I got from Philippe Tenenhaus (http://www.philten.com/us-xmlhttprequest-image/).

It's very confusing to me, but I'd love to understand it.

I think I understand the bitwise & and |, and shifting bits with << and >>.

I'm especially confused by these lines:

    ((byte1 & 3) << 4) | (byte2 >> 4);
    ((byte2 & 15) << 2) | (byte3 >> 6);

And why does it still use byte1 for enc2, and byte2 for enc3? And what is the purpose of enc4 = byte3 & 63? ...

Can someone explain this function?

function base64Encode(inputStr)
{
    var b64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";
    var outputStr = "";
    var i = 0;

    while (i < inputStr.length)
    {
        //all three "& 0xff" added below are there to fix a known bug
        //with bytes returned by xhr.responseText
        var byte1 = inputStr.charCodeAt(i++) & 0xff;
        var byte2 = inputStr.charCodeAt(i++) & 0xff;
        var byte3 = inputStr.charCodeAt(i++) & 0xff;

        var enc1 = byte1 >> 2;
        var enc2 = ((byte1 & 3) << 4) | (byte2 >> 4);

        var enc3, enc4;
        if (isNaN(byte2))
        {
            enc3 = enc4 = 64;
        }
        else
        {
            enc3 = ((byte2 & 15) << 2) | (byte3 >> 6);
            if (isNaN(byte3))
            {
                enc4 = 64;
            }
            else
            {
                enc4 = byte3 & 63;
            }
        }

        outputStr += b64.charAt(enc1) + b64.charAt(enc2) + b64.charAt(enc3) + b64.charAt(enc4);
    }

    return outputStr;
}

Christopher Gress · Accepted Answer · 2013-09-27 19:43:33Z

It probably helps to understand what Base64 encoding does. It regroups the input 24 bits at a time: three 8-bit bytes become four 6-bit values, and each 6-bit value (0 to 63) selects one character of a 64-character alphabet. (http://en.wikipedia.org/wiki/Base64)
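To see the whole 24-bit regrouping at once, here is a minimal sketch (not the original function) that packs three bytes into one integer and then slices it into four 6-bit indices. The helper name splitGroup is hypothetical, and it assumes each input is already a byte in the range 0 to 255.

```javascript
// Standard Base64 alphabet, without the "=" padding character.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: split three bytes (0..255) into four 6-bit indices.
function splitGroup(byte1, byte2, byte3) {
  // Pack the three 8-bit bytes into one 24-bit number, byte1 most significant.
  const group = (byte1 << 16) | (byte2 << 8) | byte3;
  // Read the 24 bits back out 6 at a time, from the top down.
  return [
    (group >> 18) & 63,
    (group >> 12) & 63,
    (group >> 6) & 63,
    group & 63,
  ];
}

// "Man" is the bytes 77, 97, 110.
const indices = splitGroup(77, 97, 110);
console.log(indices); // [ 19, 22, 5, 46 ]
console.log(indices.map((n) => ALPHABET[n]).join("")); // TWFu
```

This produces the same four indices as enc1 to enc4 in the question; the original simply computes each index directly from the neighbouring bytes instead of building the 24-bit value first.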

So enc1 is the first 6 bits of the group, which are the first (most significant) 6 bits of the first byte; byte1 >> 2 discards the remaining 2 bits.
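A quick way to check this is to print the bits. This sketch uses Number.prototype.toString(2) with padStart to show zero-padded binary strings; the helper bits is hypothetical, and the byte value 77 (the letter "M") is just an example input.

```javascript
// Hypothetical helper: format a number as a zero-padded binary string.
function bits(n, width) {
  return n.toString(2).padStart(width, "0");
}

const byte1 = 77; // "M"
const enc1 = byte1 >> 2; // drop the low 2 bits, keep the top 6

console.log(bits(byte1, 8)); // 01001101
console.log(bits(enc1, 6)); // 010011 (19, which selects "T")
```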

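enc2 is the next 6 bits, which straddle the boundary between the first and second bytes: the last 2 bits of the first byte followed by the first 4 bits of the second byte. (That is why enc2 uses both byte1 and byte2, and enc3 uses both byte2 and byte3.) The bitwise AND byte1 & 3 keeps only the last 2 bits of the first byte, because 3 is 00000011 in binary: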

XXXXXXXX & 00000011 = 000000XX

This result is then shifted left by 4 bits, which leaves room for the 4 bits that will come from the second byte:

000000XX << 4 = 00XX0000.

byte2 >> 4 shifts right by 4 bits, isolating the first 4 bits of the second byte:

YYYYXXXX >> 4 = 0000YYYY

Finally, ((byte1 & 3) << 4) | (byte2 >> 4) combines the two results with a bitwise OR:

00XX0000 | 0000YYYY = 00XXYYYY
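Plugging concrete bytes into the enc2 expression makes the field widths visible: the 2 bits taken from byte1 are shifted left by 4 precisely because 4 bits from byte2 go underneath them. The values 77 and 97 (the letters "M" and "a") are example inputs, not part of the original code.

```javascript
const byte1 = 77; // 01001101 ("M"), example value
const byte2 = 97; // 01100001 ("a"), example value

const high = (byte1 & 3) << 4; // low 2 bits 01, moved up to 010000 (16)
const low = byte2 >> 4; // top 4 bits of byte2: 0110 (6)
const enc2 = high | low; // 010110 (22)

console.log(high, low, enc2); // 16 6 22
```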

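enc3 is the last 4 bits of the second byte followed by the first 2 bits of the third byte. It is built the same way: byte2 & 15 keeps the last 4 bits of the second byte (15 is 00001111), << 2 shifts them left to make room for 2 more bits, and byte3 >> 6 supplies the first 2 bits of the third byte.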
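The same check for enc3, with example bytes 97 and 110 ("a" and "n"): the 4 bits kept from byte2 are shifted left by 2 because exactly 2 bits from byte3 fill the space below them.

```javascript
const byte2 = 97; // 01100001 ("a"), example value
const byte3 = 110; // 01101110 ("n"), example value

const high = (byte2 & 15) << 2; // low 4 bits 0001, moved up to 000100 (4)
const low = byte3 >> 6; // top 2 bits of byte3: 01 (1)
const enc3 = high | low; // 000101 (5)

console.log(high, low, enc3); // 4 1 5
```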

enc4 is the last 6 bits of the third byte, extracted with byte3 & 63 (63 is 00111111).
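enc4 needs no shift because the 6 bits it wants are already at the bottom of byte3. As a sanity check, this sketch (again with the example bytes 97 and 110) rebuilds byte3 from the low 2 bits of enc3 and all of enc4, showing that splitting the bytes into 6-bit pieces loses nothing.

```javascript
const byte2 = 97; // 01100001 ("a"), example value
const byte3 = 110; // 01101110 ("n"), example value

const enc4 = byte3 & 63; // keep the low 6 bits: 101110 (46)

// Rebuild byte3 to confirm the split is lossless:
// its top 2 bits are the low 2 bits of enc3, its low 6 bits are enc4.
const enc3 = ((byte2 & 15) << 2) | (byte3 >> 6);
const rebuilt = ((enc3 & 3) << 6) | enc4;

console.log(enc4, rebuilt === byte3); // 46 true
```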

charCodeAt returns a UTF-16 code unit, which is a 16-bit value (0 to 65535), so the code appears to assume that the relevant information is only in the low 8 bits. This assumption makes me wonder whether there is still a bug: any character with a code above 255 would have its high bits discarded by & 0xff, so some information could be lost.
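The concern can be demonstrated with a character outside the 0 to 255 range. This sketch compares charCodeAt(...) & 0xff with the UTF-8 bytes produced by TextEncoder (available in modern browsers and in Node.js). Encoding text to UTF-8 first is one common way to avoid the loss; it is not something the original function does.

```javascript
const euro = "\u20AC"; // the euro sign, a single UTF-16 code unit 0x20AC

// What the question's code effectively keeps: only the low 8 bits.
const truncated = euro.charCodeAt(0) & 0xff;
console.log(euro.charCodeAt(0).toString(16), truncated.toString(16)); // 20ac ac

// One alternative: convert the text to UTF-8 bytes, each guaranteed to be 0..255.
const utf8 = Array.from(new TextEncoder().encode(euro));
console.log(utf8.map((b) => b.toString(16))); // [ 'e2', '82', 'ac' ]
```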

edited Sep 27, 2013 at 19:43

answered Sep 27, 2013 at 16:58

Christopher Gress

4 Comments

trogne Over a year ago

Great! I understand almost all of it. Why '(byte1 & 3) << 4' and '(byte2 & 15) << 2'? I don't understand why 4 and 2.

trogne Over a year ago

OK, I see! (byte1 & 3) gives the 2 bits, and shifting left by 4 makes them more significant, i.e. multiplies them by 2 × 2 × 2 × 2. But the code comment says 'all three "& 0xff" added below are there to fix a known bug': what is the bug? Please tell me if I'm right: the function only reads the lowest octet of each value (0xFF = 00000000000000000000000011111111), and reading more causes a bug.

Christopher Gress Over a year ago

I am not 100% sure what the bug was, but if I had to guess, it would be some sort of type-conversion bug. It looks like the mask ensures that each value is only one byte, since 0xFF = 11111111 in binary.
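One plausible explanation, offered only as an assumption since neither the code comment nor this thread confirms it: binary data read through xhr.responseText was commonly requested with overrideMimeType("text/plain; charset=x-user-defined"). The WHATWG Encoding Standard's x-user-defined decoder maps bytes 0x00 to 0x7F to the same code points and bytes 0x80 to 0xFF to U+F780 to U+F7FF, so & 0xff turns each character back into the original byte. The sketch below simulates that decoder with a hypothetical function, decodeUserDefined, instead of making a real request.

```javascript
// Hypothetical simulation of the x-user-defined decoding rule:
// ASCII bytes stay as-is, bytes 0x80..0xFF become U+F780..U+F7FF.
function decodeUserDefined(bytes) {
  return bytes
    .map((b) => String.fromCharCode(b < 0x80 ? b : 0xf780 + (b - 0x80)))
    .join("");
}

const original = [0x41, 0x89, 0xff]; // "A", then two non-ASCII bytes
const text = decodeUserDefined(original);

// Without the mask, the high bytes come back as large 16-bit values.
const raw = Array.from(text, (ch) => ch.charCodeAt(0));
// With "& 0xff", each value is reduced to the original byte.
const masked = Array.from(text, (ch) => ch.charCodeAt(0) & 0xff);

console.log(raw.map((n) => n.toString(16))); // [ '41', 'f789', 'f7ff' ]
console.log(masked.map((n) => n.toString(16))); // [ '41', '89', 'ff' ]
```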

ShawnS Over a year ago

The bug is that if the string length is not divisible by 3, the missing bytes become 0 (charCodeAt past the end returns NaN, and NaN & 0xff is 0), so the isNaN tests are always false. A simple fix is to check whether i has reached the string length before reading each byte:

    var byte2 = (i < inputStr.length) ? inputStr.charCodeAt(i++) & 0xff : undefined;
    var byte3 = (i < inputStr.length) ? inputStr.charCodeAt(i++) & 0xff : undefined;
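Putting that fix into the full function gives the sketch below. It keeps the question's alphabet and logic, changes only how byte2 and byte3 are read, and uses let/const instead of var; the name base64EncodeFixed is hypothetical. For comparison, the original function returns "TQAA" for the input "M", while standard Base64 (and this version) gives "TQ==".

```javascript
// The question's function with the suggested fix applied: a missing byte is
// left as undefined (so isNaN() detects it) instead of being masked to 0.
function base64EncodeFixed(inputStr) {
  const b64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";
  let outputStr = "";
  let i = 0;

  while (i < inputStr.length) {
    const byte1 = inputStr.charCodeAt(i++) & 0xff;
    // Only read byte2/byte3 if they exist; otherwise leave them undefined.
    const byte2 = i < inputStr.length ? inputStr.charCodeAt(i++) & 0xff : undefined;
    const byte3 = i < inputStr.length ? inputStr.charCodeAt(i++) & 0xff : undefined;

    // In bitwise expressions undefined converts to 0, so these stay valid.
    const enc1 = byte1 >> 2;
    const enc2 = ((byte1 & 3) << 4) | (byte2 >> 4);
    let enc3 = 64; // index 64 is "=", the padding character
    let enc4 = 64;

    if (!isNaN(byte2)) {
      enc3 = ((byte2 & 15) << 2) | (byte3 >> 6);
      if (!isNaN(byte3)) {
        enc4 = byte3 & 63;
      }
    }

    outputStr += b64.charAt(enc1) + b64.charAt(enc2) + b64.charAt(enc3) + b64.charAt(enc4);
  }

  return outputStr;
}

console.log(base64EncodeFixed("Man")); // TWFu
console.log(base64EncodeFixed("Ma")); // TWE=
console.log(base64EncodeFixed("M")); // TQ==
```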
