Bitwise XOR in Javascript compared to C++

Question

I am porting a simple C++ function to Javascript, but it seems I'm running into problems with the way Javascript handles bitwise operators.

In C++:

AnsiString MyClass::Obfuscate(AnsiString source)
{
    int sourcelength=source.Length();
    for(int i=1;i<=sourcelength;i++)
    {
        source[i] = source[i] ^ 0xFFF;
    }
    return source;
}

Obfuscate("test") yields the intermediate integer values

-117, -102, -116, -117

Obfuscate("test") yields the string value

‹šŒ‹

In Javascript:

function obfuscate(str) 
{
    var obfuscated= "";
    for (i=0; i<str.length;i++) {

        var a = str.charCodeAt(i);                 
        var b = a ^ 0xFFF;
        obfuscated= obfuscated+String.fromCharCode(b);
    }
    return obfuscated;
}

obfuscate("test") yields the intermediate integer values

3979 , 3994 , 3980 , 3979

obfuscate("test") yields the string value

ྋྚྌྋ

Now, I realize that there are a ton of threads pointing out that Javascript treats all numbers as floats, and that bitwise operations involve a temporary cast to a 32-bit int.

It really wouldn't be a problem except that I'm obfuscating in Javascript and reversing in C++, and the different results don't match.

How do I transform the Javascript result into the C++ result? Is there some simple shift available?

This probably isn't the problem, but you're falling prey to The Horror of Implicit Globals: you need to declare your i variable.
– T.J. Crowder, Commented Jul 19, 2012 at 14:42

Esailija · Accepted Answer · 2012-07-19 15:09:30Z

4

Judging from the result that xoring 116 with 0xFFF gives -117, we have to emulate 2's complement 8-bit integers in javascript:

function obfuscate(str) 
{
    var bytes = [];
    for (var i=0; i<str.length;i++) {
        bytes.push( ( ( ( str.charCodeAt(i) ^ 0xFFF ) & 0xFF ) ^ 0x80 ) -0x80 );
    }
    return bytes;
}
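
To make the one-liner above easier to follow, here is a sketch that splits the same arithmetic into its intermediate steps for a single character code. The helper name toSignedByte is made up for illustration and is not part of the original answer.

```javascript
// Hypothetical helper: the same arithmetic as the one-liner, one step at a time.
function toSignedByte(charCode) {
    var xored = charCode ^ 0xFFF;  // 32-bit XOR, e.g. 116 -> 3979
    var low8 = xored & 0xFF;       // keep only the low 8 bits, e.g. 3979 -> 139
    var flipped = low8 ^ 0x80;     // flip the sign bit, e.g. 139 -> 11
    return flipped - 0x80;         // move into [-128, 127], e.g. 11 -> -117
}

console.log(toSignedByte(116)); // -117, the value the C++ code reports for "t"
```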

These bytes are interpreted as Windows code page 1252 (CP-1252); if a byte is negative, it is probably just converted to its unsigned value by adding 256.

var ascii = [
    0x0000,0x0001,0x0002,0x0003,0x0004,0x0005,0x0006,0x0007,0x0008,0x0009,0x000A,0x000B,0x000C,0x000D,0x000E,0x000F
    ,0x0010,0x0011,0x0012,0x0013,0x0014,0x0015,0x0016,0x0017,0x0018,0x0019,0x001A,0x001B,0x001C,0x001D,0x001E,0x001F
    ,0x0020,0x0021,0x0022,0x0023,0x0024,0x0025,0x0026,0x0027,0x0028,0x0029,0x002A,0x002B,0x002C,0x002D,0x002E,0x002F
    ,0x0030,0x0031,0x0032,0x0033,0x0034,0x0035,0x0036,0x0037,0x0038,0x0039,0x003A,0x003B,0x003C,0x003D,0x003E,0x003F
    ,0x0040,0x0041,0x0042,0x0043,0x0044,0x0045,0x0046,0x0047,0x0048,0x0049,0x004A,0x004B,0x004C,0x004D,0x004E,0x004F
    ,0x0050,0x0051,0x0052,0x0053,0x0054,0x0055,0x0056,0x0057,0x0058,0x0059,0x005A,0x005B,0x005C,0x005D,0x005E,0x005F
    ,0x0060,0x0061,0x0062,0x0063,0x0064,0x0065,0x0066,0x0067,0x0068,0x0069,0x006A,0x006B,0x006C,0x006D,0x006E,0x006F
    ,0x0070,0x0071,0x0072,0x0073,0x0074,0x0075,0x0076,0x0077,0x0078,0x0079,0x007A,0x007B,0x007C,0x007D,0x007E,0x007F
];

var cp1252 = ascii.concat([
    0x20AC,0xFFFD,0x201A,0x0192,0x201E,0x2026,0x2020,0x2021,0x02C6,0x2030,0x0160,0x2039,0x0152,0xFFFD,0x017D,0xFFFD
    ,0xFFFD,0x2018,0x2019,0x201C,0x201D,0x2022,0x2013,0x2014,0x02DC,0x2122,0x0161,0x203A,0x0153,0xFFFD,0x017E,0x0178
    ,0x00A0,0x00A1,0x00A2,0x00A3,0x00A4,0x00A5,0x00A6,0x00A7,0x00A8,0x00A9,0x00AA,0x00AB,0x00AC,0x00AD,0x00AE,0x00AF
    ,0x00B0,0x00B1,0x00B2,0x00B3,0x00B4,0x00B5,0x00B6,0x00B7,0x00B8,0x00B9,0x00BA,0x00BB,0x00BC,0x00BD,0x00BE,0x00BF
    ,0x00C0,0x00C1,0x00C2,0x00C3,0x00C4,0x00C5,0x00C6,0x00C7,0x00C8,0x00C9,0x00CA,0x00CB,0x00CC,0x00CD,0x00CE,0x00CF
    ,0x00D0,0x00D1,0x00D2,0x00D3,0x00D4,0x00D5,0x00D6,0x00D7,0x00D8,0x00D9,0x00DA,0x00DB,0x00DC,0x00DD,0x00DE,0x00DF
    ,0x00E0,0x00E1,0x00E2,0x00E3,0x00E4,0x00E5,0x00E6,0x00E7,0x00E8,0x00E9,0x00EA,0x00EB,0x00EC,0x00ED,0x00EE,0x00EF
    ,0x00F0,0x00F1,0x00F2,0x00F3,0x00F4,0x00F5,0x00F6,0x00F7,0x00F8,0x00F9,0x00FA,0x00FB,0x00FC,0x00FD,0x00FE,0x00FF
]);

function toStringCp1252(bytes){
    var byte, codePoint, codePoints = [];
    for( var i = 0; i < bytes.length; ++i ) {
        byte = bytes[i];
        if( byte < 0 ) {
            byte = 256 + byte;
        }
        codePoint = cp1252[byte];
        codePoints.push( codePoint );

    }

    return String.fromCharCode.apply( String, codePoints );
}

Result

toStringCp1252(obfuscate("test"))
//"‹šŒ‹"
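
Because XOR is its own inverse, the signed bytes can be turned back into the original text by masking them to 8 bits and XORing again. The sketch below imitates in JavaScript what the reversing side would have to do; deobfuscateBytes is a hypothetical name, and it assumes every original character code was below 256.

```javascript
// Hypothetical helper: reverse the obfuscation from an array of signed bytes.
function deobfuscateBytes(bytes) {
    var chars = [];
    for (var i = 0; i < bytes.length; i++) {
        var unsigned = bytes[i] & 0xFF;                    // e.g. -117 -> 139
        chars.push(String.fromCharCode(unsigned ^ 0xFF));  // e.g. 139 -> 116 -> "t"
    }
    return chars.join("");
}

console.log(deobfuscateBytes([-117, -102, -116, -117])); // "test"
```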

edited Jul 19, 2012 at 15:09

answered Jul 19, 2012 at 14:57

Esailija


5 Comments

petrobrush Over a year ago

While this is a correct and usable answer, I wonder how I can extend it to several XOR operations. See this jsfiddle.

Esailija Over a year ago

@petrobrush Does this give the expected results? jsfiddle.net/5HSQ3/8 results in undefined CP1252 characters, which my Cp1252 function replaces with the 0xFFFD (�) Unicode replacement character. I'd need to see the original results to know what to do with the unused CP1252 characters. See the grey boxes at en.wikipedia.org/wiki/CP-1252

Esailija Over a year ago

@petrobrush here's the same jsfiddle that logs intermediate byte results for multiple xor operations jsfiddle.net/5HSQ3/9

Esailija Over a year ago

@petrobrush basically what I'm doing is the same C operations, after which I turn the result into 2's complement signed 8-bit integer. I should have made that an intermediate step.

Esailija Over a year ago

I've modified jsfiddle.net/5HSQ3/11 according to this: According to the information on Microsoft's and the Unicode Consortium's websites, positions 81, 8D, 8F, 90, and 9D are unused, however the Windows API MultiByteToWideChar maps these to the corresponding C1 control codes. They are now control characters according to that and thus won't be visible in a string

Mike Seymour · Accepted Answer · 2012-07-19 14:50:31Z

I'm guessing that AnsiString contains 8-bit characters (since the ANSI character set is 8 bits). When you assign the result of the XOR back to the string, it is truncated to 8 bits, and so the resulting value is in the range [-128...127].

(On some platforms, it could be [0..255], and on others the range could be wider, since it's not specified whether char is signed or unsigned, or whether it's 8 bits or larger).

Javascript strings are made of 16-bit UTF-16 code units, which can hold a much wider range of values, so the result is not truncated to 8 bits. The result of the XOR will have a range of at least 12 bits, [0...4095], hence the large numbers you see there.

Assuming the original string contains only 8-bit characters, then changing the operation to a ^ 0xff should give the same results in both languages.
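
As a rough sketch of that suggestion, the function below XORs each character code with 0xFF and then views the same bit patterns as signed bytes. The name obfuscateWithFF is an assumption for illustration, and the code only behaves as described when every input character code is below 256.

```javascript
// Sketch: XOR each character code with 0xFF instead of 0xFFF.
// Assumes every character code in the input is below 256.
function obfuscateWithFF(str) {
    var codes = [];
    for (var i = 0; i < str.length; i++) {
        codes.push(str.charCodeAt(i) ^ 0xFF);
    }
    return codes;
}

var unsignedBytes = obfuscateWithFF("test");                // [139, 154, 140, 139]
// Int8Array reinterprets the same 8-bit patterns as signed values.
var signedBytes = Array.from(new Int8Array(unsignedBytes)); // [-117, -102, -116, -117]
```

The unsigned and signed arrays describe the same bytes; which one a C++ debugger shows depends on whether char is signed on that platform.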

Evan Teran · Accepted Answer · 2012-07-19 14:55:47Z

1

I assume that AnsiString is, in some form, an array of chars, and this is the problem. In C, a char can typically hold only 8 bits, so when you XOR with 0xfff and store the result in a char, it is the same as XORing with 0xff.

This is not the case with JavaScript, which uses Unicode. This is demonstrated by looking at the integer values:

-117 == 0x8b and 3979 == 0xf8b

I would recommend XORing with 0xff, as this will work in both languages. Or you can switch your C++ code to use Unicode.
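
The relationship between the two values can be checked directly in JavaScript. This is only a sketch; lowByte and asSignedChar are hypothetical helper names, and the shift trick relies on JavaScript's bitwise operators working on 32-bit integers.

```javascript
// Keep the low 8 bits of a value, as storing into an 8-bit char would.
function lowByte(value) {
    return value & 0xFF;
}

// Reinterpret the low 8 bits as a signed two's complement value.
function asSignedChar(value) {
    return (value << 24) >> 24;
}

console.log(lowByte(3979).toString(16)); // "8b"
console.log(lowByte(-117).toString(16)); // "8b"
console.log(asSignedChar(3979));         // -117
```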

edited Jul 19, 2012 at 14:55

answered Jul 19, 2012 at 14:49

Evan Teran


1 Comment

Viktor Latypov Over a year ago

It looks like an old C++ Builder class library with some custom AnsiString/WideString classes. It's definitely about the size of char; I'm just not sure how many bits there are, and I don't have access to C++ Builder.

Viktor Latypov · Accepted Answer · 2012-07-19 14:56:06Z

0

First, convert your AnsiString to wchar_t*. Only then obfuscate its individual characters:

AnsiString MyClass::Obfuscate(AnsiString source)
{
   /// allocate string
   int num_wchars = source.WideCharBufSize();
   wchar_t* UnicodeString = new wchar_t[num_wchars];
   source.WideChar(UnicodeString, source.WideCharBufSize());

   /// obfuscate individual characters
   int sourcelength=source.Length();
   for(int i = 0 ; i < num_wchars ; i++)
   {
       UnicodeString[i] = UnicodeString[i] ^ 0xFFF;
   }

   /// create obfuscated AnsiString
   AnsiString result = AnsiString(UnicodeString);

   /// delete tmp string
   delete [] UnicodeString;

   return result;
}

Sorry, I'm not an expert on C++ Builder, but my point is simple: in JavaScript you have UCS-2 (or UTF-16) symbols, so you have to convert AnsiString to wide chars first.

Try using WideString instead of AnsiString
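
To illustrate the width difference on the JavaScript side, here is a small sketch using the euro sign as an arbitrary example character (it is not taken from the question). It shows that charCodeAt returns a 16-bit code unit and that the XOR keeps all of those bits unless they are masked off.

```javascript
// Sketch: charCodeAt returns a 16-bit UTF-16 code unit, not an 8-bit char.
var euro = "\u20AC";                 // the euro sign, chosen only as an example
var code = euro.charCodeAt(0);       // 8364 (0x20AC), already above 255
var xoredWide = code ^ 0xFFF;        // 12115 (0x2F53), all 16 bits are kept
var xoredNarrow = xoredWide & 0xFF;  // 83 (0x53), what an 8-bit char would keep
```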

edited Jul 19, 2012 at 14:56

answered Jul 19, 2012 at 14:49

Viktor Latypov


1 Comment

Evan Teran Over a year ago

I doubt this works, because the unicode string will have 12-bit characters which you are stuffing into an array of 8-bit values...

T.J. Crowder · Accepted Answer · 2012-07-19 14:57:53Z

I don't know AnsiString at all, but my guess is this relates to the width of its characters. Specifically, I suspect they're less than 32 bits wide, and of course in bitwise operations, the width of what you're operating on matters, particularly when dealing with 2's complement numbers.

In JavaScript, your "t" in "test" is character code 116, which is b00000000000000000000000001110100. 0xFFF (4095) is b00000000000000000000111111111111, and the result you're getting (3979) is b00000000000000000000111110001011. We can readily see that you're getting the right result for the XOR:

116  = 00000000000000000000000001110100
4095 = 00000000000000000000111111111111
3979 = 00000000000000000000111110001011

So I'm thinking you're getting some truncation or similar in your C++ code, not least because -117 is b10001011 in eight-bit 2's complement...which is exactly what we see as the last eight bits of 3979 above.
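
The bit patterns above can be reproduced with a small formatting helper. This is a sketch only; toBits is a hypothetical name, and it uses the unsigned shift (>>> 0) so that negative numbers print their 32-bit two's complement pattern.

```javascript
// Hypothetical helper: format a number as a fixed-width binary string.
function toBits(value, width) {
    var bits = (value >>> 0).toString(2);
    while (bits.length < width) {
        bits = "0" + bits;
    }
    return bits.slice(-width);
}

var xored = 116 ^ 0xFFF;
console.log(toBits(116, 32));   // 00000000000000000000000001110100
console.log(toBits(0xFFF, 32)); // 00000000000000000000111111111111
console.log(toBits(xored, 32)); // 00000000000000000000111110001011
console.log(toBits(xored, 8));  // 10001011
console.log(toBits(-117, 8));   // 10001011
```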
