I'm noticing a discrepancy between a JavaScript function run in Node and the same JavaScript run inside a UDF in BigQuery.

I am running the following in BigQuery:

CREATE TEMP FUNCTION testHash(md5Bytes BYTES)
RETURNS BYTES 
LANGUAGE js AS """
md5Bytes[6] &= 0x0f;
md5Bytes[6] |= 0x30;
md5Bytes[8] &= 0x3f;
md5Bytes[8] |= 0x80;
return md5Bytes
""";

SELECT TO_HEX(testHash(MD5("test_phrase")));

and the output ends up being cb5012e39277d48ef0b5c88bded48591. (This is incorrect.)

Running the same code in Node gives cb5012e39277348eb0b5c88bded48591, which is the expected value. Notice that two of the characters differ (d4 vs. 34 and f0 vs. b0).

I've narrowed the issue down to BigQuery not actually applying the bitwise operators: if I skip the following lines in Node, I get the same incorrect output as in BigQuery:

md5Bytes[6] &= 0x0f;
md5Bytes[6] |= 0x30;
md5Bytes[8] &= 0x3f;
md5Bytes[8] |= 0x80;

Any ideas why the bitwise operators are not being applied to the md5Bytes input to the UDF?

  • A string in JavaScript is immutable, so md5Bytes[6] &= 0x0f doesn't change md5Bytes at all and you get back the same value as the input, I guess. Commented Jan 20, 2023 at 5:40
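The immutability point in this comment can be checked in plain Node. The sketch below assumes the UDF receives its BYTES argument as a base64-encoded string (that assumption is what makes the comment apply). The helper names `tryMutateString` and `setUuidBits` are hypothetical, and the hex digest is copied from the question rather than recomputed. `Buffer` is a Node API and may not be available inside a BigQuery UDF, so this only illustrates the behavior and is not a drop-in fix.

```javascript
// Unmodified MD5 digest, copied from the question (no hashing is done here).
const rawMd5Hex = "cb5012e39277d48ef0b5c88bded48591";

// Assumption: the UDF sees BYTES as a base64 string like this one.
const asBase64 = Buffer.from(rawMd5Hex, "hex").toString("base64");

// Index writes on a string are ignored in sloppy mode and throw a
// TypeError in strict mode; either way the string is left unchanged.
function tryMutateString(str) {
  try {
    str[6] &= 0x0f;
    str[6] |= 0x30;
    str[8] &= 0x3f;
    str[8] |= 0x80;
  } catch (err) {
    // strict mode: cannot assign to a read-only string index
  }
  return str;
}

// The same four operations applied to a mutable copy of the bytes.
function setUuidBits(bytes) {
  const out = Uint8Array.from(bytes);
  out[6] = (out[6] & 0x0f) | 0x30;
  out[8] = (out[8] & 0x3f) | 0x80;
  return out;
}

const unchanged = tryMutateString(asBase64) === asBase64;
const decoded = Buffer.from(asBase64, "base64");
const fixedHex = Buffer.from(setUuidBits(decoded)).toString("hex");

console.log(unchanged); // true
console.log(fixedHex);  // cb5012e39277348eb0b5c88bded48591
```

Operating on decoded bytes reproduces the value the question reports from Node, while operating on the string reproduces the unchanged BigQuery output.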

1 Answer

Bitwise operations in a BigQuery JavaScript UDF handle only the most significant 32 bits, as mentioned in the limitations of JavaScript UDFs in this documentation. MD5 is a hash function that takes an input and converts it into a fixed-length digest of 16 bytes, which is equivalent to 128 bits. Since the JavaScript UDF bitwise operations can be applied to only 32 bits, the output is not what you expect.
