
I typed this into the Node.js console:

new Buffer(new Buffer([0xde]).toString('utf8'), 'utf8')

and it prints out

<Buffer ef bf bd>

After looking at the docs, I expected this to produce a buffer identical to the original. I'm creating a UTF-8 string from a buffer whose contents consist of one byte (0xde), then using that string to create a new buffer. Am I missing something here?

1 Answer


For encodings that can be multi-byte, you cannot expect to get the same bytes back that you started with in all cases. In the case of UTF-8, some characters require more than one byte to be represented properly.

In your example, 0xde exceeds 0x7f, which is the largest value that UTF-8 can represent in a single byte. The byte 0xde (binary 11011110) is the lead byte of a two-byte sequence, but no continuation byte follows it, so the input is not valid UTF-8. When you then call .toString('utf8'), node returns the replacement character \uFFFD (0xef, 0xbf, 0xbd when encoded as UTF-8), which is used to denote an unknown or undecodable sequence. Reading this replacement character back into a new Buffer is no problem, as it is a valid UTF-8 character, which is why you end up with three bytes instead of the original one.
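
A minimal sketch of this behaviour, assuming a Node.js version that provides Buffer.from (the current replacement for the deprecated new Buffer(...) constructor). The variable names are illustrative only. The second half pairs 0xde with a continuation byte (0xad, chosen arbitrarily) to show that a complete sequence does survive the round trip:

```javascript
// A lone 0xde is an incomplete two-byte UTF-8 sequence.
const loneByte = Buffer.from([0xde]);
const decodedLone = loneByte.toString('utf8');
const reencodedLone = Buffer.from(decodedLone, 'utf8');

console.log(decodedLone === '\uFFFD');       // true: replacement character
console.log(reencodedLone);                  // <Buffer ef bf bd>
console.log(loneByte.equals(reencodedLone)); // false: original byte is lost

// 0xde followed by a continuation byte (0x80-0xbf) is valid UTF-8.
const validPair = Buffer.from([0xde, 0xad]);
const decodedPair = validPair.toString('utf8');
const reencodedPair = Buffer.from(decodedPair, 'utf8');

console.log(decodedPair.length);              // 1: a single character, U+07AD
console.log(validPair.equals(reencodedPair)); // true: round trip preserved
```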


1 Comment

Thanks for the answer. For my purposes it sounds like I need to use another type of string encoding option like hex or base64.
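
For reference, a short sketch of the hex and base64 round trip mentioned in this comment, again assuming Buffer.from is available. Both encodings map every possible byte to printable characters, so no information is lost. The variable names are made up for the example:

```javascript
const rawBytes = Buffer.from([0xde]);

// Encode the bytes as text without interpreting them as characters.
const asHex = rawBytes.toString('hex');
const asBase64 = rawBytes.toString('base64');

console.log(asHex);    // de
console.log(asBase64); // 3g==

// Decoding with the same encoding restores the original bytes exactly.
const fromHex = Buffer.from(asHex, 'hex');
const fromBase64 = Buffer.from(asBase64, 'base64');

console.log(rawBytes.equals(fromHex));    // true
console.log(rawBytes.equals(fromBase64)); // true
```

Hex doubles the size of the data, while base64 adds roughly a third, so base64 is usually the more compact choice for larger payloads.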
