I'm trying to convert the decimal values of Unicode characters (their code points) to the actual characters in C++, and I don't want to use any libraries. A user on StackOverflow kindly gave me the function below, which converts a code point into its UTF-8 encoding.
This solved all my problems when I was testing my code on OS X, but when I tested it on Windows the characters it output were completely wrong. I now understand that Windows uses UTF-16, which would explain the wrong output there.
The problem is that, since I didn't write the function myself, I have no idea how it works. I've tried Googling each part of it; I understand that it's the UTF-8 encoding algorithm and that it uses bitwise operations, but I don't have a clue how they fit together. Here's the function:
// Encodes the code point `code` as a null-terminated UTF-8 string in `chars`.
void GetUnicodeChar(unsigned int code, char chars[5]) {
    if (code <= 0x7F) {
        // ASCII range: a single byte, 0xxxxxxx
        chars[0] = (code & 0x7F); chars[1] = '\0';
    } else if (code <= 0x7FF) {
        // one continuation byte: 110xxxxx 10xxxxxx
        chars[1] = 0x80 | (code & 0x3F); code = (code >> 6);
        chars[0] = 0xC0 | (code & 0x1F); chars[2] = '\0';
    } else if (code <= 0xFFFF) {
        // two continuation bytes: 1110xxxx 10xxxxxx 10xxxxxx
        chars[2] = 0x80 | (code & 0x3F); code = (code >> 6);
        chars[1] = 0x80 | (code & 0x3F); code = (code >> 6);
        chars[0] = 0xE0 | (code & 0xF); chars[3] = '\0';
    } else if (code <= 0x10FFFF) {
        // three continuation bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        chars[3] = 0x80 | (code & 0x3F); code = (code >> 6);
        chars[2] = 0x80 | (code & 0x3F); code = (code >> 6);
        chars[1] = 0x80 | (code & 0x3F); code = (code >> 6);
        chars[0] = 0xF0 | (code & 0x7); chars[4] = '\0';
    } else {
        // out of range: emit the Unicode replacement character U+FFFD,
        // whose UTF-8 bytes are EF BF BD in that order
        chars[0] = 0xEF; chars[1] = 0xBF; chars[2] = 0xBD;
        chars[3] = '\0';
    }
}
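To make the bitwise steps concrete, here is a minimal sketch that isolates the two-continuation-byte branch of the function above. The helper name `EncodeThreeBytes` and the `unsigned char` buffer are assumptions made for illustration, not part of the original function.

```cpp
// Hypothetical helper (not from the original post): encodes a code point in
// the range 0x800..0xFFFF as three UTF-8 bytes, mirroring the
// "two continuation bytes" branch of GetUnicodeChar.
void EncodeThreeBytes(unsigned int code, unsigned char out[3]) {
    // Lowest 6 bits go into the last byte, tagged with the 10xxxxxx marker.
    out[2] = 0x80 | (code & 0x3F);
    // Next 6 bits go into the middle byte, also tagged 10xxxxxx.
    out[1] = 0x80 | ((code >> 6) & 0x3F);
    // Remaining top 4 bits go into the lead byte, tagged 1110xxxx.
    out[0] = 0xE0 | ((code >> 12) & 0x0F);
}
```

For U+20AC (the euro sign, binary 0010 0000 1010 1100) this splits the bits into 0010, 000010 and 101100, giving the bytes E2 82 AC.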
So here's my question: does anyone know how to convert that UTF-8 encoding function into a UTF-16 one? I have done some research on both algorithms and, truthfully, I don't really understand either.
Alternatively, I've seen people use the function MultiByteToWideChar, but I couldn't get that to work either. Can anyone please provide a method or function that would let me display the correct Unicode characters on Windows, without the user having to change their console code page?
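For reference, the UTF-16 encoding asked about above can be sketched in the same style as GetUnicodeChar. This is a minimal sketch under assumptions: the name `GetUtf16Char`, the `char16_t` buffer, and the returned count are all hypothetical, and it only produces the code units; it does not by itself print anything to a console.

```cpp
// Hypothetical UTF-16 counterpart to GetUnicodeChar (name and signature are
// assumptions). Writes one or two 16-bit code units followed by a 0
// terminator and returns the number of code units written.
int GetUtf16Char(unsigned int code, char16_t units[3]) {
    // Out-of-range values and lone surrogates are not valid code points:
    // substitute the replacement character U+FFFD.
    if (code > 0x10FFFF || (code >= 0xD800 && code <= 0xDFFF)) {
        code = 0xFFFD;
    }
    if (code <= 0xFFFF) {
        // Basic Multilingual Plane: the code point is the code unit.
        units[0] = static_cast<char16_t>(code);
        units[1] = 0;
        return 1;
    }
    // Supplementary planes: subtract 0x10000, leaving a 20-bit value,
    // then split it into two 10-bit halves (a surrogate pair).
    code -= 0x10000;
    units[0] = static_cast<char16_t>(0xD800 | (code >> 10));   // high surrogate
    units[1] = static_cast<char16_t>(0xDC00 | (code & 0x3FF)); // low surrogate
    units[2] = 0;
    return 2;
}
```

For example, U+1F600 becomes the pair D83D DE00, while U+20AC stays the single unit 20AC. On Windows, where `wchar_t` is 16 bits wide, such code units are what the wide-character APIs (for instance `WriteConsoleW`) expect; that part is not shown or tested here.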