I'm trying to convert the string "pokémon" from std::string to std::wstring using:
std::wstring wsTmp(str.begin(), str.end());
This works on Windows, but on Linux it returns "pok\xffffffc3\xffffffa9mon".
How can I make it work on Linux?
This worked for me on POSIX.
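#include <codecvt>
#include <locale>
#include <string>

int main() {
    // Assumes the literal ends up UTF-8 encoded in the compiled program.
    std::string a = "pokémon";
    // Decodes UTF-8 into UTF-16 code units, each stored in one wchar_t.
    // Where wchar_t is 32 bits, characters outside the BMP therefore come out
    // as surrogate pairs; std::codecvt_utf8<wchar_t> avoids that there.
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> cv;
    std::wstring wide = cv.from_bytes(a);  // throws std::range_error on invalid input
    return 0;
}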
The wstring holds the correct string at the end.
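Whether it does depends on the bytes the literal compiles to. As a minimal sketch, the UTF-8 bytes can be spelled out as escapes so the input no longer depends on the source file's encoding. The helper name pokemon_utf8 is made up for this illustration.

```cpp
#include <string>

// Hypothetical helper: returns "pokémon" as UTF-8 bytes, independent of how
// the source file is encoded. U+00E9 (é) is 0xC3 0xA9 in UTF-8.
std::string pokemon_utf8() {
    return "pok\xC3\xA9mon";
}
```

Passing pokemon_utf8() to cv.from_bytes in place of a should give the converter the same input on any compiler.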
Important note by @NathanOliver: std::codecvt_utf8_utf16 was deprecated in C++17 and may be removed from the standard in a future version.
This assumes the .cpp file is saved as UTF-8 and that the compiler parses the file as UTF-8. Consider using the u8 prefix on the string literal to force it to UTF-8 even if the file is not using UTF-8, e.g. std::string a = u8"pokémon"; (this compiles up to C++17 only; since C++20 a u8 literal is made of char8_t and no longer initializes a std::string). Whatever charset the .cpp file is actually encoded in, make sure the compiler is set up to interpret the file in that same charset.
The problem you seem to be running into here is that the conversion treats the two code units of the é as separate code points. There's no good way to do this with the standard library past C++17, as std::wstring_convert was deprecated without being given a proper replacement. You have several options, none of them great:
Use std::wstring_convert and ignore the deprecation warnings and the fact that it may be removed in a future revision of C++.
Also, somewhat unrelated: if you care about consistency across platforms, you should be using std::u16string or std::u32string. The character size of std::wstring depends on the size of wchar_t, which varies between compilers and platforms.
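One way to avoid the deprecated facilities altogether is to decode the UTF-8 by hand into a std::u32string. The following is only a sketch under stated assumptions: utf8_to_utf32 is a hypothetical helper, not a standard function, and it rejects truncated or malformed sequences but does not check for overlong encodings or surrogate code points.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the standard library): decodes UTF-8 bytes
// into UTF-32 code points, one char32_t per code point.
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        // The lead byte tells how many bytes the sequence has.
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte (10xxxxxx) contributes six more bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

With this, utf8_to_utf32("pok\xC3\xA9mon") yields seven code points, with U+00E9 in the fourth position. For production code a maintained Unicode library is probably the safer choice.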
Your code is copying each char as-is to wchar_t, extending the value from 8 bits to 16 bits on Windows or 32 bits on POSIX (with sign extension where char is signed, hence the \xffffffc3). No encoding conversion is performed. What is the actual encoding of the std::string? ANSI (system locale)? UTF-8? It makes a BIG difference in how the data needs to be converted to std::wstring properly.
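A quick way to answer the encoding question is to look at the raw bytes. The sketch below uses a hypothetical helper, hex_bytes, that renders each byte of the string as two hex digits.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: renders each byte of s as two lowercase hex digits,
// separated by spaces.
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[4];  // two hex digits, a space, and the terminating NUL
    for (unsigned char b : s) {
        std::snprintf(buf, sizeof buf, "%02x ", static_cast<unsigned>(b));
        out += buf;
    }
    if (!out.empty()) out.pop_back();  // drop the trailing space
    return out;
}
```

If the é shows up as the two bytes c3 a9, the string is UTF-8, and those are the two values that end up as separate wide characters in the question's output. A single byte e9 would instead point to a one-byte code page such as Windows-1252.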