I am currently writing a C++ program that handles both the Latin alphabet and Korean characters.
However, I learned that the size of char in C++ is only 1 byte. This means that in order to handle foreign characters or Unicode, more than one char is needed per character (two per Korean character, in my environment).
string s = "a가b나c다";
cout << s.length();
prints 9
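(As it turns out, length() counts bytes, not characters. A minimal sketch that dumps each byte in hex makes that visible. The exact byte values assume the source and locale are CP949/EUC-KR, where each Hangul syllable takes 2 bytes, hence 3 + 3*2 = 9; under UTF-8 the same literal would be 12 bytes.)

#include <iostream>
#include <string>
using namespace std;

int main() {
    string s = "a가b나c다";
    cout << s.length() << "\n";       // 9: a count of bytes, not glyphs
    for (unsigned char b : s)         // unsigned, so high bytes show as 128..255
        cout << hex << (int)b << ' '; // 61 b0 a1 62 b3 aa 63 b4 d9 under EUC-KR
}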
But my question is: how does the C++ runtime distinguish between the two kinds of characters?
For example, if I make a char array of size 9, how does it know whether it holds 9 ASCII characters or 4 two-byte Korean characters plus 1 ASCII character?
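(As it turns out, it can't know from the array alone. A sketch that counts "characters" under two different assumed encodings shows that the interpretation lives in the decoder, not in the bytes; the 2-byte rule below is the EUC-KR/CP949 convention, which is my assumption based on the byte values observed further down:)

#include <cstring>
#include <iostream>
using namespace std;

// Same bytes, two interpretations: nothing in a char array says which is right.
size_t count_as_ascii(const char* p) {
    return strlen(p);                  // every byte counts as one character
}

size_t count_as_euckr(const char* p) {
    size_t n = 0;
    while (*p) {
        // EUC-KR convention: a byte with the high bit set starts a 2-byte pair
        p += ((unsigned char)*p >= 0x80) ? 2 : 1;
        ++n;
    }
    return n;
}

int main() {
    const char* s = "a가b나c다";        // 9 bytes if the source is saved as CP949
    cout << count_as_ascii(s) << "\n"; // 9
    cout << count_as_euckr(s) << "\n"; // 6: a, 가, b, 나, c, 다
}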
And then I figured out this:
#include <iostream>
#include <cstring>
using namespace std;

int main() {
    char c;
    int a;
    const char* cp = "가나다라마바사아";      // string literals are const in C++
    for (size_t i = 0; i < strlen(cp); i++) { // was i < 20, which read past the end
        c = a = cp[i];
        cout << "\n c val : " << c;
        cout << "\n a val : " << a;
    }
}
This ONLY prints out negative values for a (first few lines of output shown):
c val :
a val : -80
c val :
a val : -95
c val :
a val : -77
c val :
a val : -86
c val :
a val : -76
c val :
a val : -39
From this I can infer that non-ASCII characters only use negative values? But isn't that quite a waste?
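(The negative values turned out not to be a waste: char is simply signed on my platform, and every byte of a multi-byte character has its high bit set, so it prints as negative. Casting to unsigned char reveals the actual byte values; a minimal sketch, assuming the EUC-KR bytes 0xB0 0xA1 for 가, which matches the -80/-95 above:)

#include <iostream>
using namespace std;

int main() {
    const char* cp = "가";                     // 0xB0 0xA1 under EUC-KR
    cout << (int)cp[0] << "\n";                // -80: char is signed here
    cout << (int)(unsigned char)cp[0] << "\n"; // 176 = 0xB0, same bit pattern
    cout << (int)(unsigned char)cp[1] << "\n"; // 161 = 0xA1
}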
My question in summary: does C++ distinguish ASCII chars from Unicode chars only by looking at whether they are negative?
Answer in summary: C++ itself does not distinguish anything; a string is just a sequence of bytes. It is the decoder (the terminal, a font renderer, or a parsing library) that decides whether 1~4 chars form one glyph, by looking at the first few bits of each char. So to some extent my assumption was valid: in multi-byte encodings, every byte of a multi-byte character has its high bit set, which is exactly why a signed char shows it as negative.
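(For example, a UTF-8 decoder derives the length of a sequence purely from the top bits of its first byte. A minimal sketch of that rule; note this is UTF-8 specifically, whereas my compiler apparently stored the literals above as EUC-KR:)

#include <iostream>
using namespace std;

// How many bytes the UTF-8 sequence starting with this lead byte occupies,
// judged purely from its top bits.
int utf8_seq_len(unsigned char lead) {
    if (lead < 0x80) return 1; // 0xxxxxxx: plain ASCII
    if (lead < 0xC0) return 0; // 10xxxxxx: continuation byte, not a lead
    if (lead < 0xE0) return 2; // 110xxxxx: 2-byte sequence
    if (lead < 0xF0) return 3; // 1110xxxx: 3-byte sequence (Hangul lives here)
    return 4;                  // 11110xxx: 4-byte sequence
}

int main() {
    cout << utf8_seq_len('a') << "\n";  // 1: ASCII stays a single byte
    cout << utf8_seq_len(0xEA) << "\n"; // 3: 0xEA is the UTF-8 lead byte of 가
}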