I want to get the code point of each character in the string "عربى", so I wrote this code, but it always outputs 63, which is the code point of the question mark character "?".

TCHAR   myString[50] = _T("عربى");
int stringLength=_tcslen(_T(myString));

for(int i=0;i<stringLength;i++)
{
   unsigned int number =myString[i];
   cout<<number<<endl;
}

Any suggestions? :)

2 Answers

Here's code that uses only the standard library (C++11) and iterates over the string in 32-bit wide code units. In UTF-32 as currently defined, each code unit corresponds to exactly one code point.

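#include <codecvt>   // codecvt_utf8 (deprecated since C++17)
#include <cstdint>   // uint_least32_t
#include <iostream>
#include <locale>    // wstring_convert
#include <string>
using namespace std;
const auto str = u8"عربى"; // const char* before C++20 (char8_t from C++20 on)
// Convert the UTF-8 bytes to UTF-32, one code unit per code point
wstring_convert<codecvt_utf8<char32_t>, char32_t> cv;
auto str32 = cv.from_bytes(str);
for(auto c : str32)
    cout << uint_least32_t(c) << '\n';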

If your standard library hasn't implemented these features yet, you should probably use an external library.
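If neither wstring_convert nor an external library is an option, decoding UTF-8 by hand is fairly short. Below is a minimal sketch, assuming the input is well-formed UTF-8; `decode_utf8` is a hypothetical helper name (not a standard function), and it does no real validation beyond a debug-only assert.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decodes UTF-8 bytes into Unicode code points.
// Assumes well-formed input; overlong or truncated sequences are not rejected.
std::vector<std::uint32_t> decode_utf8(const std::string& bytes) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        // Debug-only sanity check: a sequence must not start with a
        // continuation byte (10xxxxxx).
        assert((lead & 0xC0) != 0x80);
        std::uint32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (lead < 0x80)      { cp = lead;        extra = 0; }  // 0xxxxxxx
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }  // 110xxxxx
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }  // 1110xxxx
        else                  { cp = lead & 0x07; extra = 3; }  // 11110xxx
        ++i;
        // Each continuation byte contributes six more bits.
        for (; extra > 0 && i < bytes.size(); --extra, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}
```

Passing the UTF-8 bytes of "عربى" (D8 B9 D8 B1 D8 A8 D9 89) should yield 1593, 1585, 1576 and 1609. For production use, a version that rejects malformed input would be safer.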

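I copied your code and changed _T(myString) to plain myString, and it worked. (_T() is a macro meant for string literals, not a cast to apply to variables. The output below also implies a Unicode build, where TCHAR is a 16-bit wchar_t.) Here is the full program.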

#include <afxwin.h>

#include <iostream>

int main() {
    using namespace std;

    TCHAR   myString[50] = _T("عربى");
    int stringLength = _tcslen(myString); // <----- edit here

    for(int i=0;i<stringLength;i++)
    {
       unsigned int number =myString[i];
       cout<<number<<endl;
    }
}

Output:

1593
1585
1576
1609

3 Comments

This probably doesn't work with code points that need more than one code unit in UTF-16 (i.e. those outside the Basic Multilingual Plane). Then again, the OP might not need those scripts.
@Nasser: Thank you so much for your help :) @user2079303: Could you please give an example? I don't understand what you mean. Thank you for your great help :)
@RehabReda: It is my understanding that TCHAR is 16 bits wide (if Unicode is enabled). A code point that does not fit in 16 bits is represented by 2 code units (a surrogate pair) in UTF-16. This code still iterates over the string by (16-bit wide) code unit rather than by code point. For example, for this character: 𐌂 (isthisthingon.org/unicode/…), the code would print 55296 57090 instead of 66306 (at least I think so; the code doesn't compile with my compiler). I've added an answer that works with all of current Unicode but requires C++11.
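The surrogate-pair arithmetic from the last comment can be sketched in portable C++11 as follows. This is only an illustration under assumptions: `combine_surrogates` and `decode_utf16` are hypothetical helper names, and the code uses std::u16string rather than TCHAR, so on Windows the same logic would have to be applied to a 16-bit wchar_t string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: combines a high/low surrogate pair into one code point.
std::uint32_t combine_surrogates(char16_t high, char16_t low) {
    assert(high >= 0xD800 && high <= 0xDBFF);  // high (lead) surrogate range
    assert(low >= 0xDC00 && low <= 0xDFFF);    // low (trail) surrogate range
    return 0x10000u + ((std::uint32_t(high) - 0xD800u) << 10)
                    + (std::uint32_t(low) - 0xDC00u);
}

// Hypothetical helper: decodes UTF-16 code units into code points.
// Unpaired surrogates are passed through unchanged rather than rejected.
std::vector<std::uint32_t> decode_utf16(const std::u16string& units) {
    std::vector<std::uint32_t> out;
    for (std::size_t i = 0; i < units.size(); ++i) {
        const char16_t u = units[i];
        const bool is_high = u >= 0xD800 && u <= 0xDBFF;
        if (is_high && i + 1 < units.size()
            && units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            out.push_back(combine_surrogates(u, units[i + 1]));
            ++i;  // the low surrogate has been consumed as well
        } else {
            out.push_back(u);  // single code unit, used as is
        }
    }
    return out;
}
```

With these helpers, combine_surrogates(55296, 57090) gives 66306, the code point of 𐌂, while the four Arabic letters from the question each fit in a single code unit and come through unchanged.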
