My high-level goal is to convert any string (which may include non-ASCII characters) into a vector of integers by converting each character to an integer.

I already have a Python snippet for this purpose:

bytes = list(text.encode())

Now I want a C++ equivalent. I tried something like:

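#include <cstring>
#include <iostream>
#include <string>
#include <vector>

int main() {
  std::string inputText = "testΔ";  // sample input, assumed to be UTF-8 encoded
  char const* bytes = inputText.c_str();
  long bytesLen = std::strlen(bytes);
  // Each element is converted from plain char, which may be signed.
  auto vec = std::vector<long>(bytes, bytes + bytesLen);
  for (auto number : vec) {
      std::cout << number << std::endl;
  }
  return 0;
}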

For an input string like "testΔ", the Python code outputs [116, 101, 115, 116, 206, 148].

However, the C++ code outputs [116, 101, 115, 116, -50, -108].

How should I change the C++ code to make them consistent?

  • "I know how to do this in another language" is not a reason to use that language's tag. Commented Nov 10, 2020 at 19:58
  • If you are using Unicode characters you cannot use char*; you'll likely want to use wide characters and Unicode literals: stackoverflow.com/questions/6796157/… For example, Python's str.encode uses UTF-8 by default, so here's an explanation of UTF-8 support in C++: stackoverflow.com/questions/50403342/… Commented Nov 10, 2020 at 19:59
  • If you're sticking in ASCII space, an unsigned datatype in the vector should help a lot. Commented Nov 10, 2020 at 20:01
  • @user4581301 Δ is not in ASCII space. Commented Nov 10, 2020 at 20:05
  • True enough. I was watching TV the other night and apparently the aliens built the pyramids. Maybe they are responsible for triangles, too. Commented Nov 10, 2020 at 20:54

3 Answers

However, the C++ code outputs [116, 101, 115, 116, -50, -108].

In C++, char is a distinct type from both signed char and unsigned char, and whether it is signed is implementation-defined. On your platform it is signed, so bytes above 127 come out negative.

You therefore want to read the bytes as unsigned char explicitly. The .c_str() method returns const char*, so a cast is needed: reinterpret_cast<const unsigned char*> or a C-style cast works on the pointer; static_cast does not.

2 Comments

To be clear: static_cast will work on individual elements, but not on the pointer.
Thanks Karl. I modified my code and confirmed it's working. Really appreciate your help (as well as the other comments below).

You can iterate over std::string contents just fine, no need to convert it to std::vector. Try this:

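#include <iostream>
#include <string>

int main()
{
    std::string str = "abc";
    for (auto c : str)
    {
        // Go through unsigned char first so bytes above 127 print as 128..255.
        std::cout << static_cast<unsigned int>(static_cast<unsigned char>(c)) << std::endl;
    }
}

The cast is needed because the standard operator<< prints a char as a character, not as a number. Otherwise, you can work with it just like any other integral type. The conversion goes through unsigned char first because the signedness of char is implementation-defined: where char is signed, casting a negative value such as -50 directly to unsigned int wraps to a very large number instead of 206.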

2 Comments

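Thanks for the suggestion! My use case actually needs the vector<long> explicitly. I added the cout just to help explain my problem.
@ireneisgood you can construct it directly from the string's iterators: std::vector<unsigned long> result(str.begin(), str.end()); Note that where char is signed, bytes above 127 must be converted to unsigned char first, or they will wrap to very large values.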

How should I change the C++ code to make them consistent?

The difference appears to be that Python yields byte values in the range 0 to 255 (like unsigned char), while char is signed in your C++ implementation. One solution: reinterpret the string as an array of unsigned char.
