1

I have a problem with unsigned char. I am reading a PPM image file whose data is in ASCII/extended ASCII.

Take a character such as '†'. In Java, after reading it as a char and casting it to int, its value is 8224. In C/C++, after reading it as an unsigned char and casting it to int, its value is 160.

How would I read it in Java so as to get the value 160?

The following C++:

unsigned char ch1 ='†';  
char ch2 = '†';  

cout << (int) ch1 << "\n"; // prints 160  
cout << (int) ch2 << "\n"; // prints -96  

In Java,

char ch1 = '^';  
char ch2 = '†';  
System.out.println (" value : " +  (int) ch1); // prints 94  
System.out.println (" value :" +  (byte) ch1); // prints 94  

System.out.println (" value : " +  (int) ch2); // prints 8224  
System.out.println (" value :" +  (byte) ch2); // prints 32 

Following are some exceptions: 8224 †, 8226 •, 8800 ≠, 8482 ™, 8710 ∆, 8211 –, 8221 ”, 8216 ‘, 9674 ◊, 8260 ⁄, 8249 ‹, 8249 ‹, 8734 ∞, 8747 ∫, 8364 €, 8730 √, 8804 ≤

Following are some good ones: 94 ^, 102 f, 112 p, 119 w, 126 ~, 196 Ä, 122 z, 197 Å, 197 Å

Any help is appreciated

3 Answers

4

In C++ you are using "narrow" characters in some specific encoding that happens to define character '†' as 160. In other encodings 160 may mean something else, and character '†' may be missing altogether.

In Java, you are always dealing with Unicode. 8224 = 0x2020 = U+2020 "DAGGER".

To get "160", you need to convert your string to the same encoding you are using with C++. See String.getBytes(charset).
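A minimal sketch of that conversion, assuming the C++ side used Mac OS Roman (the thread only establishes that '†' maps to 160 there, so treat the charset name as an assumption). The class and method names are made up for illustration. Java bytes are signed, so the byte comes back as -96 and has to be masked with 0xFF to read as 160.

```java
import java.nio.charset.Charset;

class EncodedByteDemo {
    // Encodes one character in the named charset and returns its first byte as 0..255.
    static int unsignedByteOf(char c, String charsetName) {
        byte[] encoded = String.valueOf(c).getBytes(Charset.forName(charsetName));
        // Java bytes are signed (-128..127); masking with 0xFF gives the unsigned value.
        return encoded[0] & 0xFF;
    }

    public static void main(String[] args) {
        char dagger = '\u2020'; // U+2020 DAGGER, written as an escape to avoid source-encoding issues
        System.out.println((int) dagger); // the UTF-16 code unit: 8224

        // Assumption: this runtime ships the MacRoman charset (it is not one of the
        // charsets every Java platform is required to support).
        if (Charset.isSupported("MacRoman")) {
            System.out.println(unsignedByteOf(dagger, "MacRoman")); // 160
        }
    }
}
```

Note that String.getBytes(Charset) does not fail on characters the charset cannot represent; it substitutes the charset's replacement byte, so a wrong charset guess tends to show up as an unexpected value rather than an exception.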


5 Comments

Thanks atzz, that is a great explanation. I'm now trying to find out which charset is being used in C++. Thank you! :)
@ravikumar1: Try US-ASCII. If that doesn't work, try ISO-8859-1.
Thank you Bemrose. I wrote a small function to find the charset, and I got a hit for -96 (256 - 96 = 160). Thank you all for the support. :) Below is my test function:
Here it is:

public void findCharsets() {
    Map charSets = Charset.availableCharsets();
    Iterator it = charSets.keySet().iterator();
    String str = Character.toString('†');
    while (it.hasNext()) {
        try {
            String csName = (String) it.next();
            byte b[] = str.getBytes(Charset.forName(csName));
            if (b[0] == -96) {
                System.out.println("Found: " + csName);
            }
        } catch (Exception e) {
            // do nothing; go to next Charset
        }
    }
}
This is the output of the program:

Found: MacRoman
Found: x-MacCentralEurope
Found: x-MacCroatian
Found: x-MacCyrillic
Found: x-MacGreek
Found: x-MacRomania
Found: x-MacTurkish
Found: x-MacUkraine
0

IIRC Java uses a 16-bit representation for chars (UNICODE?) and C++ normally doesn't unless you use wchar_t.

I think you'd be better off trying to get C++ to use the UNICODE characters that Java uses rather than the other way around.

2 Comments

Hi Timo, Thank you for the prompt reply. I'm trying to write my app in JAVA. So I need a way to get 160 out of the char † . :(
"UNICODE?" UTF-16 to be more precise.
0

If you write out the unsigned char 160 in C++ as a single byte, and use InputStream.read() you will get 160. Which character this means depends on the assumed encoding but the value 160 is unchanged.
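A rough sketch of that idea, using an in-memory stream as a stand-in for the file (the sample bytes and the class and method names are invented for illustration). InputStream.read() returns each byte as an int in the range 0..255, or -1 at end of stream, so no charset is involved at any point.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class RawByteReadDemo {
    // Reads the stream byte by byte, without any character decoding.
    static List<Integer> readUnsignedBytes(InputStream in) throws IOException {
        List<Integer> values = new ArrayList<>();
        int b;
        // read() yields 0..255 for each byte and -1 once the stream is exhausted.
        while ((b = in.read()) != -1) {
            values.add(b);
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data: the byte 0xA0 followed by '^' (0x5E).
        byte[] sample = { (byte) 0xA0, 0x5E };
        try (InputStream in = new ByteArrayInputStream(sample)) {
            System.out.println(readUnsignedBytes(in)); // [160, 94]
        }
    }
}
```

For a real file, the same helper could be handed a FileInputStream (ideally wrapped in a BufferedInputStream) instead of the in-memory stream. Reading through a Reader or Scanner would decode the bytes into chars first, which is where values like 8224 come from.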

1 Comment

Thanks Peter, I'm trying to write this in Java only. I don't have a C++ program that runs first. I'm simply decoding in Java, for which I need 160 for the char †.
