2

I have an encoding issue in Java. The string I need to handle is the output of the "systeminfo" command run from the Windows command line, and I need to present the result in an HTML document. When I run my application on a French operating system, garbled characters show up in the HTML, no matter how I change the encoding settings.

From the log, I can see that the system encoding is "Cp1252". The code snippet is as follows:

String systemEncoding = System.getProperty("sun.jnu.encoding");
log.info("sun.jnu.encoding="+systemEncoding);

In the HTML builder class, I did something like this:

for(String line : lines){
    line = new String(line.getBytes("Cp1252"), "UTF8");
    osReport.append(line + "<br>");
}

Unfortunately, I still see garbled question marks everywhere in place of what should be French characters. The HTML header looks like this, by the way:

<HEAD>
<META content="text/html; charset=UTF-8" http-equiv=Content-Type>
</HEAD>

This is how I get the response string:

try{
    String systemEncoding = System.getProperty("sun.jnu.encoding");
    log.info("sun.jnu.encoding="+systemEncoding);
    InputStreamReader isr;
    if (StringUtil.isEmpty(systemEncoding)) {
        isr = new InputStreamReader(is);
    } else {
        isr = new InputStreamReader(is, systemEncoding);
    }
    BufferedReader br = new BufferedReader(isr);
    String line=null;
    while ((line = br.readLine()) != null) {
        res.append(line);
        res.append(LINE_SEP);
    }   
 } catch (IOException ioe) {
    log.error("IOException occurred while printing the response",ioe);
 }

Any help? Thanks very much!

  • What is the output in System.out of the line you convert to UTF-8? Commented Aug 21, 2011 at 7:30
  • How are you loading the lines in the first place? That's the bit of code you need to fix. By the time you've got invalid data in your strings, you're in a really nasty situation. Commented Aug 21, 2011 at 7:31
  • Change your encoding in the line line = new String(line.getBytes("Cp1252"), "UTF8"); from UTF8 to UTF-8. Perhaps that's the problem? Commented Aug 21, 2011 at 7:37
  • Hi D1e, you guessed right: gibberish shows up in System.out too. Commented Aug 21, 2011 at 7:52
  • The short answer is that whatever encoding systeminfo is outputting in isn't the encoding you are reading it in with. Commented Aug 21, 2011 at 8:50

2 Answers

4

I assume you are invoking the command via the Process type. I would expect systeminfo.exe to write its output using the default ANSI encoding (windows-1252 on a French system).

That means you can read the input with the default encoding, which is the one used by the InputStreamReader(InputStream) constructor. This transcodes the input from the default encoding to UTF-16. The following code uses the Scanner type with the default system encoding:

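Process process = new ProcessBuilder(command).redirectErrorStream(true)
    .start();
InputStream in = process.getInputStream();
try {
  // Scanner(InputStream) decodes bytes using the platform default charset
  Scanner scanner = new Scanner(in);
  while (scanner.hasNextLine()) {
    lines.add(scanner.nextLine());
  }
  // waitFor() blocks until the process exits; calling exitValue() on a
  // process that is still running throws IllegalThreadStateException
  if (process.waitFor() != 0 || scanner.ioException() != null) {
    // throw exceptions
  }
} finally {
  in.close();
}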

Java strings are always UTF-16, so code like this is just a transcoding bug:

new String(line.getBytes("Cp1252"), "UTF8");
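To make the failure concrete, here is a minimal sketch of what that round trip does to accented text. The class and method names are made up for illustration. It encodes a string as windows-1252 and then decodes the same bytes as UTF-8, which is what the snippet above does:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class TranscodingBugDemo {

    // Hypothetical helper reproducing the round trip from the question:
    // encode as windows-1252, then decode those bytes as if they were UTF-8.
    static String brokenRoundTrip(String s) {
        byte[] cp1252Bytes = s.getBytes(Charset.forName("windows-1252"));
        return new String(cp1252Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e9t\u00e9"; // the French word for "summer"
        String mangled = brokenRoundTrip(original);
        // Print code points so the result does not depend on console encoding.
        for (char c : mangled.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

The single byte 0xE9 that windows-1252 uses for é is not a valid UTF-8 sequence on its own, so the decoder substitutes the replacement character U+FFFD. Pure ASCII text survives the round trip, which is why only the accented characters appear damaged.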

Ensure that you are encoding your HTML file correctly.

Charset utf8 = Charset.forName("UTF-8");
OutputStream out = new FileOutputStream(file);
Closeable stream = out;
try {
  Writer writer = new OutputStreamWriter(out, utf8);
  stream = writer;
  // write to writer here
} finally {
  stream.close();
}

I would not try to read or directly change system properties like sun.jnu.encoding or file.encoding - these are JVM implementation details and their direct use or configuration is not supported.

If you are relying on System.out to verify characters, ensure the device consuming the output decodes its input as windows-1252. See here for more on encoding.
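One way to take the console out of the picture is to print escaped code units instead of the characters themselves. The following is only a sketch, and the class and method names are invented for this example:

```java
final class CodePointDump {

    // Hypothetical debugging helper: keeps printable ASCII as-is and renders
    // every other UTF-16 code unit as a \\uXXXX escape.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c < 0x7F) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The output is pure ASCII, so it looks the same on any console.
        System.out.println(escapeNonAscii("\u00e9t\u00e9"));
    }
}
```

If a logged line shows \ufffd where an accented letter should be, the data was already damaged when it was decoded, before the HTML was ever written.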

2 Comments

Hi Mc, could you be more specific about the correction I need to make here? Am I supposed to change the encoding setting to UTF-16, or something else, to get those French characters to look normal?
@Even - you don't have to specify that strings are UTF-16 - they always are. You just have to be aware that anything that converts a byte sequence to a char sequence implicitly performs a transcoding operation to UTF-16. The act of using an InputStreamReader transforms the data being read.
0

Without declaring the character encoding, you can't display those French characters in HTML as plain characters. In other words, this doesn't work:

<html>
<body>
accent égu et ce çedille :D
</body>
</html>

This results in:

accent égu et ce çedille :D

So you have to declare the encoding in the meta headers or replace all the French characters with their escape equivalents. Full list here.
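For the escaping route, a lookup table of named entities is not strictly needed, because HTML also accepts numeric character references such as &#233; for é. The sketch below shows that approach. The class and method names are hypothetical, and it deliberately does not escape markup characters like < or &:

```java
final class HtmlNumericEscaper {

    // Hypothetical helper: replaces every non-ASCII code point with a decimal
    // numeric character reference and leaves ASCII untouched.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
            // Advance by one or two chars so surrogate pairs stay together.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("accent \u00e9gu et ce \u00e7edille"));
    }
}
```

This only helps if the Java string holds the right characters in the first place. If the process output was decoded with the wrong charset, the references will faithfully encode the wrong characters.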


As for the trick with the system character encoding: I don't think the encoding reported by sun.jnu.encoding is the same one that systeminfo.exe uses for its output.
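If that suspicion is right, the reader needs to be given the charset the command actually writes in rather than one taken from a system property. Which charset that is remains an assumption to verify on the target machine; for console programs it may be the OEM code page that chcp reports. A minimal sketch with a hypothetical readLines helper, exercised here on in-memory bytes instead of a real process:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ExplicitCharsetReader {

    // Hypothetical helper: decodes the stream with the charset supplied by
    // the caller instead of relying on the platform default.
    static List<String> readLines(InputStream in, Charset charset) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // The same three bytes, decoded under two different assumptions.
        byte[] raw = {(byte) 0xE9, 't', (byte) 0xE9};
        List<String> latin1 =
            readLines(new ByteArrayInputStream(raw), StandardCharsets.ISO_8859_1);
        List<String> utf8 =
            readLines(new ByteArrayInputStream(raw), StandardCharsets.UTF_8);
        // Only one decoding yields the intended accented word.
        System.out.println(latin1.get(0).equals("\u00e9t\u00e9"));
        System.out.println(utf8.get(0).equals("\u00e9t\u00e9"));
    }
}
```

With a real process, the stream from process.getInputStream() would take the place of the ByteArrayInputStream, and the charset argument would be whichever one the command is confirmed to use.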

8 Comments

It is perfectly possible to include non-ASCII characters directly in HTML source, as long as you correctly declare your encoding in your HTTP headers or a <meta> tag. Resorting to character escapes, while potentially helpful, is unnecessary.
@Stuart Cook: Wow! I learned something new. Very interesting! Thanks!
Hi Martijn, just so you know, my application is supposed to run on all localized platforms, including French, Japanese, Chinese, etc. There has to be a way to pull this off without tons of escape equivalents, right?
@Even: Take a look at Stuart's comment
The problem is I don't know how to "correctly declare..."; basically, I've had no luck with any of the approaches I've tried so far