
I have been trying to convert a UTF-8 String to its ISO-8859-1 equivalent so I can output it in an XML document, but no matter what I try, the output is always displayed incorrectly.

To simplify the question, I created a code snippet containing all the tests I did, followed by the generated document.

I also tried every possible combination of new String(xxx.getBytes("UTF-8"), "ISO-8859-1"): swapping UTF-8 and ISO-8859-1, and sometimes using the same charset on both sides. Nothing works!
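A Java String stores its characters as UTF-16 and carries no encoding of its own; a charset only matters when converting between characters and bytes. Under that model the round trip above does not convert anything, it corrupts the text. A minimal sketch (the class name RoundTripDemo is hypothetical):

```java
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Encodes a string to UTF-8 bytes, then decodes those bytes as if they were ISO-8859-1.
    // This is the new String(xxx.getBytes("UTF-8"), "ISO-8859-1") pattern from the question.
    static String utf8BytesReadAsLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "h\u00e9llo"; // "héllo": é is the single char U+00E9
        String mangled = utf8BytesReadAsLatin1(original);
        // In UTF-8, é is the two bytes 0xC3 0xA9; read as ISO-8859-1 they become
        // the two chars U+00C3 ('Ã') and U+00A9 ('©').
        System.out.println(original.length() + " -> " + mangled.length()); // 5 -> 6
        System.out.println(mangled.equals("h\u00c3\u00a9llo"));            // true
    }
}
```

The string itself is already correct; what needs to match the declared ISO-8859-1 is the byte encoding used when the document is finally written out.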

Here's the snippet:

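import java.io.StringWriter;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import play.mvc.Result;

// The methods below live in a Play controller class (extending play.mvc.Controller),
// which provides response() and ok().

// @see http://stackoverflow.com/questions/229015/encoding-conversion-in-java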
private static String changeEncoding(String input) throws Exception {
    // Create the encoder and decoder for ISO-8859-1
    Charset charset = Charset.forName("ISO-8859-1");
    CharsetDecoder decoder = charset.newDecoder();
    CharsetEncoder encoder = charset.newEncoder();

    // Convert a string to ISO-LATIN-1 bytes in a ByteBuffer
    // The new ByteBuffer is ready to be read.
    ByteBuffer bbuf = encoder.encode(CharBuffer.wrap(input));

    // Convert ISO-LATIN-1 bytes in a ByteBuffer to a CharBuffer and then to a string.
    // The new CharBuffer is ready to be read.
    CharBuffer cbuf = decoder.decode(bbuf);
    return cbuf.toString();
}

// @see http://stackoverflow.com/questions/655891/converting-utf-8-to-iso-8859-1-in-java-how-to-keep-it-as-single-byte
private static String byteEncoding(String input) throws Exception {
    Charset utf8charset = Charset.forName("UTF-8");
    Charset iso88591charset = Charset.forName("ISO-8859-1");

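    // getBytes() without an argument uses the platform default charset, so name UTF-8 explicitly
    ByteBuffer inputBuffer = ByteBuffer.wrap(input.getBytes(utf8charset));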

    // decode UTF-8
    CharBuffer data = utf8charset.decode(inputBuffer);

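    // encode ISO-8859-1
    ByteBuffer outputBuffer = iso88591charset.encode(data);
    // Copy only the encoded bytes: array() exposes the whole backing array,
    // which can be longer than the data actually written (the buffer's limit).
    byte[] outputData = new byte[outputBuffer.remaining()];
    outputBuffer.get(outputData);
    return new String(outputData, "ISO-8859-1");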
}

public static Result home() throws Exception {
    DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder docBuilder = docFactory.newDocumentBuilder();

    //root elements
    Document doc = docBuilder.newDocument();
    doc.setXmlVersion("1.0");
    doc.setXmlStandalone(true);

    Element rootElement = doc.createElement("test");
    doc.appendChild(rootElement);

    rootElement.setAttribute("original", "héllo");

    rootElement.setAttribute("stringToString", new String("héllo".getBytes("UTF-8"), "ISO-8859-1"));

    rootElement.setAttribute("stringToBytes", changeEncoding("héllo"));

    rootElement.setAttribute("stringToBytes2", byteEncoding("héllo"));

    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer transformer = tf.newTransformer();
    transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");

    StringWriter writer = new StringWriter();
    transformer.transform(new DOMSource(doc), new StreamResult(writer));
    String output = writer.getBuffer().toString().replaceAll("\n|\r", "");

    // The following is Play! Framework-specific code for rendering the response, but I believe it is not the problem
    // (I checked in the developer console: the document is correctly served as ISO-8859-1).
    response().setHeader("Content-Type", "text/xml; charset=ISO-8859-1");
    return ok(output).as("text/xml");
}
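For text like "héllo", where every character exists in ISO-8859-1, both changeEncoding() and byteEncoding() amount to an encode followed by a decode with the same charset, so they return the string unchanged and cannot affect what ends up on the wire. A minimal sketch of that round trip (the class name Latin1IdentityDemo is hypothetical, and it uses String.getBytes rather than a CharsetEncoder):

```java
import java.nio.charset.StandardCharsets;

public class Latin1IdentityDemo {
    // Same idea as changeEncoding()/byteEncoding(): String -> ISO-8859-1 bytes -> String.
    static String latin1RoundTrip(String s) {
        // getBytes(Charset) replaces characters ISO-8859-1 cannot represent with '?'
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String e = "h\u00e9llo"; // "héllo": every char fits in ISO-8859-1
        String euro = "\u20ac5"; // "€5": the euro sign is not in ISO-8859-1
        System.out.println(latin1RoundTrip(e).equals(e)); // true, nothing changed
        System.out.println(latin1RoundTrip(euro));        // ?5
    }
}
```

Only characters outside Latin-1 are affected, and the failure mode differs between the two helpers: an encoder from newEncoder() reports an error by default, while Charset.encode() substitutes a replacement byte.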

And the result:

<?xml version="1.0" encoding="ISO-8859-1"?>
<test original="héllo" stringToBytes="héllo" stringToBytes2="héllo" stringToString="héllo"/>

How can I proceed?

  • I think you misspelled response. If you're talking about response() from Play! Framework, there is no setCharacterEncoding() (I'm using Play! 2.1.5). There is also no setCharacterEncoding() in doc (Document). Commented Feb 23, 2014 at 11:00
  • Thanks for your help. I already set the encoding to "ISO-8859-1" by calling setHeader. There is no encoding setting in Play v2.1.5 (but there is a CONTENT_ENCODING, which is final). Commented Feb 23, 2014 at 11:20
  • Sorry again. I read 1.2.5 and not 2.1.5. Commented Feb 23, 2014 at 11:23
  • No problem ;) I also tried setting the encoding only in the HTTP response, without modifying the strings, but it didn't work. It only works if I also remove the encoding from the XML document, but that's because the page is then displayed as UTF-8, which I don't want. Commented Feb 23, 2014 at 11:24
  • You were kind of right. I finally switched the output from a StringWriter to a file, then returned that file directly as binary, and now everything works fine, with the right encoding. No encoding conversion was needed! You can add your comments as an answer, and I'll accept it :) Commented Feb 23, 2014 at 15:24

1 Answer


For a reason I can't explain, writing the XML to a file and returning that file as the response fixed the encoding problem.
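A likely explanation: when a StreamResult wraps a Writer such as StringWriter, the Transformer emits characters, not bytes. Its Javadoc notes that a byte stream should normally be used so the transformer can control the encoding; with a Writer, the ENCODING property still ends up in the XML declaration, but whoever turns the resulting String into bytes later decides the real encoding. A minimal sketch of that mismatch, assuming (not confirmed for Play 2.1.5) that the framework sends a String body as UTF-8; the class name WriterEncodingDemo is hypothetical:

```java
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class WriterEncodingDemo {
    // Serializes a one-element document to a StringWriter, as the question does.
    static String serializeToString() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("test");
        root.setAttribute("original", "h\u00e9llo");
        doc.appendChild(root);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
        StringWriter writer = new StringWriter();
        // Characters go into the writer; the ENCODING property only shows up in the
        // declaration and does not control the later char-to-byte conversion.
        t.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = serializeToString();
        System.out.println(xml.contains("encoding=\"ISO-8859-1\"")); // true
        // Assumption: the framework encodes the String body as UTF-8 when sending it.
        byte[] sent = xml.getBytes(StandardCharsets.UTF_8);
        // A client trusting the declared ISO-8859-1 decodes é (0xC3 0xA9) as two chars.
        String seenByClient = new String(sent, StandardCharsets.ISO_8859_1);
        System.out.println(seenByClient.contains("h\u00c3\u00a9llo")); // true
    }
}
```

Writing to a File (a byte sink) instead lets the Transformer itself produce ISO-8859-1 bytes, so the declaration, the bytes, and the Content-Type header all agree.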

I decided to keep this question in case other people have a similar problem.

Here's the snippet:

TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");

File file = new File("Path/to/file.xml");
transformer.transform(new DOMSource(doc), new StreamResult(file));

response().setHeader("Content-Disposition", "attachment;filename=" + file.getName());
response().setHeader("Content-Type", "text/xml; charset=ISO-8859-1");
return ok(file).as("text/xml");
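If a temporary file on disk is undesirable, the same effect can likely be had in memory by transforming into a ByteArrayOutputStream, since any OutputStream lets the Transformer do the encoding itself. A minimal sketch; the class XmlBytes and method toBytes are hypothetical names:

```java
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XmlBytes {
    // Serializes a DOM document straight to bytes in the given encoding.
    // Because the result wraps an OutputStream, the transformer performs the
    // char-to-byte conversion, so the bytes match the XML declaration.
    static byte[] toBytes(Document doc, String encoding) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, encoding);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("test");
        root.setAttribute("original", "h\u00e9llo");
        doc.appendChild(root);

        byte[] xml = toBytes(doc, "ISO-8859-1");
        // In ISO-8859-1, é is the single byte 0xE9 (in UTF-8 it would be 0xC3 0xA9).
        boolean hasLatin1E = false;
        for (byte b : xml) {
            if ((b & 0xFF) == 0xE9) {
                hasLatin1E = true;
            }
        }
        System.out.println(hasLatin1E); // true
    }
}
```

The resulting byte array could then be sent with the same `text/xml; charset=ISO-8859-1` header as above; check the Play 2.1.x Results API for the exact overload that accepts raw bytes, as that part is not shown here.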