I am generating a UTF-8 XML using Spring Data REST. I am annotating the method which returns the XML as follows:

@RequestMapping(value = "/Something/{id:.+}",
       method = RequestMethod.GET,
       produces = "application/xml")
public @ResponseBody String metsResource(@PathVariable String id){...}

My program generates the XML from data returned by various APIs, and some of that data contains a copyright symbol (©). The XML itself is generated fine, but browsers (I tried Chrome and Safari) cannot render it. I get the following error. When I copied the XML output from the console, I could see that the error position was right next to the copyright symbol. I am not sure what is wrong with my XML when the input contains a copyright symbol. Could anyone suggest a fix?

Browser Returned Error

--EDIT--

Here is a chunk of the XML. If you look inside the accessCondition element, you will notice a copyright symbol. This is exactly where the browser stops rendering.

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<data>
<hdr CREATEDATE="2014-07-21T12:40:09"/>
<sec ID="123456">
 <xmlData>
   <titleInfo>
    <title script="Latn">A book</title>
    <subTitle>Indian stories</subTitle>
   </titleInfo>
   <name>
    <namePart>Jane Doe</namePart>
    <role>Creator</role>
   </name>
   <originInfo>
    <publisher script="Latn"> ABCD Press</publisher>
    <place> Connecticut</place>
    <dateOther encoding="w3cdtf" keyDate="yes">2009</dateOther>
   </originInfo>
   <language>
    <languageTerm type="code">eng</languageTerm>
    <languageTerm type="text">English</languageTerm>
   </language>
   <abstract>A book with lot of Red Indian Stories.</abstract>
   <identifier type="hdl">123456</identifier>
   <location>
    <physicalLocation>N7433.4 L44 A88 2009</physicalLocation>
   </location>
   <accessCondition type="rightsOwnership">© 2009 Jane Doe - ABCD Press, Connecticut</accessCondition>
   <recordInfo>Test</recordInfo>
  </xmlData>
 </sec>
</data>   

The codebase that generates the complete XML is huge, so it's hard to show here. Just before returning the XML, though, the program converts the ByteArrayOutputStream (the variable 'out' in this case) into a String, decoding it as UTF-8:

String xml = out.toString("UTF-8");

As Jim Garrison suggested in the comments, it seems the © symbol in the input came in as ISO-8859-1. My reason for thinking so: when I changed the conversion of the ByteArrayOutputStream as follows, the XML started to render.

String xml = out.toString("ISO-8859-1"); 

Is there any way to get the output as UTF-8? Thanks a lot!

  • You told the system it was UTF-8 but then sent non-UTF-8 data. Without seeing the XML and the code that generated it nobody can help you. Commented Jul 21, 2014 at 19:38
  • I have added the XML chunk that is causing the error. Do you think you can help me now? Commented Jul 21, 2014 at 19:55
  • You will have to examine the raw output XML stream. If the copyright symbol takes up 1 byte (0xa9) then it's ISO-8859-1. If it is two bytes (0xc2 0xa9) then it's UTF-8. Commented Jul 21, 2014 at 20:27
  • @Jim Thanks for the suggestion. Is there a good way to examine the raw output XML stream? I tried to find the byte count of a copyright symbol on the mothereff.in/byte-counter site. It said 2 bytes. Commented Jul 21, 2014 at 20:46
  • @Jim, as you mentioned, it seems the © symbol in the input came in as ISO-8859-1, because when I changed the last conversion of the ByteArrayOutputStream as follows, the XML started to render: String xml = out.toString("ISO-8859-1"); Is there any way to get the output as UTF-8? Commented Jul 21, 2014 at 21:03
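
The byte check described in the comments can be reproduced without a website. The following is a minimal sketch in plain Java; the class name CopyrightBytes and the helper hex are made up for illustration and are not part of the original program. It prints the bytes that © occupies in each encoding, and the same helper could be applied to out.toByteArray() to inspect the generated XML.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class CopyrightBytes {

    // Render a byte array as space-separated lowercase hex, e.g. "c2 a9".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The copyright sign, written as an escape so the source file encoding does not matter.
        String copyright = "\u00a9";
        System.out.println("UTF-8:      " + hex(copyright.getBytes(StandardCharsets.UTF_8)));      // c2 a9
        System.out.println("ISO-8859-1: " + hex(copyright.getBytes(StandardCharsets.ISO_8859_1))); // a9
    }
}
```

If the dump of the real response shows a lone a9 where the symbol should be, the bytes on the wire are ISO-8859-1 even though the XML declaration says utf-8.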

1 Answer

Since I fixed the issue myself after a lot of trial and error, I am posting this answer so that someone with the same problem may be spared the effort.

First, I made sure that the input data I was receiving was UTF-8. Once that was confirmed, I printed the generated XML to the console. That output was also fine (at least the copyright symbol did not come out as '?'). Only when I used curl to call the REST API, or used a browser to render its output, did I get the incorrect encoding.

I read through the Spring Data REST documentation, and an example there suggested specifying the charset of the response. XML defaults to UTF-8 and my declaration says so explicitly, so I did not expect to need this, but since the API was not returning properly encoded UTF-8, I specified the charset. This worked for me. A likely explanation is that Spring's StringHttpMessageConverter, which writes String return values, falls back to ISO-8859-1 when the content type carries no charset. Here is how to do it:

@RequestMapping(value = "/Something/{id:.+}",
   method = RequestMethod.GET,
   produces = "application/xml;charset=utf-8")
public @ResponseBody String metsResource(@PathVariable String id){...}
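
Why this helps, and why the out.toString("ISO-8859-1") workaround from the question appeared to work, can be simulated in plain Java. This is only a sketch under a stated assumption: that the String body is written as ISO-8859-1 when the content type names no charset. The class CharsetMismatchDemo and the method writeBody are hypothetical stand-ins and do not call Spring.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; it imitates the response writer, it does not call Spring.
class CharsetMismatchDemo {

    // Stand-in for the framework turning the returned String into response bytes.
    static byte[] writeBody(String body, Charset responseCharset) {
        return body.getBytes(responseCharset);
    }

    // Strict UTF-8 check: malformed input is reported instead of being replaced.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Stands in for the contents of 'out': correct UTF-8 bytes.
        byte[] generated = "<a>\u00a9 2009</a>".getBytes(StandardCharsets.UTF_8);
        String xml = new String(generated, StandardCharsets.UTF_8);

        // Original mapping: correct String, body written as ISO-8859-1 (the assumed default).
        System.out.println(isValidUtf8(writeBody(xml, StandardCharsets.ISO_8859_1)));

        // Workaround from the question: misdecode as ISO-8859-1, then write as ISO-8859-1.
        String misread = new String(generated, StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(generated, writeBody(misread, StandardCharsets.ISO_8859_1)));

        // With charset=utf-8 in produces: correct String, body written as UTF-8.
        System.out.println(isValidUtf8(writeBody(xml, StandardCharsets.UTF_8)));
    }
}
```

The first line printed is false, because a lone 0xa9 byte is not valid UTF-8, which matches the browser failing at the copyright symbol. The second is true: the two ISO-8859-1 steps cancel out and the original UTF-8 bytes reach the client, but the String held in memory is wrong, so that workaround is fragile. The third is true and is the case the charset in produces is meant to select.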