My web application (Java/Tomcat/Spring/Maven) is having trouble dealing with special characters like ’ (hex 92, decimal 146). This arrives in my app as a different, garbled sequence of characters.

I have looked at this question and verified that I have the following line in all my JSP files:

<%@ page contentType="text/html; charset=UTF-8" %>

I also looked at this question and verified that I have the following line in my Maven pom.xml:

<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>

So as far as I can tell everything should be built and handled in UTF-8. But when I submit the string Martin’s Auto Repair what shows up at the server during the Spring binding process is Martinâ\u0080\u0099s Auto Repair. This is the string that gets handed back by Tomcat to my application.

Worse, this is echoed back to the browser so submitting the altered string again expands the weird characters over and over.

Any suggestions? At this point I'm not sure if this is a browser problem or a server problem.

  • Do you handle requests as UTF-8 as well? And how do you submit the string? As a query parameter? Commented Feb 26, 2014 at 18:22

1 Answer

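Hex 92 is not a printable character in Unicode; U+0092 is a C1 control code (http://en.wikibooks.org/wiki/Unicode/Character_reference/0000-0FFF). In Windows codepage 1252, byte 0x92 is the right single quotation mark ’, whose Unicode code point is U+2019.

Windows codepage 1252 is not 100% identical to Unicode: the two differ in the range 0x80 to 0x9F, where codepage 1252 places printable characters and Unicode places control codes.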
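
As a minimal sketch of that difference, the following decodes the single byte 0x92 under two charsets. The class and method names (`Cp1252Demo`, `decodeByte`) are made up for illustration, and the code assumes the JVM ships the `windows-1252` charset, which standard JDKs do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Cp1252Demo {
    // Hypothetical helper: decode one byte under the given charset
    // and return the resulting Unicode code point.
    static int decodeByte(int b, Charset cs) {
        String s = new String(new byte[] { (byte) b }, cs);
        return s.codePointAt(0);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // In codepage 1252, 0x92 is the right single quotation mark.
        System.out.printf("windows-1252: U+%04X%n", decodeByte(0x92, cp1252));
        // In ISO-8859-1, 0x92 maps straight to the C1 control code U+0092.
        System.out.printf("ISO-8859-1:   U+%04X%n",
                decodeByte(0x92, StandardCharsets.ISO_8859_1));
    }
}
```

The first line prints U+2019 and the second U+0092, so "hex 92" only names the apostrophe when the bytes are known to be codepage 1252.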

5 Comments

Additionally, it looks like somewhere you parse a UTF-8 encoded byte stream as ISO-Latin-1; otherwise you would not see that particular sequence in the output.
Thank you for your response. Assuming your analysis is correct, what is the solution for this problem? Is it a Tomcat configuration issue? A browser issue?
And is this a general problem with my setup, or did I happen to pick one particular example (hex 92) that does not work?
This definitely appears to be the problem, so I'm going to close this question and open another one: "How do I detect invalid UTF-8 strings?"
File encodings differ between platforms. I would suggest that you use only ASCII as the source encoding, and use \uXXXX escape notation for characters outside that range. This will ensure that your sources are platform independent.
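
The diagnosis in the comments above can be checked with a short sketch. It writes the apostrophe as the escape \u2019 so the source stays pure ASCII, then simulates the suspected fault by decoding UTF-8 bytes as ISO-8859-1. The class and method names (`MojibakeDemo`, `misdecode`, `repair`) are hypothetical, and this only reproduces the symptom; it does not show where in Tomcat or Spring the wrong decoding actually happens.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Simulate the suspected fault: UTF-8 bytes read as ISO-8859-1.
    static String misdecode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
    }

    // Undo one round of the fault by reversing the two steps.
    // This works because ISO-8859-1 maps every byte to a character.
    static String repair(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // ASCII-only source: the apostrophe is written as \u2019.
        String original = "Martin\u2019s Auto Repair";

        String once = misdecode(original);
        String twice = misdecode(once);

        // U+2019 is E2 80 99 in UTF-8, which ISO-8859-1 reads as
        // three separate characters: U+00E2, U+0080, U+0099.
        System.out.println(once.equals("Martin\u00e2\u0080\u0099s Auto Repair"));

        // Each resubmission grows the string further.
        System.out.println(original.length() + " -> " + once.length()
                + " -> " + twice.length());

        // One repair step recovers the original text.
        System.out.println(repair(once).equals(original));
    }
}
```

The garbled value matches the string reported in the question, and the growing lengths match the observation that resubmitting the echoed value expands the garbage each time.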
