Spring/Tomcat not honoring UTF-8 encoding?

Question

My web application (Java/Tomcat/Spring/Maven) is having trouble dealing with special characters like ’ (hex 92, decimal 146). This comes into my app as another weird character.

I have looked at this question and verified that I have the following line in all my JSP files:

<%@ page contentType="text/html; charset=UTF-8" %>

I also looked at this question and verified that I have the following line in my Maven pom.xml:

<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>

So as far as I can tell everything should be built and handled in UTF-8. But when I submit the string Martin’s Auto Repair what shows up at the server during the Spring binding process is Martinâ\u0080\u0099s Auto Repair. This is the string that gets handed back by Tomcat to my application.

Worse, this is echoed back to the browser so submitting the altered string again expands the weird characters over and over.

Any suggestions? At this point I'm not sure if this is a browser problem or a server problem.

Do you handle requests as UTF-8 as well? And what do you call submit? As a query parameter?
– fge, Commented Feb 26, 2014 at 18:22

Thorbjørn Ravn Andersen · Accepted Answer · 2014-02-26 18:24:59Z

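Hex 92 is not a printable character in Unicode; U+0092 is a C1 control code (http://en.wikibooks.org/wiki/Unicode/Character_reference/0000-0FFF).

Windows codepage 1252 is not 100% identical to Unicode. In Windows-1252, byte 0x92 is the right single quotation mark ’, which Unicode encodes as U+2019.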
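
To make the codepage difference concrete, here is a minimal sketch that decodes the single byte 0x92 under two charsets. The class and method names (Cp1252Demo, decodeByte) are made up for illustration, and it assumes the JVM provides the windows-1252 charset, which Charset.forName would reject otherwise.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Cp1252Demo {
    // Decode one byte with the given charset and return the resulting code point.
    static int decodeByte(int b, Charset charset) {
        return new String(new byte[] {(byte) b}, charset).codePointAt(0);
    }

    public static void main(String[] args) {
        // Assumption: this JVM ships the windows-1252 charset.
        Charset cp1252 = Charset.forName("windows-1252");

        // Same byte, two interpretations.
        System.out.printf("windows-1252: U+%04X%n", decodeByte(0x92, cp1252));
        System.out.printf("ISO-8859-1:   U+%04X%n", decodeByte(0x92, StandardCharsets.ISO_8859_1));
    }
}
```

Under Windows-1252 the byte maps to U+2019 (the curly apostrophe), while under ISO-8859-1 it maps to the control code U+0092. So "hex 92" only identifies the apostrophe if you already know the bytes are Windows-1252.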

5 Comments

Thorbjørn Ravn Andersen Over a year ago

Additionally, it looks like you are somewhere parsing a UTF-8 encoded byte stream as ISO-Latin-1; otherwise you would not see that interesting sequence in the output.
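
The UTF-8-read-as-ISO-Latin-1 diagnosis can be checked outside Tomcat with a short sketch. The class and helper names (MojibakeDemo, misdecode, escape) are hypothetical. It only reproduces the symptom and does not show where in the request pipeline the wrong decoding happens.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encode the text as UTF-8, then (wrongly) decode those bytes as ISO-8859-1.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Render every non-ASCII char as a backslash-u escape so control codes are visible.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The curly apostrophe is written as an escape to keep this source ASCII-only.
        String submitted = "Martin\u2019s Auto Repair";

        String once = misdecode(submitted);   // one round trip through the wrong charset
        String twice = misdecode(once);       // what a resubmit of the echoed value would do

        System.out.println(escape(once));
        System.out.println(escape(twice));
    }
}
```

The first line printed is Martin\u00e2\u0080\u0099s Auto Repair, which matches the string reported in the question (â is U+00E2). The second line is longer again, consistent with the garbage growing on every resubmit.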

user3120173 Over a year ago

Thank you for your response. Assuming your analysis is correct, what is the solution for this problem? Is it a Tomcat configuration issue? A browser issue?

user3120173 Over a year ago

And is this a general problem with my setup, or did I happen to pick one particular example (hex 92) that does not work?

user3120173 Over a year ago

This definitely appears to be the problem, so I'm going to close this question and open another one: "How do I detect invalid UTF-8 strings?"

Thorbjørn Ravn Andersen Over a year ago

File encodings differ between platforms. I would suggest using only ASCII as the source encoding, and using \uXXXX notation for characters outside that range. This will ensure that your sources are platform independent.
