0

I have an application that pulls names from Active Directory for the domain using Ajax calls. Some names contain Spanish characters (ñ, for example). I set the page's character set to UTF-8 so that the characters display correctly on the form. I can successfully pull the names from the Ajax call and load them into the form field. The problem is that when the form is posted to the server for the database update, the String cast corrupts the extended characters.

Is there a special String function to handle UTF-8? What is the proper method to get the correct values posted to the Oracle tables?

I have done quite a bit of Java coding, but this is my first encounter with the extended characters. Any help will be appreciated.

Thanks.

8
  • Is there a content-type header on the request that results from the form submission? On the response that contains the form? Commented Feb 10, 2012 at 23:07
  • Java Strings are UTF-16. I don't know what you mean by "the String cast". If you are explicitly converting a UTF-8 byte[] to a String, you could use new String(byte array, Charset). Commented Feb 10, 2012 at 23:10
  • 1
    Java Strings are Unicode. The JVM uses UTF-16 internally, but it's not really correct to say that Java Strings are UTF-16. The default charset depends on the platform. Commented Feb 11, 2012 at 0:00
  • 1
    @tchrist The fact that the JVM uses UTF-16 has nothing to do with the question, and it is untrue to say that it leads to "innumerable" coding errors. I have worked with multiple charsets in Java for years, I have seen lots of other code doing the same, and I have never once seen that be a problem. What I have seen repeatedly is people unaware that default readers use the platform default charset and thus mangle the characters as they are read in. new String(byte[]) does the same thing (uses the platform default), and that has nothing to do with what the JVM is using internally. Commented Feb 12, 2012 at 16:14
  • 1
    @cotton.m: programmers.stackexchange.com/q/102205/3601 Commented Feb 18, 2012 at 10:08
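
To make the point in the comments above concrete, here is a minimal sketch of decoding the same UTF-8 bytes with an explicit charset versus the wrong one. The class name ExplicitCharsetDemo, the helper decode, and the sample name "Muñoz" are made up for illustration; they are not from the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name and sample data are illustrative only.
class ExplicitCharsetDemo {

    // Decodes raw bytes with an explicitly named charset instead of the platform default.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "Muñoz" encoded as UTF-8: the ñ occupies two bytes, 0xC3 0xB1.
        byte[] utf8 = {0x4D, 0x75, (byte) 0xC3, (byte) 0xB1, 0x6F, 0x7A};

        String ok = decode(utf8, StandardCharsets.UTF_8);
        String mangled = decode(utf8, StandardCharsets.ISO_8859_1);

        System.out.println(ok.equals("Mu\u00F1oz")); // true: five characters, ñ intact
        System.out.println(mangled.length());        // 6: each byte of ñ became its own character

        // new String(byte[]) without a charset uses whatever this reports.
        System.out.println(Charset.defaultCharset());
    }
}
```

The String type itself does nothing wrong in either case; the damage comes from which charset is used when bytes are turned into characters.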

2 Answers

4

Where is this "cast" coming into play?

I am not sure what your application is, but there are a couple of places where you could be mangling the characters. First, assuming this is some sort of Java EE app, make sure that you have set the request encoding in the servlet before reading any parameters. See the setCharacterEncoding method of HttpServletRequest. You should use "UTF-8" there.

Second, you should make sure that you have the accept-charset="UTF-8" attribute set on the form element. (Note: in my experience this is rarely a problem if the page is UTF-8 encoded to begin with, but better safe than sorry.)

Last, make sure that you have specified any encoding options, if necessary, for the connection to the database. I don't use Oracle so I don't know, but often you'll need to specify "unicode" or "utf-8" or the like somewhere where you create the connection.

I would try them in order, because it's possible (likely) that the first alone will fix the problem.
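
To illustrate why the request encoding matters, here is a rough sketch of what happens when percent-encoded form data from a UTF-8 page is decoded with the wrong charset. It uses only the JDK (URLDecoder.decode with a Charset argument, which needs Java 10 or later) rather than the servlet API, so it only approximates what a container does internally. The class name FormDecodingDemo and the sample value are invented for the example.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it mimics, in plain JDK code, the decoding step a servlet container performs.
class FormDecodingDemo {

    // Decodes one percent-encoded form value using the given charset.
    static String decodeParam(String raw, Charset charset) {
        return URLDecoder.decode(raw, charset);
    }

    public static void main(String[] args) {
        // "Muñoz" as a browser would submit it from a UTF-8 page: ñ is sent as %C3%B1.
        String raw = "Mu%C3%B1oz";

        String wrong = decodeParam(raw, StandardCharsets.ISO_8859_1);
        String right = decodeParam(raw, StandardCharsets.UTF_8);

        System.out.println(wrong.length());              // 6: the two bytes of ñ became two characters
        System.out.println(right.equals("Mu\u00F1oz"));  // true
    }
}
```

In a servlet, calling req.setCharacterEncoding("UTF-8") before the first getParameter or getParameterValues call is what selects the right-hand behaviour for POST bodies.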


5 Comments

Thanks cotton.m. The response is sent back to a class that loads all of the data into String[] arrays in order to process each row individually. The columns are VarChar in the Oracle table. They will accept the Unicode characters, as I can copy and paste the values directly through a table editor. I have only been coding Java for a couple of years. I assumed I needed to cast the raw data from the form to the String type to load it into the database. Should I use a different type or function?
This leaves much unanswered. How is the response sent? How is the response received? What "loads" all the data into String[] arrays? I spoke about ensuring you set any flags on the connection to the database, not about whether the database supports Unicode. In short, you need to check that what you have is valid at each stage and identify where exactly it's going awry. It's very unclear to me at this stage what exactly you are doing. I really think you need to post your code.
I would be happy to send my code. Should I just paste it into a comment box or is there another facility to upload it as a zip?
It seems the simplest solution is the right one. I added the statement req.setCharacterEncoding("UTF-8"); to the beginning of my maintenance class and that solved the problem. Just to be sure, I displayed the request's character encoding before and after. It was null before I set it. My only question now is: should I set it for all requests, or just where I know the names will be pulled from Active Directory and contain the Unicode characters? Will changing the encoding value affect any other string handling?
You really should be setting the encoding any place where you will be reading in string data. Changing the encoding will only be a problem if you had some place where you were receiving data that was not UTF-8 (this is rare for multiple reasons and unlikely to affect you).
1

You want an OutputStreamWriter. When you construct it, specify that you want to use the "UTF-8" charset. Also make sure you specify that you're sending UTF-8 in your HTTP headers.
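
A minimal sketch of that suggestion, assuming the output goes to an in-memory stream so the bytes can be inspected. The class name Utf8WriterDemo and the helper encodeUtf8 are made up for the example. In a real response you would wrap the response's output stream instead and send a Content-Type header such as text/html; charset=UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it writes to memory rather than to an HTTP response.
class Utf8WriterDemo {

    // Writes the text through an OutputStreamWriter configured for UTF-8 and returns the bytes produced.
    static byte[] encodeUtf8(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed (and therefore flushed) before the bytes are read.
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encodeUtf8("Mu\u00F1oz");
        System.out.println(bytes.length); // 6: ñ is encoded as the two bytes 0xC3 0xB1
    }
}
```

Note that this covers the outgoing direction only; it does not change how incoming form parameters are decoded.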

4 Comments

Thanks Bill. Can you elaborate on this? The form consists of multiple rows of fields, one row for each row in the database table. In the Java class that processes the form data, I load each column into String arrays, e.g. String[] linenums = req.getParameterValues("linenum"); I think the "String" type cast is what is causing the corruption. Can you provide some details on how I can replace the array for the names with a type that will not corrupt the values?
Well, casting would not corrupt the contents of a string, and the code snippet you gave, String[] linenums = req.getParameterValues("linenum");, does not contain a cast, so it's not really clear how to answer your question.
Does declaring the array as String[] cause the Unicode characters to get corrupted, or is it the getParameterValues method? Is there another way to retrieve the values from the form fields?
@JerryWelliver You are on the wrong track here. Declaring an array or casting a String (or String array) is not going to mangle any character encoding (namely because there isn't any going on there). getParameterValues implies you are using a servlet and I wonder if you have done as I told you previously and set the character encoding for the request properly.
