
I have a multilingual Java application that reads and stores data in a MySQL database.

I have set the table collation to utf8_general_ci.

For the JDBC connection I use the parameters useUnicode=true&characterEncoding=UTF-8.

Characters like ® are displayed properly, but Chinese characters are garbled.

Now, after adding the JVM argument -Dfile.encoding=UTF8, Chinese characters are displayed but characters like ® are not.

What should I do so that all characters in the input, whatever the language, are displayed correctly?

Edit:

The input data comes from UDP packets, which are processed with the get methods of ByteBuffer and a getString method implemented like this:

public String getString() {
    byte[] remainingBytes = new byte[this.byteBuffer.remaining()];
    this.byteBuffer.slice().get(remainingBytes);
    String dataString = new String(remainingBytes);
    int stringEnd = dataString.indexOf(0);

    if(stringEnd == -1) {
        return null;
    } else {
        dataString = dataString.substring(0, stringEnd);
        this.byteBuffer.position(this.byteBuffer.position() + dataString.getBytes().length + 1);

        return dataString;
    }
}
  • Where are you trying to display things? It's unclear whether the problem is actually database-related at all. Commented Aug 15, 2012 at 10:01
  • I am displaying it on a PHP webpage taking data from that mysql DB Commented Aug 15, 2012 at 10:05
  • Okay, so that's a whole other aspect where things could easily be wrong. What have you done to convince yourself that the problem is in the Java code rather than the PHP? Commented Aug 15, 2012 at 10:32
  • Because with only useUnicode=true&characterEncoding=UTF-8 in JDBC it shows characters like ®, but after adding -Dfile.encoding=UTF8 to the JVM args they stop displaying properly. Commented Aug 15, 2012 at 10:34
  • And when I copy and paste Chinese or any other characters directly into the DB, they display properly. So the problem is in the Java -> DB path. Commented Aug 15, 2012 at 10:40

1 Answer


You state that when you put the character directly into MySQL it works, and that it is only incorrect when Java puts it there.

Have you tried getting your code to look for these characters and dump them to a text file or to standard output, as a short test to compare that output with what got sent to the DB?
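A minimal sketch of such a dump, in case it helps: it prints the Unicode code points of a string and the bytes it turns into under a given charset, so you can compare what Java holds with what arrives in the DB. The class and method names (EncodingDump, hex, codePoints) are made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDump {
    /** Returns the bytes of text in the given charset as space-separated upper-case hex. */
    public static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Returns the Unicode code points of text, e.g. "U+00AE". */
    public static String codePoints(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String sample = "\u00AE\u4E2D"; // the registered sign followed by one Chinese character
        System.out.println(codePoints(sample));                   // prints "U+00AE U+4E2D"
        System.out.println(hex(sample, StandardCharsets.UTF_8));  // prints "C2 AE E4 B8 AD"
    }
}
```

If the code points are already wrong at this stage, the damage happened while decoding the input, before JDBC or MySQL were involved.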

It is also worth logging the statements sent to the DB, to see exactly what was sent.

As far as the MySQL config goes, make sure both the tables and MySQL itself are running in UTF-8 mode:

[client]
default-character-set=utf8

# This was formerly known as [safe_mysqld]. Both versions are currently parsed.
[mysqld_safe]
default-character-set=utf8
default-collation=utf8_general_ci
character-set-server=utf8
collation-server=utf8_general_ci
init-connect='SET NAMES utf8'

[mysqld]
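# default-character-set and default-collation are only accepted here by older servers;
# MySQL 5.5 and later reject them in [mysqld], so drop these two lines there.
default-character-set=utf8
default-collation=utf8_general_ci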
character-set-server=utf8
collation-server=utf8_general_ci

Make sure the above has been put into /etc/mysql/my.cnf. Then, for each database you have, run the query below. It writes out one ALTER statement per table, which you can then run to convert each table to utf8 ("userbase" is the database name in this example):

SELECT CONCAT("ALTER TABLE `", i.TABLE_NAME, "` CONVERT TO CHARACTER SET utf8;") AS MySQLCMD
FROM information_schema.TABLES i
WHERE i.TABLE_SCHEMA = "userbase"
INTO OUTFILE '/tmp/userbase.csv';

Other things worth checking, especially if this server has to write UTF-8:

  1. Linux system environment:

    Check the Unix locale by running locale:

    LANG=en_GB.UTF-8
    LC_CTYPE="en_GB.UTF-8"
    LC_NUMERIC="en_GB.UTF-8"
    LC_TIME="en_GB.UTF-8"
    LC_COLLATE="en_GB.UTF-8"
    LC_MONETARY="en_GB.UTF-8"
    LC_MESSAGES="en_GB.UTF-8"
    LC_PAPER="en_GB.UTF-8"
    LC_NAME="en_GB.UTF-8"
    LC_ADDRESS="en_GB.UTF-8"
    LC_TELEPHONE="en_GB.UTF-8"
    LC_MEASUREMENT="en_GB.UTF-8"
    LC_IDENTIFICATION="en_GB.UTF-8"
    LC_ALL=

If yours does not show UTF-8, fix it as follows (select en_GB.UTF-8 when dpkg-reconfigure asks):

 sudo dpkg-reconfigure locales
 update-locale LANG=en_GB.UTF-8

Restart the box so that services pick up UTF-8. As a user, you will need to log out completely and back in; check locale before rebooting to make sure it is working.

You should now be able to type Japanese in your SSH session (in PuTTY, UTF-8 has to be selected in the settings).
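The locale matters to Java as well: on many JVMs the default charset is derived from it unless -Dfile.encoding overrides it, and calls such as new String(bytes) or getBytes() without a charset argument use that default. A small sketch to print what the JVM actually ended up with (the class name DefaultCharsetCheck is just for illustration):

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {
    /** Describes the charset the JVM uses when no charset is passed explicitly. */
    public static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Run once normally and once with -Dfile.encoding=UTF8 to compare the two setups.
        System.out.println(describe());
    }
}
```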

  2. Tomcat: add URIEncoding="UTF-8" to the <Connector> element.

I also added it to the AJP connector:

 <Connector port="8009"......
           protocol="AJP/1.3"  URIEncoding="UTF-8" />

  3. In the web.xml of each local site (within WEB-INF), add the following (I am unsure whether this is essential):

<web-app>
    <filter>
        <filter-name>charsetFilter</filter-name>
        <filter-class>filters.SetCharacterEncodingFilter</filter-class>
        <init-param>
            <param-name>encoding</param-name>
            <param-value>UTF-8</param-value>
        </init-param>
    </filter>

Then look for the filter mappings and also add:

    <!-- Define filter mappings for the defined filters -->
    <filter-mapping>
        <filter-name>charsetFilter</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>

I have come across corruption issues with specific characters before. It is worth saving the UDP string to a file and viewing it in a good UTF-8 editor (Notepad++ with the UTF-8 option enabled, or Kate or something similar on KDE).

Also test different UTF-8 characters, both ones that work and ones that potentially do not, via standard output or a file, and look them up at

http://www.fileformat.info/info/unicode/char/search.htm

to make sure the characters are the same. For example, ® is at http://www.fileformat.info/info/unicode/char/00ae/index.htm
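One thing in the question's getString() stands out: new String(remainingBytes) and dataString.getBytes() take no charset, so they use the JVM default, which is exactly what -Dfile.encoding changes. Below is a minimal sketch of a variant that looks for the terminator in the raw bytes and decodes with an explicit charset. It assumes the sender encodes strings as UTF-8, which the question does not confirm; the PacketReader class name and its constructor are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class PacketReader {
    private final ByteBuffer byteBuffer;

    public PacketReader(ByteBuffer byteBuffer) {
        this.byteBuffer = byteBuffer;
    }

    /**
     * Reads a NUL-terminated string from the current position.
     * Assumption: the bytes on the wire are UTF-8.
     * Returns null (and leaves the position unchanged) if no terminator is found.
     */
    public String getString() {
        int start = byteBuffer.position();
        int end = start;
        // Search for the 0 byte in the raw data, before any decoding happens.
        while (end < byteBuffer.limit() && byteBuffer.get(end) != 0) {
            end++;
        }
        if (end == byteBuffer.limit()) {
            return null;
        }
        byte[] raw = new byte[end - start];
        byteBuffer.get(raw);            // advances the position up to the terminator
        byteBuffer.position(end + 1);   // skips the terminator itself
        // Explicit charset: the result no longer depends on -Dfile.encoding or the locale.
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

If the sender actually uses a different encoding, swap in that charset. The point is only that the charset is named explicitly instead of being inherited from the JVM default.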
