As far as I know, Unicode assigns every character a unique code.

My database is set to UTF-8.

When I save a string (ఉత్తరప్రదేశ్) directly into the database from Java, it is stored as:

&#3081;&#3108;&#3149;&#3108;&#3120;&#3114;&#3149;&#3120;&#3110;&#3143;&#3126;&#3149;

But when I save the same string to the database using

escapeUnicode(StringEscapeUtils.unescapeHtml("here string"));


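import java.util.Formatter;

// Replaces every non-ASCII char with a backslash-u escape of four hex digits.
public String escapeUnicode(String input) {
    StringBuilder b = new StringBuilder(input.length());
    Formatter f = new Formatter(b); // writes its formatted output into b
    for (char c : input.toCharArray()) {
        if (c < 128) {
            b.append(c); // ASCII passes through unchanged
        } else {
            f.format("\\u%04x", (int) c); // %04x: hex, zero-padded to 4 digits
        }
    }
    return b.toString();
}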

it generates the escaped form:

\u0c09\u0c24\u0c4d\u0c24\u0c30\u0c2a\u0c4d\u0c30\u0c26\u0c47\u0c36\u0c4d

Both display correctly in the browser. Why do the two approaches produce different Unicode values? Thanks in advance.

1 Answer

Those are not different numbers…

  • 3081 = 0c09 = ఉ = TELUGU LETTER U
  • 3108 = 0c24 = త = TELUGU LETTER TA
  • 3149 = 0c4d = ్ = TELUGU SIGN VIRAMA

… and so on.

They are two different ways of writing the same Unicode code points.

The first form uses decimal numbers (base 10); the second uses hexadecimal numbers (base 16).
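To check the equivalence yourself, here is a minimal sketch that parses both notations back to an integer. The class and method names are made up for illustration, and it assumes the stored form is a decimal HTML numeric reference such as &#3081; while the escaped form is a backslash-u escape with four hex digits.

```java
public class CodePointNotation {

    // Parses a decimal HTML numeric reference such as "&#3081;".
    // Assumes the input is well formed: "&#", decimal digits, then ";".
    public static int fromDecimalReference(String ref) {
        return Integer.parseInt(ref.substring(2, ref.length() - 1), 10);
    }

    // Parses a six-character escape: backslash, 'u', then four hex digits.
    public static int fromHexEscape(String escape) {
        return Integer.parseInt(escape.substring(2), 16);
    }

    public static void main(String[] args) {
        int fromDecimal = fromDecimalReference("&#3081;");
        int fromHex = fromHexEscape("\\u0c09");

        // Both notations name the same code point, TELUGU LETTER U.
        System.out.println(fromDecimal == fromHex); // prints true
    }
}
```

Only the base passed to Integer.parseInt differs between the two methods; the resulting value is the same.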

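The hexadecimal form comes from the format string in your own code, and the Formatter documentation explains it: in f.format("\\u%04x", (int) c), the %04x conversion prints the integer in hexadecimal, zero-padded to four digits. The %d conversion would print the same value in decimal.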
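As a rough illustration of how the conversion character alone decides the notation, the sketch below formats one char both ways. The class and method names are hypothetical, and the &#...; wrapper is an assumption about what the decimal form in the database looks like; Locale.ROOT is passed so that %d always uses ASCII digits.

```java
import java.util.Locale;

public class FormatConversions {

    // %d prints the char's numeric value in base 10.
    public static String asDecimalReference(char c) {
        return String.format(Locale.ROOT, "&#%d;", (int) c);
    }

    // %04x prints the same value in base 16, lowercase,
    // zero-padded to at least four digits.
    public static String asHexEscape(char c) {
        return String.format(Locale.ROOT, "\\u%04x", (int) c);
    }

    public static void main(String[] args) {
        char teluguU = (char) 3081; // TELUGU LETTER U

        System.out.println(asDecimalReference(teluguU));
        System.out.println(asHexEscape(teluguU));
    }
}
```

Running main prints &#3081; on the first line and \u0c09 on the second: same character, same number, different base.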

Tip: If you have a Mac, download the UnicodeChecker app to see both decimal and hex numbers for each character defined in Unicode.
