I have a string with diacritics (accent marks) like this: "Làm sao để chuyển chuổi có dấu về không dấu?"

And I want to convert it to an unaccented string like this: "Lam sao de chuyen chuoi co dau ve khong dau?"

Please tell me how to do this in Java. Thanks a lot!

1 Answer

Something like

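import java.text.Normalizer;

public class RemoveAccents {
    public static void main(String[] args) {
        String src = "Làm sao để chuyển chuổi có dấu về không dấu?";

        // Decompose each accented letter into a base letter plus combining marks (NFD)
        String dest = Normalizer.normalize(src, Normalizer.Form.NFD);
        // Drop every non-ASCII character, which removes the combining marks
        dest = dest.replaceAll("[^\\p{ASCII}]", "");

        System.out.println(src);
        System.out.println(dest);
    }
}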

gives you

Làm sao để chuyển chuổi có dấu về không dấu?

Lam sao e chuyen chuoi co dau ve khong dau?

5 Comments

I just noticed that this doesn't quite give you what you require: để has been truncated to e.
It looks like the problem isn't that simple: there's a similar question here stackoverflow.com/questions/2362810/…
Yeah, that's not simple. But thanks a lot Jonathan, I'll follow your suggestion. :)
Would you please explain to me the meaning of these characters: "[^\\p{ASCII}]"?
replaceAll takes a regular expression; you can see the details here: docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/… -- here it means match anything that's not an ASCII character. The call to Normalizer.normalize decomposes each accented character into a separate ASCII base character plus Unicode combining accent characters, so when we filter out the non-ASCII characters we have almost the string we want. The exception is đ, which doesn't seem to have an ASCII + combining accent representation, so we end up removing it completely.
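
One possible way around the đ problem described in these comments, as a rough sketch rather than a definitive fix: strip only the combining marks (the regex class \p{M}) instead of everything non-ASCII, then map đ and Đ to d and D by hand, since those two letters have no canonical decomposition. The class name AccentStripper and the method stripAccents are made up for this illustration and are not from the answer above. It also assumes đ/Đ are the only letters that need manual mapping, which holds for Vietnamese but not necessarily for other languages.

```java
import java.text.Normalizer;

final class AccentStripper {
    // Hypothetical helper for illustration; not part of the original answer.
    static String stripAccents(String input) {
        // NFD splits each accented letter into a base letter plus combining marks
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // \p{M} matches combining marks only, so other characters are left alone
        String withoutMarks = decomposed.replaceAll("\\p{M}+", "");
        // U+0111 (đ) and U+0110 (Đ) do not decompose, so map them explicitly
        return withoutMarks.replace('\u0111', 'd').replace('\u0110', 'D');
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("Làm sao để chuyển chuổi có dấu về không dấu?"));
    }
}
```

With the sample string from the question this prints "Lam sao de chuyen chuoi co dau ve khong dau?". Unlike the [^\\p{ASCII}] filter, this version keeps any non-ASCII characters that are not accents, which may or may not be what you want.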
