2

I cannot figure out how to do the conversion below:

String s = "HÃ¤r har du!  â\u0080\u0093 Hur vÃ¤l kan du snacka?";
String t = convert(s);
// t should be "Här har du! â Hur väl kan du snacka?"

I cannot find a way to translate s into t. Does anybody know how to do this in Java?

6
  • 1
    Use UTF-8. Seriously—why does anyone not use unicode these days? Commented Dec 4, 2014 at 14:08
  • This is a strange one. The Ã¤ characters are obviously UTF-8 bytes coerced to characters, but the â is correct, and I have no idea what \u0080\u0093 are supposed to be, as they are not a valid UTF-8 byte sequence, and they wouldn't even make sense in the windows-1252 charset. In summary, this string doesn't seem to be derived from any one charset. Commented Dec 4, 2014 at 14:32
  • You're right that the string looks very strange... Commented Dec 4, 2014 at 14:38
  • After further research, it seems to be intended to be an en dash; see someone else's similar problem. Commented Dec 4, 2014 at 14:46
  • This basically looks like an already corrupted string value. Your problem lies before you got the String s. While you may be able to patch things together after the fact, fixing the actual cause is the correct solution. Where are you getting this string from in the first place? Commented Dec 4, 2014 at 15:34

2 Answers 2

3

Try something like this:

    String s = "HÃ¤r har du!  â\u0080\u0093 Hur vÃ¤l kan du snacka?";
    byte[] bytes = s.getBytes("ISO-8859-1");
    String str = new String(bytes, "UTF-8");

The output is:

    Här har du!  – Hur väl kan du snacka?

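It is produced by the following code:

    public static void main(String[] args) throws java.lang.Exception
    {
        // s holds UTF-8 bytes that were wrongly decoded as ISO-8859-1
        String s = "HÃ¤r har du!  â\u0080\u0093 Hur vÃ¤l kan du snacka?";
        // Recover the original bytes, then decode them as UTF-8
        byte[] bytes = s.getBytes("ISO-8859-1");
        String str = new String(bytes, "UTF-8");
        System.out.println(str);
    }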

3 Comments

Your first two lines of code convert the string to bytes using UTF-8 and then back to a String using UTF-8, which means they are useless and can be removed. Your final line, new String(latin1), will use your platform's default charset, which is a very bad idea. It happened to work for you, but it's hardly reliable.
That looks correct, although it's better to use StandardCharsets.ISO_8859_1 and StandardCharsets.UTF_8 instead of String literals, both because Strings are subject to typos and because using standard charsets removes the need to catch an exception.
Thx very much! This answered my question. The code is executed on an app server. It works perfectly, but I'll see if I can set the default encoding in the app server configuration, because of your warning.
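The `StandardCharsets` suggestion from the comments could be sketched roughly as follows. The class name `Latin1ToUtf8` and the helper `convert` are hypothetical names chosen for illustration, and the input is written with `\u` escapes so the example does not depend on the source file's encoding. This assumes the corrupted string really is UTF-8 data that was decoded as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

class Latin1ToUtf8 {
    // Hypothetical helper: maps each char back to a single byte (ISO-8859-1),
    // then decodes those bytes as UTF-8. No checked exception is involved
    // because the Charset overloads are used instead of charset names.
    static String convert(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same text as in the question: \u00C3\u00A4 is the mis-decoded "ä",
        // \u00E2\u0080\u0093 is the mis-decoded en dash.
        String s = "H\u00C3\u00A4r har du!  \u00E2\u0080\u0093 Hur v\u00C3\u00A4l kan du snacka?";
        System.out.println(convert(s));
    }
}
```

Neither call depends on the platform's default charset, so the result should be the same on any app server configuration. What `System.out` shows still depends on the console's encoding.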
1

As I already mentioned in my comment, it looks like your String s is already corrupted. The correct solution is to fix wherever you got s from in the first place. It seems like you are interpreting what is really a "UTF-8" encoded String using some single-byte encoding ("ISO-8859-1" seems to work on your test string).
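To illustrate how this kind of corruption can arise, here is a small sketch that reproduces it on purpose. The class `MojibakeDemo` and the helper `misdecode` are hypothetical; they only simulate an upstream component that reads UTF-8 bytes with ISO-8859-1, which is an assumption about what happened to s, not something the question confirms.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Hypothetical helper: encodes the text as UTF-8, then decodes the bytes
    // with the wrong charset (ISO-8859-1), one char per byte.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "ä" (U+00E4) followed by an en dash (U+2013)
        String corrupted = misdecode("\u00E4\u2013");
        // Print the resulting code points instead of the characters,
        // so the output does not depend on the console encoding.
        for (char c : corrupted.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

The two-byte UTF-8 form of "ä" becomes the two characters U+00C3 U+00A4, and the three-byte en dash becomes U+00E2 U+0080 U+0093, which matches the escapes visible in the question's string.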

Provided you haven't already lost data in the original string corruption, you can somewhat patch your current string using:

    String s = "HÃ¤r har du!  â\u0080\u0093 Hur vÃ¤l kan du snacka?";
    byte[] b = s.getBytes("ISO-8859-1");
    String t = new String(b, "UTF-8");
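If you want the patch to fail loudly instead of silently producing replacement characters, a stricter variant is possible. The sketch below is one way to do it under the same assumption (UTF-8 read as ISO-8859-1); `StrictRepair` and `repair` are hypothetical names. It returns an empty `Optional` when the string contains characters outside ISO-8859-1 or when the recovered bytes are not valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

class StrictRepair {
    // Hypothetical helper: repairs s only if the round trip is lossless.
    static Optional<String> repair(String s) {
        // A char above U+00FF cannot have come from a single-byte decode.
        if (!StandardCharsets.ISO_8859_1.newEncoder().canEncode(s)) {
            return Optional.empty();
        }
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        try {
            // REPORT makes the decoder throw on invalid UTF-8 instead of
            // substituting U+FFFD.
            String fixed = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
            return Optional.of(fixed);
        } catch (CharacterCodingException e) {
            return Optional.empty();
        }
    }
}
```

A caller could then use something like `StrictRepair.repair(s).orElse(s)` to keep the original value whenever the string does not look like this particular kind of corruption.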
