How to set string character encoding in Android

Question

Hi! I have web page content encoded in ISO-8859-2. How can I convert a stream encoded in this charset to Java's UTF-8? I'm trying the code below, but it does not work: it messes up some characters. Is there some other way to do this?

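    // `in` is the page's InputStream; `stranica` is presumably a StringBuilder.
    BufferedInputStream inp = new BufferedInputStream(in);
    byte[] buffer = new byte[8192];
    int len1 = 0;
    try {
        while ((len1 = inp.read(buffer)) != -1) {
            // Decode each chunk; ISO-8859-2 is a single-byte charset,
            // so a character can never be split across two reads.
            String buff = new String(buffer, 0, len1, "ISO-8859-2");
            stranica.append(buff);
        }
    } catch (IOException e) {
        // minimal handling; not shown in the question
        e.printStackTrace();
    }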

You should re-tag this "Java" not "Android"

– mtmurdock, Commented Jun 23, 2010 at 0:48

king_nak · Accepted Answer · 2010-06-23 08:30:53Z

4

Try it with an InputStreamReader and Charset:

    InputStreamReader inp = new InputStreamReader(in, Charset.forName("ISO-8859-2"));
    BufferedReader rd = new BufferedReader(inp);
    String l;
    while ((l = rd.readLine()) != null) {
        ...
    }

If you get an UnsupportedCharsetException, you know what your problem is. Also, inp.getEncoding() lets you check which encoding is actually being used.
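The approach above can be sketched as a small self-contained program. This is only an illustration under assumptions: the class name `Latin2StreamReader` and the helper `readAll` are hypothetical, a `ByteArrayInputStream` stands in for the real network stream, and it reads with a `char[]` buffer instead of `readLine()` so that line breaks are preserved.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

class Latin2StreamReader {

    // Hypothetical helper: decodes the whole stream as ISO-8859-2 into a String.
    static String readAll(InputStream in) {
        // Throws UnsupportedCharsetException if the runtime lacks this charset.
        Charset latin2 = Charset.forName("ISO-8859-2");
        StringBuilder page = new StringBuilder();
        try (BufferedReader rd = new BufferedReader(new InputStreamReader(in, latin2))) {
            char[] buf = new char[8192];
            int n;
            while ((n = rd.read(buf)) != -1) {
                page.append(buf, 0, n); // chars are already decoded here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return page.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the web page: the bytes 0xC8 0xE6 are "Čć" in ISO-8859-2.
        byte[] sample = {(byte) 0xC8, (byte) 0xE6};
        String text = readAll(new ByteArrayInputStream(sample));
        // Compare against Unicode escapes to avoid depending on console encoding.
        System.out.println(text.equals("\u010C\u0107"));
    }
}
```

Letting the reader do the decoding keeps the byte-to-character step in one place, so the rest of the code only ever handles `String` and `char` values.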

3 Comments

Levara Over a year ago

It seems that the problem was that the encoding parameter should be "ISO8859-2" and not "ISO-8859-2"...

Alan Moore Over a year ago

I doubt that. ISO-8859-2 and ISO8859-2 are both valid names for that encoding, and Java recognizes both of them.
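One way to check which names a particular runtime accepts is to ask `Charset` directly. The sketch below is only illustrative: the class `CharsetNameCheck` and the helper `sameCharset` are hypothetical names, and the result for the two ISO-8859-2 spellings depends on the runtime it is executed on, so no output is claimed here.

```java
import java.nio.charset.Charset;

class CharsetNameCheck {

    // Hypothetical helper: true if both names are supported and resolve to the same charset.
    static boolean sameCharset(String a, String b) {
        return Charset.isSupported(a)
                && Charset.isSupported(b)
                && Charset.forName(a).equals(Charset.forName(b));
    }

    public static void main(String[] args) {
        // Do both spellings from the discussion resolve to one charset on this runtime?
        System.out.println(sameCharset("ISO-8859-2", "ISO8859-2"));

        // List every alias the runtime knows for the canonical name, if it is supported.
        if (Charset.isSupported("ISO-8859-2")) {
            System.out.println(Charset.forName("ISO-8859-2").aliases());
        }
    }
}
```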

Manikandan Over a year ago

I have some Croatian text at a URL and tried to download the contents, but some of the text shows up as rectangles. I posted my question at stackoverflow.com/questions/17574928/… Can you help me?

Michael Borgwardt · Accepted Answer · 2010-06-23 08:36:36Z

3

How to convert a stream encoded in this charset to java's UTF-8

Wrong assumption: Java uses UTF-16 internally, not UTF-8.

But your code actually looks correct and should work. Are you absolutely sure the webpage is in fact encoded in ISO-8859-2? Maybe its encoding is declared incorrectly.

Or perhaps the real problem is not with the reading code that you've shown, but with whatever code you use to work with the result. How and where do these "messed up characters" manifest?
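To make the distinction concrete, here is a minimal sketch (the class `CharsetRoundTrip` and the method `latin2ToUtf8` are hypothetical names). It decodes ISO-8859-2 bytes into a `String` and only produces UTF-8 when bytes are written out again. It also shows one way a `?` can appear after a correct read: encoding into a charset that cannot represent the character. Whether that is what happened in the question is an assumption, not something the thread confirms.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetRoundTrip {

    static final Charset LATIN2 = Charset.forName("ISO-8859-2");

    // Hypothetical helper: re-encodes ISO-8859-2 bytes as UTF-8 via a String.
    static byte[] latin2ToUtf8(byte[] latin2Bytes) {
        String text = new String(latin2Bytes, LATIN2);   // bytes -> chars (decode)
        return text.getBytes(StandardCharsets.UTF_8);    // chars -> bytes (encode)
    }

    public static void main(String[] args) {
        byte[] latin2 = {(byte) 0xB9};                   // "š" in ISO-8859-2
        byte[] utf8 = latin2ToUtf8(latin2);
        System.out.println(utf8.length);                 // 2 (0xC5 0xA1 in UTF-8)

        // ISO-8859-1 has no "š", so the encoder substitutes '?'.
        byte[] lossy = "\u0161".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println((char) lossy[0]);             // ?
    }
}
```

A `String` itself has no ISO-8859-2 or UTF-8 "mode"; a charset only matters at the points where bytes are turned into characters or back.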

5 Comments

Levara Over a year ago

I know that about UTF-16, but when a web page declares UTF-8 in its head (or whatever it's called), everything works perfectly. When ISO-8859-2 is declared, certain Croatian characters like (Č, ć, š, ć, đ, ž) end up being displayed as ?.

Michael Borgwardt Over a year ago

@Levara: Do those webpages look correct when you open them in a browser? If that displays '?' too, then it looks as though the webpage contents were corrupted by whatever program produced them. Nothing you do at this point can fix that.

Levara Over a year ago

Yes, they are displayed correctly in the browser. That's why I'm sure it's possible; I just don't know how to do it. :)

Michael Borgwardt Over a year ago

@Levara: Then, as I wrote, the problem is with whatever you do with the data after you have read it. Where are the characters displayed as '?'?

Levara Over a year ago

I'm displaying it in a TextView in Android. It works now; it seems that the problem was that the encoding parameter should be "ISO8859-2" and not "ISO-8859-2"... Thanks anyway.
