3

I am using an HTML parser called HtmlCleaner to parse HTML pages. The problem is that each page uses a different encoding. My question:

Can I change from any character encoding to UTF-8?

4 Answers

3

You cannot seamlessly "convert" from encoding X to encoding Y without knowing encoding X beforehand. If you are fetching those HTML pages over HTTP, check the Content-Type response header to see which encoding the page uses, then pass that encoding to your HTML parser.
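As a rough illustration of that approach, here is a minimal sketch that pulls the charset parameter out of a Content-Type header value and uses it to decode the raw response bytes. The class and method names (`ContentTypeCharset`, `fromContentType`, `decode`) are made up for this example, and the ISO-8859-1 fallback is only an assumption for when the header names no charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of HtmlCleaner or any HTTP library.
final class ContentTypeCharset {

    // Returns the charset named in a Content-Type header value such as
    // "text/html; charset=ISO-8859-1", or the fallback when the parameter
    // is missing or names a charset this JVM does not support.
    static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Decodes the raw body using the declared charset.
    // Assumption: ISO-8859-1 is used when the header declares nothing.
    static String decode(byte[] body, String contentType) {
        return new String(body, fromContentType(contentType, StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        byte[] body = {(byte) 0xE9}; // "é" encoded as ISO-8859-1
        System.out.println(decode(body, "text/html; charset=ISO-8859-1"));
    }
}
```

The decoded `String` (or a `Reader` built with the same charset) can then be handed to the parser, so the parser never has to guess the encoding.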


1

Where do you get the HTML page from? If you get it from the servlet request, you can call getReader() on it and pass the result to clean(); this will use the right encoding. If you get it from an upload, pass the input stream to clean(). If you get it through an HTTP client, you need to check the charset in the response's Content-Type header using getResponseCharSet().


1

Can I change from any character encoding to UTF-8?

Yes, you can express any Unicode character in UTF-8 encoding.
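For instance, once the source encoding is known, re-encoding comes down to decoding the bytes into a `String` and encoding that string again as UTF-8. The sketch below assumes the caller already knows the source charset; `Transcoder` and `toUtf8` are illustrative names only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration.
final class Transcoder {

    // Decodes the input with the known source charset, then
    // re-encodes the resulting text as UTF-8.
    static byte[] toUtf8(byte[] input, Charset source) {
        String text = new String(input, source);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xE9}; // "é" in ISO-8859-1 (one byte)
        byte[] utf8 = toUtf8(latin1, StandardCharsets.ISO_8859_1);
        System.out.println(utf8.length); // "é" takes two bytes in UTF-8
    }
}
```

If the wrong source charset is supplied, the result is still valid UTF-8 but contains the wrong characters, which is why the source encoding has to be known first.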

There might be a problem when changing the encoding of HTML pages: if the page contains a "charset" meta tag, for example,

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

you have to update this tag so it corresponds to the actual encoding.
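If the page is written back out as UTF-8, one possible way to update the declaration is a simple regular-expression replacement on the markup, sketched below. This only handles the common case of a single meta tag carrying a `charset=` value; `MetaCharsetRewriter` and `declareUtf8` are hypothetical names, and a real HTML parser is more robust than a regex for anything unusual.

```java
// Hypothetical helper for illustration.
final class MetaCharsetRewriter {

    // Matches everything in a <meta ...> tag up to and including "charset=",
    // plus an optional opening quote, followed by the charset name itself.
    private static final String META_CHARSET =
            "(?i)(<meta\\b[^>]*\\bcharset\\s*=\\s*[\"']?)[A-Za-z0-9._:-]+";

    // Rewrites the first declared charset in the markup to UTF-8.
    static String declareUtf8(String html) {
        return html.replaceFirst(META_CHARSET, "$1UTF-8");
    }

    public static void main(String[] args) {
        String tag = "<meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=iso-8859-1\">";
        System.out.println(declareUtf8(tag));
    }
}
```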

1 Comment

He's parsing an HTML page, not generating one. Besides, this line tells the client which encoding to use when parsing the given HTML page, and that is exactly the information the OP doesn't have beforehand and needs to find in the response headers.

0
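// Replaces each Latin-1 character in the range U+00A1..U+00FF with its
// numeric HTML character reference (for example, "é" becomes "&#233;").
public String arreglarString(String cadena) {
    for (int i = 161; i < 256; i++) {
        char car = (char) i;
        // replace() matches literally; the reference needs a trailing ';'
        cadena = cadena.replace(String.valueOf(car), "&#" + i + ";");
    }

    return cadena;
}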
