I'd like to download the source of many web pages, write each one to a file, and print it to the NetBeans console. I have a problem with encoding. Here is my code:
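import java.io.*;
import java.net.URL;
import java.nio.charset.Charset;

public static final void foo(URL url, Charset encoding, String file) throws IOException {
    // try-with-resources closes the reader and flushes/closes the writer, even if reading fails
    try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), encoding));
         BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file), encoding))) {
        String readLine;
        while ((readLine = in.readLine()) != null) {
            // the same charset is used to decode the page and to encode the file
            System.out.println(readLine + "\n");
            out.write(readLine + "\n");
        }
    }
}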
I am testing this on two foreign websites (e.g. Czech and Thai).
I tried Charset.forName("UTF-8"), which seems to work correctly for the Thai page but not for the Czech one: both the console and the file contain replacement characters such as �.
I have also tried ISO-8859-2. That saves the file correctly, but the console shows small rectangles instead of letters such as ž and š.
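The � symptom can be reproduced without any network access. The sketch below assumes the Czech page is really served as ISO-8859-2 (which the correct file output suggests, but is not confirmed) and that the JVM ships that charset. The class name EncodingMismatchDemo and the helper roundTrip are made up for illustration; they are not part of my real code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {
    // Hypothetical helper: encodes text with one charset and decodes the bytes
    // with another, mimicking a page served in `actual` but read as `assumed`.
    static String roundTrip(String text, Charset actual, Charset assumed) {
        byte[] bytes = text.getBytes(actual);
        return new String(bytes, assumed);
    }

    public static void main(String[] args) {
        // Assumption: ISO-8859-2 is available in this JVM (it is not one of the
        // charsets every Java platform is required to support).
        Charset latin2 = Charset.forName("ISO-8859-2");
        // "žš" written as escapes so the source file encoding does not matter
        String czech = "\u017E\u0161";

        String wrong = roundTrip(czech, latin2, StandardCharsets.UTF_8);
        String right = roundTrip(czech, latin2, latin2);

        // Malformed UTF-8 input is replaced with U+FFFD, the � character
        System.out.println("decoded as UTF-8 contains U+FFFD: " + wrong.contains("\uFFFD"));
        System.out.println("decoded as ISO-8859-2 matches: " + right.equals(czech));
    }
}
```

If this matches what happens with the real page, the damage occurs while decoding the downloaded bytes, before anything reaches the file or the console. The rectangles in the console would then be a separate issue with how the console displays the characters.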
Is there a universal solution for multilingual websites (Czech, Japanese, Thai, and more) that lets me save the page to a file correctly, and also print it to the console or store it in a variable?