
I have a Java class that uploads a text file from a Windows client to a Linux server.

The file I am trying to upload is encoded using Cp1252 or ISO-8859-1.

When the file is uploaded, it becomes encoded as UTF-8, and strings containing accents like éèà can't be read.

The command

file -i *

on the Linux server tells me that it's encoded using UTF-8.

I think the encoding was changed during the upload, so I added this code to my servlet:

// Attempted workaround: temporarily switch the JVM default encoding around the write.
String currentEncoding = System.getProperty("file.encoding");
System.setProperty("file.encoding", "Cp1252");
item.write(file);
System.setProperty("file.encoding", currentEncoding);

In the jsp file, I have this code:

<form name="formUpload"
action="..." method="post"
enctype="multipart/form-data" accept-charset="ISO-8859-1">

The library I use to upload the file is Apache Commons FileUpload.
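
For context, the servlet handles the upload roughly like this (a minimal sketch using Commons FileUpload's ServletFileUpload, DiskFileItemFactory and FileItem; the upload directory is a placeholder, not my exact code):

protected void doPost(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    try {
        // Parse the multipart/form-data request with Commons FileUpload.
        ServletFileUpload upload = new ServletFileUpload(new DiskFileItemFactory());
        List<FileItem> items = upload.parseRequest(request);
        for (FileItem item : items) {
            if (!item.isFormField()) {
                // "/var/uploads" is a placeholder for the real target directory.
                File file = new File("/var/uploads", item.getName());
                item.write(file);   // the file on disk then shows up as UTF-8
            }
        }
    } catch (Exception e) {
        throw new ServletException(e);
    }
}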

Does anyone have a clue, because I'm really running out of ideas!

Thanks,

Otmane MALIH

  • Keep in mind that the servlet code you provided may introduce strange side effects in real-world scenarios... remember concurrent clients (system properties are global). Commented Sep 14, 2012 at 14:04
  • I know, I'll take that code out; I was just trying to force the file to be encoded in ISO-8859-1, but that didn't work. Commented Sep 14, 2012 at 14:06

1 Answer


Setting the system property file.encoding only has an effect when it is set at JVM startup; changing it at runtime is ignored. Instead, you will have to open the file with code like this:

import java.io.*;
import java.nio.charset.Charset;

// Wrap the output stream in a writer that uses an explicit charset instead of the platform default.
public static BufferedWriter createWriter( File file, Charset charset ) throws IOException {
    FileOutputStream stream = new FileOutputStream( file );
    return new BufferedWriter( new OutputStreamWriter( stream, charset ) );
}

Use Charset.forName("iso8859-1") as the charset parameter.
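
For example, a minimal usage sketch (the target path and text are made-up placeholders):

File target = new File("/var/uploads/report.txt");
try (BufferedWriter writer = createWriter(target, Charset.forName("ISO-8859-1"))) {
    writer.write("Résumé: éèà");   // written as ISO-8859-1 bytes instead of the platform default
}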

[EDIT] Your problem is most likely the file command. MacOS is the only OS in the world which can tell you the encoding of a file with confidence. Windows and Linux have to make a guess. This guess can be wrong.

So what you need to do is open the file in an editor where you can specify the encoding. You need to do that on Windows (to make sure that the file really was saved with Cp1252; some applications ignore the platform default and always save their data in UTF-8).

And you need to do the same on Linux. If you just open the file, the editor will use the platform encoding (UTF-8 on modern Linux systems) to read it, and ISO-8859-1 umlauts will come out garbled. Conversely, if you open a UTF-8 file as ISO-8859-1, the UTF-8 characters will be garbled. That's the only way to be sure what the encoding of a text file really is.
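
If no such editor is at hand, a rough way to check is to decode the same raw bytes with both candidate charsets and see which result shows the accents correctly (a sketch, assuming Java 7's java.nio.file is available):

import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

public class EncodingProbe {
    public static void main(String[] args) throws IOException {
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));   // read the raw bytes once
        System.out.println("As ISO-8859-1: " + new String(raw, Charset.forName("ISO-8859-1")));
        System.out.println("As UTF-8:      " + new String(raw, Charset.forName("UTF-8")));
        // Whichever line shows éèà correctly matches the file's actual encoding.
    }
}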


7 Comments

I tried that too, but FileItem's item.write(File file) doesn't take a BufferedWriter as an argument.
You need to fix the code inside item.write(). There is no way to fix this error outside of it.
It's part of the Commons FileUpload API, so it can't "just" be fixed :) Ignoring that, I don't think the OP is looking in the right direction for the solution. Commons FileUpload definitely doesn't decode/encode files during FileItem#write(); it just streams the bytes retrieved from the network unmodified. The problem is caused either on the client side, or on the server side after the file has been written.
@OtmaneMALIH: Please check the HTTP header of the request. What charset is specified there? You can do that by looking at the header methods of the HttpServletRequest instance (a rough way to do that is sketched after these comments).
I did change the server configuration (.profiles): I put LC_ALL=en_US instead of fr_FR and that resolved my problem, so as BalusC said, it was caused by the server.
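
A rough sketch of that header check inside the servlet (illustrative only; for multipart requests the charset may also be declared per part rather than in the top-level Content-Type):

String contentType = request.getHeader("Content-Type");   // e.g. "multipart/form-data; boundary=..."
String declaredCharset = request.getCharacterEncoding();  // may be null if the client sent none
System.out.println("Content-Type: " + contentType + ", declared charset: " + declaredCharset);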