Possible Duplicate:
How can I detect the encoding/codepage of a text file

I have an ASP.NET MVC application. In my view I upload a text file and process it with a controller method with this signature:

[HttpPost]
public ActionResult FromCSV(HttpPostedFileBase file, string platform)

I get a stream from the uploaded file as file.InputStream and read it using a standard StreamReader:

using (var sr = new StreamReader(file.InputStream))
{
    ...
}

The problem is that this only works for UTF text files. When I have a text file in Windows-1250, the characters get garbled. I can read Windows-1250 encoded text files when I explicitly specify the encoding:

using (var sr = new StreamReader(file.InputStream, Encoding.GetEncoding(1250)))
{
    ...
}

My problem is that I need to support both UTF and Windows-1250 encoded files, so I need a way to detect the encoding of the submitted file.

  • Is there any way to know any part of the content of this file? I.e. if you knew that a particular string was likely to be there, you could read it and see if it can be found, and if not, try again with a different encoding. Commented Jan 9, 2013 at 12:47
  • @AndrasZoltan I only know that the files are CSV files, either created in Excel (Windows-1250) or exported from Google Docs (UTF). I do not know the content of those files. Commented Jan 9, 2013 at 12:48
  • @mathieu in this specific case (UTF-8 or 1250) that answer doesn't apply Commented Jan 9, 2013 at 13:01
  • If you can use a BOM, use it; otherwise see stackoverflow.com/q/90838/266919 Commented Jan 9, 2013 at 13:19

1 Answer

Decoding a Windows-1250 file as UTF-8 with an exception fallback is extremely likely to throw an exception. If it does not, the file most likely uses only the ASCII subset, in which case it doesn't matter which of the two encodings is used to decode it. So you could do something like this:

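// Both encodings use exception fallbacks, so invalid byte sequences throw
// instead of being silently replaced with a substitute character.
Encoding[] encodings = new Encoding[] {
    Encoding.GetEncoding("UTF-8", new EncoderExceptionFallback(), new DecoderExceptionFallback()),
    Encoding.GetEncoding(1250, new EncoderExceptionFallback(), new DecoderExceptionFallback())
};

// fileAsByteArray holds the full contents of the uploaded file.
string result = null;

foreach (Encoding enc in encodings) {
    try {
        result = enc.GetString(fileAsByteArray);
        break; // decoded without errors, so stop here
    }
    catch (DecoderFallbackException) {
        // Not valid in this encoding; try the next one.
    }
}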
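
The snippet above assumes that fileAsByteArray already holds the file's bytes. A minimal sketch of how that could fit together is shown here: the upload stream is copied into memory once, then decoded as strict UTF-8 with Windows-1250 as the fallback. The class and method names (EncodingSniffer, ReadAllBytes, DecodeUtf8OrWindows1250) are hypothetical and only for illustration. It also assumes code page 1250 is available, which is the case on .NET Framework; on .NET Core and later, the code pages encoding provider has to be registered first.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper; the names are illustrative, not from any library.
public static class EncodingSniffer
{
    // Copies the whole stream into memory so the bytes can be decoded
    // more than once without rewinding or reopening the stream.
    public static byte[] ReadAllBytes(Stream input)
    {
        using (var buffer = new MemoryStream())
        {
            input.CopyTo(buffer);
            return buffer.ToArray();
        }
    }

    // Tries strict UTF-8 first. If the bytes are not valid UTF-8,
    // decodes them as Windows-1250 instead.
    public static string DecodeUtf8OrWindows1250(byte[] bytes)
    {
        // throwOnInvalidBytes: true makes GetString throw on malformed input.
        var strictUtf8 = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);
        try
        {
            return strictUtf8.GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            // Assumption: code page 1250 is registered on this runtime.
            return Encoding.GetEncoding(1250).GetString(bytes);
        }
    }
}
```

In the controller this could be called as EncodingSniffer.DecodeUtf8OrWindows1250(EncodingSniffer.ReadAllBytes(file.InputStream)). Because the bytes are buffered once, no second pass over the stream is needed. Note that GetString keeps a leading UTF-8 BOM, if present, as U+FEFF at the start of the result.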

3 Comments

If I try to read a win1250 file as UTF using your code, it throws an exception, but the next iteration, which tries to read the file as win1250, gets a stream with sr.EndOfStream==true, so there is nothing to read. I tried putting file.InputStream.Seek(0, SeekOrigin.Begin) after try, but it did not help.
@IgorKulman yeah, I am quite hazy on the details, but the principle is working, as you can see. Maybe you can read the file into a byte array first and use the byte array instead of the stream, if that's feasible.
@IgorKulman I guess it's the using statement: after the first iteration the stream will be closed.
