I'm reading a file with ReadAllText:

    // In a verbatim string (@"..."), backslashes are not escaped, so use single ones.
    String[] values = File.ReadAllText(@"c:\c\file.txt").Split(';');

    int i = 0;

    foreach (String s in values)
    {
        System.Console.WriteLine("output: {0} {1} ", i, s);
        i++;
    }

When I read some files, I sometimes get the wrong characters back (for ÖÜÄÀ...). They come out as '?', because of a problem with the encoding:

output: 0 TEST
output: 1 A??O?

One solution would be to pass the encoding to ReadAllText, something like ReadAllText(@"c:\c\file.txt", Encoding.UTF8), which could fix the problem. But what if I still get '?' as output? What if I don't know the encoding of the file? And what if every single file has a different encoding? What would be the best way to handle this in C#? Thank you.

  • You need to know what the encoding is. And there is no 100% reliable way to find out based purely on the contents of the file. Commented May 25, 2012 at 11:20
  • Please refer to this post stackoverflow.com/questions/2239968/… Commented May 25, 2012 at 11:25
  • You can use File.ReadAllText(Path, Encoding.Default) and the framework detects automatically if it is an ANSI or UTF8 file. Commented Jun 4, 2024 at 2:18

3 Answers

The only way to do this reliably is to look for a byte order mark (BOM) at the start of the text file. (The BOM primarily indicates the endianness of the character encoding, but it also identifies the encoding itself, e.g. UTF-8, UTF-16, UTF-32.) Unfortunately, this method only works for Unicode-based encodings, and only when the file was actually written with a BOM; for anything older, much less reliable methods must be used.

The StreamReader type supports detecting these marks to determine the encoding. You simply need to pass true for the detectEncodingFromByteOrderMarks parameter, like so:

new System.IO.StreamReader("path", true)

After the first read, you can check the value of streamReader.CurrentEncoding to determine the encoding used by the file (the detection only happens once the reader has actually read from the stream). Note, however, that if no byte order mark exists, this constructor falls back to UTF-8, not Encoding.Default.
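
A minimal sketch of the StreamReader approach, assuming the file may or may not start with a byte order mark. The class and method names (EncodingProbe, ReadWithBomDetection) are hypothetical and only serve the illustration; the sample file is a temporary one written with a UTF-16 BOM so that there is something to detect.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingProbe
{
    // Hypothetical helper: reads the whole file and reports which
    // encoding the StreamReader settled on.
    public static string ReadWithBomDetection(string path, out Encoding detected)
    {
        using (var reader = new StreamReader(path, detectEncodingFromByteOrderMarks: true))
        {
            // BOM detection happens on the first read, so read before
            // looking at CurrentEncoding.
            string text = reader.ReadToEnd();
            detected = reader.CurrentEncoding;
            return text;
        }
    }

    public static void Main()
    {
        // Assumption for the demo: a temp file written as UTF-16 LE with a BOM.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "\u00C4\u00D6\u00DC;TEST", Encoding.Unicode);

        Encoding detected;
        string text = ReadWithBomDetection(path, out detected);
        Console.WriteLine("detected: {0}, fields: {1}", detected.WebName, text.Split(';').Length);

        File.Delete(path);
    }
}
```

If the file has no BOM, the reader simply decodes it as UTF-8, so a legacy ANSI file will still come back with replacement characters; in that case the encoding has to be supplied explicitly.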

See also the CodeProject solution for detecting the encoding.

1 Comment

If no byte encoding marks exist, then CurrentEncoding will use Encoding.UTF8 not Encoding.Default. "The detectEncodingFromByteOrderMarks parameter detects the encoding by looking at the first three bytes of the stream. It automatically recognizes UTF-8, little-endian Unicode, and big-endian Unicode text if the file starts with the appropriate byte order marks. Otherwise, the UTF8Encoding is used." from the docs

You have to check the file's encoding first. Try this:

System.Text.Encoding enc = null;
System.IO.FileStream file = new System.IO.FileStream(filePath,
    System.IO.FileMode.Open, System.IO.FileAccess.Read, System.IO.FileShare.Read);
if (file.CanSeek)
{
    byte[] bom = new byte[4]; // Get the byte-order mark, if there is one
    int n = file.Read(bom, 0, 4); // n may be less than 4 for a very short file
    if (n >= 3 && bom[0] == 0xef && bom[1] == 0xbb && bom[2] == 0xbf)
        enc = System.Text.Encoding.UTF8; // UTF-8
    else if (n >= 4 && bom[0] == 0xff && bom[1] == 0xfe && bom[2] == 0 && bom[3] == 0)
        enc = System.Text.Encoding.UTF32; // UTF-32 LE (must be tested before UTF-16 LE)
    else if (n >= 4 && bom[0] == 0 && bom[1] == 0 && bom[2] == 0xfe && bom[3] == 0xff)
        enc = new System.Text.UTF32Encoding(true, true); // UTF-32 BE
    else if (n >= 2 && bom[0] == 0xff && bom[1] == 0xfe)
        enc = System.Text.Encoding.Unicode; // UTF-16 LE
    else if (n >= 2 && bom[0] == 0xfe && bom[1] == 0xff)
        enc = System.Text.Encoding.BigEndianUnicode; // UTF-16 BE
    else
        enc = System.Text.Encoding.ASCII; // no BOM found: fall back to a default

    // Now reposition the file cursor back to the start of the file
    file.Seek(0, System.IO.SeekOrigin.Begin);
}
else
{
    // The file cannot be randomly accessed, so you need to decide what to set the default to
    // based on the data provided. If you're expecting data from a lot of older applications,
    // default your encoding to Encoding.ASCII. If you're expecting data from a lot of newer
    // applications, default your encoding to Encoding.Unicode. Also, since binary files are
    // single-byte based, you will want to use Encoding.ASCII for them, even though you'll
    // probably never need the encoding then, since the Encoding classes are really meant to
    // get strings from the byte array that is the file.

    enc = System.Text.Encoding.ASCII;
}

1 Comment

Thanks! Note that the FileStream is open after this code and should be closed if this code is used in some GetEncoding method.
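
The point about closing the stream can be sketched as a small helper that disposes the FileStream before returning. This is only an illustration under stated assumptions: BomSniffer, GetEncoding, and the fallback parameter are hypothetical names, and the fallback is whatever the caller believes BOM-less files use.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomSniffer
{
    // Hypothetical helper: returns the encoding implied by the BOM,
    // or 'fallback' when the file does not start with one.
    public static Encoding GetEncoding(string filePath, Encoding fallback)
    {
        byte[] bom = new byte[4];
        int n;
        // 'using' closes the stream even if Read throws.
        using (var file = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            n = file.Read(bom, 0, bom.Length);
        }

        if (n >= 3 && bom[0] == 0xEF && bom[1] == 0xBB && bom[2] == 0xBF) return Encoding.UTF8;
        // UTF-32 LE starts with the UTF-16 LE mark, so it is tested first.
        if (n >= 4 && bom[0] == 0xFF && bom[1] == 0xFE && bom[2] == 0 && bom[3] == 0) return Encoding.UTF32;
        if (n >= 4 && bom[0] == 0 && bom[1] == 0 && bom[2] == 0xFE && bom[3] == 0xFF) return new UTF32Encoding(true, true);
        if (n >= 2 && bom[0] == 0xFF && bom[1] == 0xFE) return Encoding.Unicode;
        if (n >= 2 && bom[0] == 0xFE && bom[1] == 0xFF) return Encoding.BigEndianUnicode;
        return fallback;
    }

    public static void Main()
    {
        // Assumption for the demo: a temp file that starts with the UTF-8 BOM.
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { 0xEF, 0xBB, 0xBF, (byte)'A', (byte)';', (byte)'B' });

        Encoding enc = GetEncoding(path, Encoding.ASCII);
        string[] values = File.ReadAllText(path, enc).Split(';');
        Console.WriteLine("{0}: {1} values", enc.WebName, values.Length);

        File.Delete(path);
    }
}
```

Because the helper opens and closes its own stream, the caller can pass the result straight to File.ReadAllText, at the cost of opening the file twice.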

In my case, I was creating a simple JSON file and was getting the same error. The problem was that I had created the file using Visual Studio (2019 at the moment).

I am sure you can find some configuration in the VS options to deal with this issue. However, the quickest way I've found was to create the same file and content using Notepad++. You can set the encoding in Notepad++ from the Encoding menu at the top. I believe you can find a similar setting in other text editors as well.
