1

I am trying to loop through a directory and, for every text file that I find, convert the encoding to UTF-8 format.

2
  • If you just want to get the job done, use existing tools such as iconv. If you want to write such a tool yourself, tell us what you have tried and where you are stuck. Commented Feb 8, 2011 at 16:33
  • This isn't very likely to come to a good end. The task implies that the text files are in an unknown encoding right now. Which means you don't know how to reliably read them. Commented Feb 8, 2011 at 16:59

2 Answers

2

Use DirectoryInfo and you're pretty much done

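using System.IO;
using System.Text;

DirectoryInfo DI = new DirectoryInfo("TextFiles_Path");
FileInfo[] Files = DI.GetFiles("*.txt");
foreach (FileInfo Fl in Files)
{
    // OpenText() always decodes as UTF-8, so this only round-trips files that are already UTF-8 (or ASCII).
    using (StreamReader SR = Fl.OpenText())
    // FileMode.Create truncates an existing output file; the using blocks flush and close both streams.
    using (StreamWriter SW = new StreamWriter(new FileStream(Fl.FullName + ".UTF8.txt", FileMode.Create), Encoding.UTF8))
    {
        SW.Write(SR.ReadToEnd());
    }
}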

Enjoy

4 Comments

Do you know if I need anything aside from using System.IO? I am getting errors.
File[] Files = DI.GetFiles("*.txt"); Error 1 Cannot implicitly convert type 'System.IO.FileInfo[]' to 'System.IO.File[]'
Sorry, edited so it uses the constructor that takes an encoding as parameter
I tried the code but it didn't work. The reason is simple: the OpenText() method reads the file content as UTF-8, and that is the error in the code. If the file was saved with Windows-1252 encoding and you read its contents using OpenText(), the characters are interpreted incorrectly.
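
To illustrate the point in the comment above, here is a minimal sketch of the same loop that decodes each file with an explicitly supplied source encoding instead of the UTF-8 default. The class and method names (Utf8Converter, ConvertDirectory) are made up for illustration, and the sketch assumes you already know which encoding the files were saved in.

```csharp
using System.IO;
using System.Text;

static class Utf8Converter
{
    // Hypothetical helper: re-encodes every *.txt file in `directory` from
    // `sourceEncoding` (which the caller must already know) to UTF-8.
    public static void ConvertDirectory(string directory, Encoding sourceEncoding)
    {
        foreach (string path in Directory.GetFiles(directory, "*.txt"))
        {
            // Decode with the file's actual encoding rather than the UTF-8 default.
            string text = File.ReadAllText(path, sourceEncoding);

            // Write a sibling file so the original stays untouched.
            // UTF8Encoding(false) writes UTF-8 without a byte order mark.
            File.WriteAllText(path + ".UTF8.txt", text, new UTF8Encoding(false));
        }
    }
}
```

A call might look like Utf8Converter.ConvertDirectory("TextFiles_Path", Encoding.GetEncoding("iso-8859-1")). For Windows-1252, Encoding.GetEncoding(1252) works directly on .NET Framework; on .NET Core and later it should first need Encoding.RegisterProvider(CodePagesEncodingProvider.Instance). Note that the output files also match "*.txt", so running the sketch twice on the same directory would convert the already converted files again.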
2

Fast and easy

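Imports System.Text
' Caution: "*.*" matches every file, binaries included; narrow the pattern (e.g. "*.txt") for text files only.
' ReadAllText(path) decodes as UTF-8 unless a BOM says otherwise, so files in other encodings are misread.
For Each oFile In IO.Directory.GetFiles(dir, "*.*", IO.SearchOption.AllDirectories)
    IO.File.WriteAllText(oFile, IO.File.ReadAllText(oFile), Encoding.UTF8)
Next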

2 Comments

Very nice quick option :) Unfortunately this doesn't work if the source file is in ANSI encoding which is the problem that brought me here. Any accented characters will become corrupted.
Of course, because the ReadAllText method reads the content as UTF-8 by default. If you read the content of a non-UTF-8 file as UTF-8, the characters are misread, and saving the file as UTF-8 afterwards doesn't help.
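
When the source encoding is not known for certain, one possible heuristic (not something proposed in the answers above) is to try a strict UTF-8 decode first and only fall back to a legacy encoding when the bytes are not valid UTF-8. The sketch below assumes this approach; EncodingGuess and ReadTextWithFallback are hypothetical names, and the fallback encoding is still a guess the caller has to make.

```csharp
using System.IO;
using System.Text;

static class EncodingGuess
{
    // Hypothetical helper: returns the text of `path`, trying strict UTF-8
    // first and falling back to `fallback` when the bytes are not valid UTF-8.
    public static string ReadTextWithFallback(string path, Encoding fallback)
    {
        byte[] bytes = File.ReadAllBytes(path);

        // throwOnInvalidBytes: true makes the decoder throw on malformed input
        // instead of silently substituting U+FFFD replacement characters.
        var strictUtf8 = new UTF8Encoding(false, true);
        try
        {
            // Drop a leading byte order mark if the file had one.
            return strictUtf8.GetString(bytes).TrimStart('\uFEFF');
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8: decode with the caller's assumed legacy encoding.
            return fallback.GetString(bytes);
        }
    }
}
```

The returned string can then be written back with File.WriteAllText(path, text, Encoding.UTF8). This only tells you a file is not UTF-8; it cannot tell which legacy encoding it really uses, so the warning in the question comments about unknown encodings still applies.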
