2

Is there a program that can convert file encodings to UTF-8 programmatically? I have about 1000 files and I want to save them as UTF-8 on Linux.

Thanks.

2 Answers

5

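iconv will take care of that. Use it like this:

iconv -f ISO-8859-1 -t UTF-8 in.txt > out.txt

where ISO-8859-1 is latin1, one of the most common 8-bit encodings, which may or may not be your input encoding. iconv writes the converted text to standard output, so redirect it to create the output file (GNU iconv also accepts -o out.txt).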

If you don't know the input charset, you can detect it with the standard file command or the Python-based chardet. For instance:

iconv -f "$(file -bi in.txt | sed -e 's/.*[ ]charset=//')" -t UTF-8 in.txt > out.txt

You may want something more robust than this one-liner, for example skipping files whose encoding cannot be detected.
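As a rough sketch of that more robust variant, the function below skips a file when file reports no usable charset. The name convert_to_utf8 is made up for illustration, and the skip list (binary, unknown-8bit) is an assumption based on what file --mime-encoding commonly prints, so check it against your own files.

```shell
#!/bin/sh
# convert_to_utf8 SRC DST
# Hypothetical helper (not part of iconv or file): detect the charset of SRC
# with file(1) and write a UTF-8 copy to DST. Returns non-zero on skip/failure.
convert_to_utf8() {
    src=$1
    dst=$2

    # --mime-encoding prints only the charset, e.g. "iso-8859-1" or "utf-8".
    charset=$(file -b --mime-encoding "$src") || return 1

    case $charset in
        binary|unknown-8bit|"")
            # Assumption: these values mean "no usable charset detected".
            echo "skipping $src: charset not detected ($charset)" >&2
            return 1
            ;;
    esac

    # iconv writes to stdout, so redirect into the destination file.
    iconv -f "$charset" -t UTF-8 "$src" > "$dst"
}
```

Usage would look like convert_to_utf8 in.txt out.txt. If iconv fails midway, out.txt may be left incomplete, so check the return status before relying on it.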

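From here, to iterate over multiple files, you can do something like the following, which writes a converted copy next to each file (redirecting onto the input file itself would truncate it before iconv reads it):

find . -iname '*.txt' -exec sh -c 'iconv -f ISO-8859-1 -t UTF-8 "$1" > "$1.utf8"' sh {} \;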

I didn't check this, so you might want to google iconv and find, read about them here on SO, or simply read their man pages.
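If you want the files replaced rather than copied, one possible approach is to convert into a temporary file and move it over the original only when iconv succeeds. This is a minimal sketch: convert_tree and the .utf8.tmp suffix are hypothetical names, and the source charset is something you have to supply yourself.

```shell
#!/bin/sh
# convert_tree DIR FROM_CHARSET
# Hypothetical helper: convert every *.txt file under DIR from FROM_CHARSET
# to UTF-8, replacing each file only if its conversion succeeded.
convert_tree() {
    dir=$1
    from=$2

    # Quote the pattern so the shell does not expand it before find sees it.
    find "$dir" -type f -iname '*.txt' -exec sh -c '
        from=$1
        shift
        for f in "$@"; do
            tmp=$f.utf8.tmp          # assumed-unused temporary name
            if iconv -f "$from" -t UTF-8 "$f" > "$tmp"; then
                mv "$tmp" "$f"       # replace the original on success
            else
                echo "failed: $f" >&2
                rm -f "$tmp"         # leave the original untouched
            fi
        done
    ' sh "$from" {} +
}
```

For example, convert_tree . ISO-8859-1 would process the current directory. Because the source charset is fixed, running it twice on the same tree would double-encode the files, so try it on a copy of your data first.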

2 Comments

I wanted it working with unknown charset, I found this: stackoverflow.com/questions/9824902/iconv-any-encoding-to-utf-8
@beerLantern: Edited my answer to cover detection. Be sure to check on your files though, charset detection can be tricky for small files and/or less common charsets.
3

iconv is the tool for the job.

iconv -f original_charset -t utf-8 originalfile > newfile 
