1

Our institution has about 28,000 articles whose encoding is not UTF-8, and I was asked to find a way to convert them to UTF-8. Is there any Linux or Windows command that changes the encoding of a file without opening the file? Clearly, opening 28,000 files and changing them one by one is not a good idea!

  • If you don't even open the file, you can't read the data, much less rewrite it… Commented Oct 6, 2013 at 6:57
  • But I know what their encoding is. Commented Oct 6, 2013 at 6:59
  • This is not a programming question, and is off-topic here. "Is there any linux or windows command" is a question for Super User. Voting to migrate there. Good luck. Commented Oct 6, 2013 at 7:08
  • This is about shell programming, so it is programming. Commented Oct 6, 2013 at 7:09
  • And you also know the contents of all the files you want to recode without opening and reading the files? Commented Oct 6, 2013 at 7:09

2 Answers

8

iconv can be used to convert text files from one encoding to another. Most Linux distros should have it, usually as part of glibc; if not, then as a separate installable package.

So, if they're, say, Latin-1 (ISO-8859-1), you can do something like this:

$ iconv -f ISO-8859-1 -t UTF-8 foo.txt >foo-utf8.txt
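
To see what this conversion does at the byte level, here is a small self-contained check. The file names demo-latin1.txt and demo-utf8.txt are made up for illustration, and the sketch assumes a POSIX shell with iconv, od and mktemp available.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# "café" in ISO-8859-1: the é is the single byte 0xE9 (octal 351).
printf 'caf\351\n' > demo-latin1.txt

# Same command as above, applied to the demo file.
iconv -f ISO-8859-1 -t UTF-8 demo-latin1.txt > demo-utf8.txt

# Dump both files as hex bytes so they can be compared.
latin1_hex=$(od -An -tx1 demo-latin1.txt | tr -d ' \n')
utf8_hex=$(od -An -tx1 demo-utf8.txt | tr -d ' \n')
echo "before: $latin1_hex"
echo "after:  $utf8_hex"
```

The single byte e9 in the input shows up as the two-byte sequence c3 a9 in the output, which is how UTF-8 encodes é. The ASCII bytes are unchanged.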

You can wrap this up in a one-liner with find, something like:

$ find . -type f -exec sh -c 'tmp=$(mktemp) && iconv -f ISO-8859-1 -t UTF-8 "$1" >"$tmp" && cat "$tmp" >"$1"; rm -f "$tmp"' sh {} \;

But you can probably make it more readable and more robust in a half-dozen lines of bash/python/perl/whatever.
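
As one possible version of those half-dozen lines, here is a sketch of a shell function. The name convert_tree, the default source encoding of ISO-8859-1, and the choice to skip files that already decode as valid UTF-8 are all assumptions for illustration, not something the answer above prescribes. It does not handle file names that contain newlines, and it is worth trying on a copy of the data first.

```shell
# convert_tree DIR [FROM]: convert every regular file under DIR to UTF-8 in place.
# FROM is the source encoding; it defaults to ISO-8859-1 (an assumption).
convert_tree() {
    dir=$1
    from=${2:-ISO-8859-1}
    find "$dir" -type f | while IFS= read -r file; do
        # Skip files that already decode as UTF-8 (this includes pure ASCII),
        # so running the function twice does not double-encode anything.
        if iconv -f UTF-8 -t UTF-8 "$file" >/dev/null 2>&1; then
            continue
        fi
        tmp=$(mktemp) || break
        if iconv -f "$from" -t UTF-8 "$file" >"$tmp"; then
            # Overwrite the contents only; the file keeps its permissions.
            cat "$tmp" >"$file"
        else
            echo "failed to convert: $file" >&2
        fi
        rm -f "$tmp"
    done
}
```

It could be called as convert_tree ./articles ISO-8859-1, where ./articles is a placeholder path. The UTF-8 check is a heuristic: a file in another encoding can occasionally happen to be valid UTF-8 as well, in which case it would be skipped.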


1 Comment

Thanks for the reply. I will test your solution and let you know the results.

0

You can change the encoding of a file easily with a short PowerShell script.

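# Source folders and output root (paths are examples).
$filesDir = Get-ChildItem "D:\Code" -Directory
$OutputDir = "D:\programability\"
foreach ($dir in $filesDir) {
    # Mirror each first-level subfolder under the output root.
    $target = Join-Path $OutputDir $dir.Name
    [System.IO.Directory]::CreateDirectory($target) | Out-Null
    foreach ($file in Get-ChildItem $dir.FullName -File) {
        $outfile = Join-Path $target $file.Name
        $file.Name    # print the name as a progress indicator
        # Get-Content decodes with PowerShell's default encoding;
        # add -Encoding if the source files use a different one.
        Get-Content $file.FullName | Set-Content -Encoding UTF8 $outfile
    }
}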

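This writes UTF-8 copies of the files in each first-level subfolder of D:\Code to a matching folder under D:\programability. It does not descend into deeper subfolders, and Get-Content reads the input using PowerShell's default encoding unless you pass -Encoding.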
