3

I'm trying to batch rename files in a folder with PHP. It's mostly working, though I'm having problems with accented characters.

An example of a filename with accented characters is ÅRE_GRÖN.JPG.

I would like to rename that file to ARE_GRON.JPG.

If I read the files in like this:

<?php
$path = __DIR__;
$dir_handle = opendir($path);

while (($file = readdir($dir_handle)) !== false) {
    echo $file . "\n";
}

closedir($dir_handle);

...And the page displays a garbled version of ÅRE_GRÖN.JPG.

If I add header('Content-Type: text/html; charset=UTF-8'); to the beginning of my script, it displays the correct file name, but the rename() function seems to have no effect either way.

Here's what I've tried:

while ($file = readdir($dir_handle)) {
    rename($file, str_replace('Ö', 'O', $file)); # No effect
    rename($file, str_replace('Ö', 'O', $file)); # No effect
}

Where am I going wrong?


Do say if you believe I'm using the wrong tool for the job. If anyone knows how to achieve this with a Bash script, show me. I have no Bash chops.

10
  • Are you on Windows or Linux? Commented Feb 12, 2013 at 21:54
  • Is your PHP script encoded as UTF-8? Commented Feb 12, 2013 at 21:56
  • Since he said Bash, I would guess he is referring to bash(1), which would suggest Linux. Commented Feb 12, 2013 at 21:56
  • This is what I could find: bugs.php.net/bug.php?id=39660 However, I believe there should already be a workaround for this, such as using an encoding that PHP handles correctly. I'll post an answer if I find anything. Also a possible duplicate of: stackoverflow.com/questions/873853/… Commented Feb 12, 2013 at 21:57
  • It could easily be Bash on cygwin, natively on Windows or on FreeBSD. Commented Feb 12, 2013 at 21:57

2 Answers

2

I figured out how to do it.

I first ran urlencode() on the filename. This converts the string:

MÖRKGRÅ.JPG

To the URL friendly:

MO%CC%88RKGRA%CC%8A.JPG
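
The %CC%88 and %CC%8A sequences are the UTF-8 bytes of the combining marks U+0308 and U+030A, which suggests the file name is stored in decomposed form: a plain letter followed by a separate accent. Here is a minimal sketch of why that matters for str_replace(). The byte strings are written by hand and are only an assumption about how the names are stored on this system:

```php
<?php
// Two byte sequences that both render as "Ö".
$precomposed = "\xC3\x96";   // U+00D6, a single code point
$decomposed  = "O\xCC\x88";  // "O" followed by U+0308 (combining diaeresis)

var_dump($precomposed === $decomposed);  // bool(false)
echo urlencode($precomposed), "\n";      // %C3%96
echo urlencode($decomposed), "\n";       // O%CC%88

// A file name stored in decomposed form, as in the output above.
$name = "M" . $decomposed . "RKGRA\xCC\x8A.JPG";

// Searching for the precomposed form finds nothing, so the name is unchanged.
var_dump(str_replace($precomposed, 'O', $name) === $name);  // bool(true)
```

If the Ö typed into the script is the precomposed character, the search string never matches the decomposed name, so the "new" name equals the old one and rename() appears to do nothing.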

I then ran str_replace() on the URL-encoded string, passing the search strings and their replacements as arrays. I only needed it for a few Swedish characters, so my solution looked like this:

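<?php

header('Content-Type: text/html; charset=UTF-8');

$path = __DIR__;

// URL-encoded forms of the decomposed characters Å, Ä and Ö
$search = array('A%CC%8A', 'A%CC%88', 'O%CC%88');
$replace = array('A', 'A', 'O');

$dir_handle = opendir($path);

// Compare against false so a file named "0" does not end the loop early
while (($file = readdir($dir_handle)) !== false) {
    $new_name = str_replace($search, $replace, urlencode($file));
    // Skip ".", ".." and names that need no change
    if ($file === '.' || $file === '..' || $new_name === $file) {
        continue;
    }
    // Use full paths so the rename does not depend on the working directory
    rename($path . '/' . $file, $path . '/' . $new_name);
}

closedir($dir_handle);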

Job done :)


I've come to realise this is more versatile than I anticipated. In another script, urlencode() gave me slightly different output, but it's easy to adjust the search strings accordingly.

$search = array('%26Aring%3B', '%26Auml%3B', '%26Ouml%3B', '+');
$replace = array('A', 'A', 'O', '_');
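
For names that contain more accents than a short search list can cover, one possible generalisation is to strip the combining marks directly instead of listing each URL-encoded sequence. The sketch below is not part of the solution above: strip_combining_marks() is a hypothetical helper, it assumes the names are valid UTF-8, and it only handles precomposed characters when the optional intl extension (the Normalizer class) is installed.

```php
<?php
// Hypothetical helper: removes combining marks from a UTF-8 file name,
// so "O" followed by U+0308 becomes a plain "O".
function strip_combining_marks(string $name): string
{
    // If the intl extension is available, decompose precomposed letters
    // (for example U+00D6) into base letter + combining mark first.
    if (class_exists('Normalizer')) {
        $decomposed = Normalizer::normalize($name, Normalizer::FORM_D);
        if ($decomposed !== false) {
            $name = $decomposed;
        }
    }

    // \p{Mn} matches non-spacing marks; the /u modifier enables UTF-8 mode.
    $stripped = preg_replace('/\p{Mn}+/u', '', $name);

    // preg_replace() returns null on error (e.g. invalid UTF-8); keep the
    // original name in that case.
    return $stripped === null ? $name : $stripped;
}

echo strip_combining_marks("MO\xCC\x88RKGRA\xCC\x8A.JPG"), "\n"; // MORKGRA.JPG
```

The result could be passed to rename() in place of the str_replace() call. Letters that have no decomposition, such as Ø, are left untouched by this approach.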

0

If you have a limited number of characters you want to replace, you can do it with

for f in *; do mv "$f" "${f//Ö/O}" 2> /dev/null; done

On GNU you could more generally use

expr=""
for char in {A..Z}
do 
    expr+="s/[[=$char=]]/$char/g; "; 
done; 

for f in *; do 
    mv "$f" "$(sed -e "$expr" <<< "$f")" 2> /dev/null; 
done

to replace all accented variants of each letter (for example, every A-like character) with the plain ASCII letter, for every letter in the alphabet, but with no guarantees for OS X sed. Beware that this has the side effect of capitalizing all filenames.

2 Comments

Hmm.. I tried running that first script from the directory that holds the files, but it didn't seem to have any effect.
Try copy-pasting the Ö character from the filename rather than typing it. Unicode has a lot of pretty identical Ö characters.
