0

I want to create a directory structure based on a partition map, as in these examples. I've tried various approaches with failure. Recursive programming makes my brain hurt!

Example 1

$partitionMap = '#/#'; //An example of two levels deep

myfolder/0
myfolder/1
myfolder/2  
...
myfolder/f
myfolder/f/0
myfolder/f/1
myfolder/f/2
myfolder/f/3
...

Example 2

$partitionMap = '#/#/#'; //An example of three levels deep

myfolder/a
myfolder/a/4
myfolder/a/4/f

Example 3

$partitionMap = '#/##/###'; //An example of three levels deep

myfolder/0
myfolder/1
myfolder/2
...
myfolder/a/00
myfolder/a/01
myfolder/e/02
myfolder/e/03
...
myfolder/e/03/af0
myfolder/e/03/af1
myfolder/e/03/af2
myfolder/e/03/af3

The # is a hexadecimal placeholder presenting 0-F as you can see above.

So I need something that works like this..

$partitionMap = '#/##'; 

makeFoldersRecursive( './myfolder', $partitionMap );

function makeFoldersRecursive( $path, $pmap ){
    $pmapArr = explode( '/', $pmap );

    $numNibbles = strlen( $pmapArr[0] );
    //Get hex folder count but in dec, # = 16, ## = 256, ### = 4096
    $folder_count_this_level = pow( 16, $numNibbles );

    for($i=0; $i<$count; $i++ ) {
        //Get left padded hex folder name from $i
        $folder = sprintf("%0" . $numNibbles . "s", dechex($i));
        mkdir( $path .  DIRECTORY_SEPARATOR . $folder , 0755 );

        if( array_key_exists( 1, $pmapArr ) ) {
            $passMap = $pmapArr; 
            array_shift( $pmapArr );
            makeFoldersRecursive( $path .  DIRECTORY_SEPARATOR . $folder, $passMap );
        }
    }



}

There's got to be a more elegant way of doing this, Any help would be appreciated!

2
  • 2
    You know that mkdir() takes a recursive param right? php.net/manual/en/function.mkdir.php mkdir('/path/to/create', 0755, true); Commented Sep 17, 2011 at 19:28
  • Yes, but I don't see how using it helps with what I'm trying to accomplish. Commented Sep 17, 2011 at 20:06

4 Answers 4

1

A purer recursive solution

You already have one, but I 'll give another spin on it that looks "purer" to my eyes and also explain how I got there.

Here's the code:

function create_all_dirs($directory, $map) {
    // 1: TERMINATION CONDITION
    if (empty($map)) {
        mkdir($directory, 0755, true);
        return;
    }

    // 2: PROCESS PART OF DATA, PREPARE DATA TO PASS ON RECURSION
    $parts = explode('/', $map);
    $partLength = strlen(array_shift($parts));
    $dirCount = pow(16, $partLength);    
    $map = implode('/', $parts);

    // 3: RECURSE
    for($i = 0; $i < $dirCount; ++$i) {
        $dir = $directory.DIRECTORY_SEPARATOR
                         .sprintf('%0'.$partLength.'s', dechex($i));
        create_all_dirs($dir, $map);
    }
}

Explanation

While I won't even suggest that there's one good way of reaching a solution for recursive functions, I can explain an approach that has worked well in practice for me.

Step 1: Think about what data you will need to pass when recursing.

In this case, every level of the function will be responsible for creating one level of the directory structure. So it makes sense that if you call

create_all_dirs('c:/', '#/#')

then at some point the function will make a recursive call that looks like this:

create_all_dirs('c:/0', '#')

That's all you need to pin down at this point. You can tell that it makes sense and that after a sequence of such calls the function will have (somehow, we haven't yet written it) done its job.

Step 2: Termination condition

With this knowledge, pin down your function's termination condition. In our case, given the above a reasonable termination condition would be "when the second parameter is the empty string". So what would the function do if called with a directory and an empty "map" string?

function create_all_dirs($directory, $map) {
    if(empty($map)) {
        mkdir($directory, 0755, true);
        return;
    }

    // ???
}

Obviously it should create the directory. I 'm calling mkdir in such a way that it will recursively create all directories on the path if they don't already exist because the final solution needs it to work, but that's not necessary. You already know how to do it without this convenience.

Step 3: Prepare the parameters common to all recursive calls

OK, so what do we need to make the recursive call? You already know that at each level we need to pop part of $map string and call the function recursively a number of times. We also need to feed it a new, deeper directory string and a new, shorter map string (all of this you could tell from the conclusion of step 1, even if you had not already created a solution).

So we have:

// cut off part of $map; we also need the length of the part we cut
// off, to know how many times we need to recursively call at this level
$parts = explode('/', $map);
$partLength = strlen(array_shift($parts));

// how many recursive calls?
$dirCount = pow(16, $partLength);

// this will be the $map parameter for all recursive calls
$map = implode('/', $parts);

Step 4: Call the function recursively

At this point I don't think there is anything more to explain. We calculate the value of $directory, which is different for each recursive call, and make it:

for($i = 0; $i < $dirCount; ++$i) {
    $dir = $directory.DIRECTORY_SEPARATOR
                     .sprintf('%0'.$partLength.'s', dechex($i));
    create_all_dirs($dir, $map);
}
Sign up to request clarification or add additional context in comments.

5 Comments

Jon, thank you for the help, it's nice to see I wasn't far off. By adding echo 'mkdir ' . $directory . "\n"; I see the beauty of your code in that it only creates the deepest directories by passing the recursive param to mkdir. As a side note for anyone looking to implement this code... In my findings, a directory takes up one block on the filesystem, 4096 bytes in my case. You want to use the partitionMap carefully as 16 ^ count(#) * FS_BLOCK_SIZE = DISKSPACE. #/#/# = 4096 dirs = 17MB, #/#/## = 65536 dirs = 268MB, #/#/#/## = 1 million dirs = 4.2GB
In my implementation, I'm storing files based on their checksum calculated by calling md5_file( $path_to_file ). This achieves two things. 1) the checksum becomes the filename so the exact same file saved twice is de-duplicated. 2) the files are distributed in directories based on the first characters of their md5 hash filename. /a/d/0/ad0f44f4452e639352fa3cd334341a23c.file The distribution of files per folder can be calculated as... TOTAL_EXPECTED_FILES / (16 ^ count(#)) = FILES_PER_DIR
@JCu: Actually due to the way the filesystem works it would be faster in this specific case to not create the directories recursively with mkdir and add mkdir($directory, 0755) just before step 3 in the function, which is exactly what you have done in your own solution. :)
Another question may show up on SO after I start to implement the following. repartition( $path, '#/#/#', '#/#/##' );. Where the code will create the new partition map directories and then move files to their new location. Doesn't sound fun to me!
I don't think that optimization is warranted, recursive is just fine. Thanks again!
0

Recursion isn't really needed and just makes things more confusing. This solution uses a queue directly to be more straightforward.

So, assuming you have a pattern like '#/#', you push that into a queue. Then you take it off, and replace the first '#' in it with one of your characters. This effectively multiplies the number of items in the queue by 16. You repeat this process, taking an item off, doing the replacement, and putting them back in, until you have replaced every '#' in every item in the queue.

Then you can just create the directories as usual.

$partitionMap = '#/##';

makeFolders('./myfolder', $partitionMap);

function makeFolders($path, $pmap) {

  // Use number of characters and number of '#' to compute total combinations
  $characters = array(
    '0', '1', '2', '3', '4', '5', '6', '7',
    '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'
  );
  $count = substr_count($pmap, '#');
  $num_results = pow(count($characters), $count);

  // Start array with path and pmap
  $combos = array( $path . '/' . $pmap );

  // Go through array, replacing each '#' with one of the characters.
  while (count($combos) < $num_results) {
    $str = array_shift($combos);

    foreach ($characters as $c) {
      array_push($combos, preg_replace('/#/', $c, $str, 1));
    }
  }

  // Create the directories, using third 'TRUE' parameter
  // to avoid creating parents manually
  foreach ($combos as $dir) {
    mkdir($dir, 0755, TRUE);
  }
}

2 Comments

The problem with iterative solutions of this type is that before the final loop you have all data you will need in memory at the same time. Since this problem's size increases exponentially, in the general case you will quickly run out of memory.
To work around this in an iterative solution you would need to simulate recursion by keeping track of a $level variable which you increment when a recursive solution would recurse and decrement when it would return.
0

I found this on the internet somewhere. seems to work if you're not using relative path. i.e. "../basepath/newpath1/newpath2". Still trying to figure it out.

/**
 * recursively create a long directory path
 */
function createPath($path, $rwd=0777) {
    if (is_dir($path)) return true;
    rev_path = substr($path, 0, strrpos($path, '/', -2) + 1 );
$return = createPath($prev_path);
    return ($return && is_writable($prev_path)) ? mkdir($path, $rwd, true) : false;
}

Hope it helps somebody. I think its fairly elegant. Just a comment, recursion is a very common practice in all programming languages and has it's places. Try to understand it, don't denigrate it just because you refuse to learn how to code it. As an example, try to get a file list without recursion or functions and you'll lose yourself very quickly.

Comments

0

As long as you do not need long directory names (up to 8 hex digits on 32 bit systems), you can use this (all three versions have constant memory complexity):

// Example for "#/##"
for ($i = 0; $i <= 0x0FFF; $i++) {
    $d = sprintf("%01x/%02x", $i & 0x00F, ($i & 0xFF0) >> 4);
    echo $d, "\n";
    mkdir($d, 0777, true);
}

And with use of your format strings:

function mkhexdir($format) {
    // Parse format string
    $fmt_N = array_map('strlen', explode('/', $format));
    $bit_count = array_sum($fmt_N) * 4;
    $max = 1 << $bit_count;

    // Prepare format string for sprintf
    $fmt_printf = join('/', array_map(function($n) { return '%0'.$n.'x';}, $fmt_N));

    // Generate all dirs
    for ($i = 0; $i < $max; $i++) {
        $args = array();
        $offset = $bit_count;
        foreach ($fmt_N as $len) {
            $offset -= $len * 4;
            $b = (1 << ($len * 4)) - 1;
            $args[] = ($i >> $offset) & $b;
        }
        $d = vsprintf($fmt_printf, $args);
        echo $d, "\n";
        mkdir($d, 0777, true);
    }
}

mkhexdir('#/##/##');

Or another version that uses regular expression:

function mkhexdir($format) {
    // Parse format string
    $fmt_N = array_map('strlen', explode('/', $format));
    $digits = array_sum($fmt_N);
    $max = 1 << ($digits * 4);

    // Prepare format string for sprintf
    $fmt_printf = '%0'.$digits.'x';

    // Prepare regular expression to format path
    $src = '/^('.join(')(',
            array_map(function($n){ return str_repeat('.', $n);}, $fmt_N))
            .')$/';
    $dst = '\\'.join('/\\',
            array_map(function($n){ return $n + 1;}, array_keys($fmt_N)));

    // Generate all dirs
    for ($i = 0; $i < $max; $i++) {
        $d = preg_replace($src, $dst, sprintf($fmt_printf, $i));
        echo $d, "\n";
    }
}

mkhexdir('#/##/##');

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.