Example
The following code constructs a diabolical directory that will result in graph traversal happening not once, but several times.
#!/usr/bin/env perl
use strict;
use warnings;
## Creating an example "Evil" directory to traverse.
use Test::TempDir::Tiny qw( tempdir );
use Path::Tiny qw( path );
my $dir = tempdir();
# Problem directories
path( $dir, "evil/*" )->mkpath;
path( $dir, "evil/*/starfile" )->touch;
path( $dir, "evil/**" )->mkpath;
path( $dir, "evil/**/doublestarfile" )->touch;
# Normal Files
path( $dir, "evil/child/deep" )->mkpath;
path( $dir, "evil/child" )->mkpath;
path( $dir, "evil/child/a" )->touch;
path( $dir, "evil/child/b" )->touch;
path( $dir, "evil/child/deep/a" )->touch;
path( $dir, "evil/child/deep/b" )->touch;
## Evil directory now constructed, emulated simplified recursion
## using glob
our $DEPTH = 0;
our $PAD = '';
sub list_files_recursively {
my ( $path ) = @_;
STDERR->print("${PAD}?: $path\n");
local $DEPTH = $DEPTH + 1;
local $PAD = ' ' x $DEPTH;
my @out;
###### DONT DO THIS ######
for my $leaf ( glob $path ) {
###### DONT DO THIS ######
if ( -d $leaf ) {
STDERR->print("${PAD}d: $leaf\n");
push @out, list_files_recursively( $leaf . '/*' );
}
else {
STDERR->print("${PAD}f: $leaf\n");
push @out, $leaf;
}
}
return @out;
}
for my $entry ( list_files_recursively( "$dir/*" ) ) {
print $entry . "\n";
}
print "Expected layout:\n";
system("find",$dir,"-type", "f" );
Executing this code will reveal list_files_recursively "visits" the same files multiple times.
Full output here omitted for brevity
The most important part is to note what happens when it tries to list the children of the directory called **
d: /tmp/7LYkL6zdrB/test_pl/default_1/evil/**
?: /tmp/7LYkL6zdrB/test_pl/default_1/evil/**/*
f: /tmp/7LYkL6zdrB/test_pl/default_1/evil/**/doublestarfile
f: /tmp/7LYkL6zdrB/test_pl/default_1/evil/*/starfile
f: /tmp/7LYkL6zdrB/test_pl/default_1/evil/child/a
f: /tmp/7LYkL6zdrB/test_pl/default_1/evil/child/b
d: /tmp/7LYkL6zdrB/test_pl/default_1/evil/child/deep
Yes, that's thinking "Hey, it looks like 'child/deep' is a child of **".
Which it really really isn't.
And there are probably plenty more glob rules that can be exploited here, * is just the most dangerous.
For instance, glob interprets spaces as token delimiters, so glob "a b" -> 2 entries, not 1.