10

I have a Perl script that traverses a directory hierarchy using File::Next::files. It returns only files that end in ".avi", ".flv", ".mp3", ".mp4", or ".wmv". It also skips the ".svn" subdirectory and any subdirectory whose name ends in ".frames". This is specified in the file_filter and descend_filter subroutines below.

my $iter = File::Next::files(
        { file_filter => \&file_filter, descend_filter => \&descend_filter },
        $directory );

sub file_filter { 
    # Called from File::Next::files.
    # Only select video files that end with the following extensions.
    /.(avi|flv|mp3|mp4|wmv)$/
}

sub descend_filter { 
    # Called from File::Next::files.
    # Skip subfolders that either end in ".frames" or are named the following:
    $File::Next::dir !~ /.frames$|^.svn$/
}

What I want to do is place the allowed file extensions and disallowed sub directory names in a configuration file so they can be updated on the fly.

What I want to know is how do I code the subroutines to build regex constructs based on the parameters in the configuration file?

/.(avi|flv|mp3|mp4|wmv)$/

$File::Next::dir !~ /.frames$|^.svn$/
3
  • Can't help you with your question, but that package you're using looks awesome. I was doing the same thing with plain old File::Find and it was much messier. I'll have to give this one a try. Thanks! +1 Commented May 22, 2009 at 16:11
  • Checkout: search.cpan.org/dist/File-Next Commented May 22, 2009 at 17:01
  • p3rl.org/File::Find::Rule might be better for you, depending on situation. Commented May 24, 2009 at 1:37

6 Answers

29

Assuming that you've parsed the configuration file to get a list of extensions and ignored directories, you can build the regular expression as a string and then use the qr operator to compile it into a regular expression:

my @extensions = qw(avi flv mp3 mp4 wmv);  # parsed from file
my $pattern    = '\.(' . join('|', @extensions) . ')$';
my $regex      = qr/$pattern/;

if ($file =~ $regex) {
    # do something
}

The compilation isn't strictly necessary; you can use the string pattern directly:

if ($file =~ /$pattern/) {
    # do something
}
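
As a minimal sketch of the same idea, the snippet below also runs each extension through quotemeta, so a value containing regex metacharacters is matched literally. The extension list (including the made-up "c++" entry), the sample file names, and the case-insensitive /i flag are assumptions for illustration, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical list; in practice this would come from the parsed config file.
my @extensions = qw(avi flv mp3 mp4 wmv c++);

# quotemeta escapes regex metacharacters, so "c++" is matched literally.
my $alternation = join '|', map { quotemeta($_) } @extensions;

# Compile once; the /i flag (an assumption) makes ".AVI" match as well.
my $regex = qr/\.(?:$alternation)$/i;

for my $file (qw(movie.avi CLIP.MP4 movieXavi notes.txt main.c++)) {
    printf "%-10s %s\n", $file, ($file =~ $regex ? 'kept' : 'skipped');
}
```

Without quotemeta, an entry such as "c++" would be a syntax error in the pattern (nested quantifiers), which is worth guarding against when the list comes from a user-edited file.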

Directories are a little harder because you have two different situations: full names and suffixes. Your configuration file will have to use different keys to make it clear which is which, e.g. "dir_name" and "dir_suffix". For full names I'd just build a hash:

my %ignore = ('.svn' => 1);

Suffixed directories can be done the same way as file extensions:

my $dir_pattern = '(?:' . join('|', map {quotemeta} @dir_suffix) . ')$';
my $dir_regex   = qr/$dir_pattern/;

You could even build the patterns into anonymous subroutines to avoid referencing global variables:

my $file_filter    = sub { $_ =~ $regex };
my $descend_filter = sub {
    ! $ignore{$File::Next::dir} &&
    $File::Next::dir !~ $dir_regex;
};

my $iter = File::Next::files({
    file_filter    => $file_filter,
    descend_filter => $descend_filter,
}, $directory);

1 Comment

What I didn't explain was that I will have clients modifying the configuration file. I can't assume they will know Perl or know enough not to introduce a syntax error into the regular expression. So I really don't want to read a regular expression from the configuration file; I just want a list of file extensions and directory names and/or directory patterns. Example: ext = avi ext = flv ext = mp3 dir = .svn dirp= .frames Once this information is read, I want to dynamically create something that will function like: .(avi|flv|mp3|mp4|wmv)$
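
One way this could look, as a rough sketch: read the "ext", "dir", and "dirp" lines, escape every value with quotemeta, and build the filters from the result. The inline sample config, the extra "CVS" entry, and the helper names keep_file and descend_into are assumptions made for illustration. With File::Next, the two helpers would be called from the file_filter and descend_filter callbacks.

```perl
use strict;
use warnings;

# Sample config text; a real script would read these lines from a file.
my $config_text = <<'END';
ext  = avi
ext  = flv
ext  = mp3
dir  = .svn
dir  = CVS
dirp= .frames
END

# Collect every value per key, so each key may appear more than once.
my %conf = (ext => [], dir => [], dirp => []);
for my $line (split /\n/, $config_text) {
    next unless $line =~ /^\s*(ext|dirp|dir)\s*=\s*(\S+)\s*$/;
    push @{ $conf{$1} }, $2;
}

# Build an end-anchored pattern from literal strings. quotemeta keeps
# a client's typo from becoming regex syntax; an empty list yields
# (?!), a pattern that never matches.
sub suffix_regex {
    my ($prefix, @values) = @_;
    return qr/(?!)/ unless @values;
    my $alternation = join '|', map { quotemeta($_) } @values;
    return qr/$prefix(?:$alternation)$/;
}

my $file_regex  = suffix_regex('\.', @{ $conf{ext} });
my $skip_suffix = suffix_regex('',   @{ $conf{dirp} });
my %skip_name   = map { $_ => 1 } @{ $conf{dir} };

# Hypothetical helpers that take a plain file or directory name.
sub keep_file    { my ($name) = @_; return $name =~ $file_regex ? 1 : 0 }
sub descend_into {
    my ($name) = @_;
    return (!$skip_name{$name} && $name !~ $skip_suffix) ? 1 : 0;
}

print "$_: ", (keep_file($_)    ? 'keep'    : 'skip'), "\n" for qw(a.avi b.txt);
print "$_: ", (descend_into($_) ? 'descend' : 'skip'), "\n" for qw(src .svn clip.frames);
```

Exact directory names go into a hash and suffixes into a regex, so the clients only ever type literal strings.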
3

Let's say that you use Config::General for your config file and that it contains these lines:

<MyApp>
    extensions    avi flv mp3 mp4 wmv
    unwanted      frames svn
</MyApp>

You could then use it like so (see the Config::General documentation for more):

use Config::General;

# getall() returns a hash, so store it in %conf (used below as $conf{...}).
my %conf = Config::General->new('/path/to/myapp.conf')->getall();
my $extension_string = $conf{'MyApp'}{'extensions'};

my @extensions = split m{ }, $extension_string;

# Some sanity checks maybe...

my $regex_builder = join '|', @extensions;

$regex_builder = '\.(' . $regex_builder . ')$';

my $regex = qr/$regex_builder/;

if($file =~ m{$regex}) {
    # Do something.
}


my $uw_regex_builder = '\.(' . join ('|', split (m{ }, $conf{'MyApp'}{'unwanted'})) . ')$';
my $unwanted_regex = qr/$uw_regex_builder/;

if($File::Next::dir !~ m{$unwanted_regex}) {
    # Do something. (Note that this does not enforce /^\.svn$/. You
    # will need some kind of agreed syntax in your conf-file for that.)
}

(This is completely untested.)

2 Comments

Thanks. By the way, why is the my $regex = qr/$regex_builder/ statement necessary?
It isn't necessary to build the whole regex into a string before using qr//. You can just do this: my $regex_builder = join '|', @extensions; my $regex = qr/\.($regex_builder)$/;
3

Build it like you would a normal string and then use interpolation at the end to turn it into a compiled regex. Also be careful: you are not escaping . or putting it in a character class, so it matches any character rather than a literal period.

#!/usr/bin/perl

use strict;
use warnings;

my (@ext, $dir, $dirp);
while (<DATA>) {
    next unless my ($key, $val) = /^ \s* (ext|dirp|dir) \s* = \s* (\S+)$/x;
    push @ext, $val if $key eq 'ext';
    $dir = $val     if $key eq 'dir';
    $dirp = $val    if $key eq 'dirp';
}

my $re = join "|", @ext;
$re = qr/[.]($re)$/;

print "$re\n";

while (<>) {
    print /$re/ ? "matched" : "didn't match", "\n";
}

__DATA__
ext = avi
ext = flv
ext = mp3
dir = .svn
dirp= .frames

2 Comments

When I ran the code and printed out $re I got: (?-xism:[.](avi|flv|mp3)$) Seems to work. Thanks very much.
I'd assume that there could be multiple values for directories and/or directory suffixes to ignore, although that wasn't explicitly specified.
1

It's reasonably straightforward with File::Find::Rule; it's just a case of creating the list beforehand.

use strict;
use warnings;
use aliased 'File::Find::Rule';


# name can do both styles. 
my @ignoredDirs = (qr/^\.svn/,  '*.frames' );
my @wantExt = qw( *.avi *.flv *.mp3 );

my $finder = Rule->or( 
    Rule->new->directory->name(@ignoredDirs)->prune->discard, 
    Rule->new->file->name(@wantExt)
);

$finder->start('./');

while( my $file = $finder->match() ){
    # Matching file.
}

Then it's just a case of populating those arrays. (Note: the above code is also untested, but will likely work.) I'd generally use YAML for this; it makes life easier.

use strict;
use warnings;
use aliased 'File::Find::Rule';
use YAML::XS;

my $config = YAML::XS::Load(<<'EOF');
---
ignoredir:
- !!perl/regexp (?-xism:^.svn)
- '*.frames'
want:
- '*.avi'
- '*.flv'
- '*.mp3'
EOF

my $finder = Rule->or( 
    Rule->new->directory->name(@{ $config->{ignoredir} })->prune->discard, 
    Rule->new->file->name(@{ $config->{want} })
);

$finder->start('./');

while( my $file = $finder->match() ){
    # Matching file.
}

Note: this uses the handy module 'aliased', which imports "File::Find::Rule" for me as "Rule".

  • File::Find::Rule - Alternative interface to File::Find
  • YAML::XS - Perl YAML Serialization using XS and libyaml
  • aliased - Use shorter versions of class names.

Comments

1

If you want to build a potentially large regexp and don't want to bother debugging the parentheses, use a Perl module to build it for you!

use strict;
use Regexp::Assemble;

my $re = Regexp::Assemble->new->add(qw(avi flv mp3 mp4 wmv));

...

if ($file =~ /$re/) {
    # a match!
}

print "$re\n"; # (?:(?:fl|wm)v|mp[34]|avi)

Comments

0

Although File::Find::Rule already has ways to deal with this, in similar cases you don't really want a regex. The regex doesn't buy you much here because you're looking for a fixed sequence of characters at the end of each filename. You want to know if that fixed sequence is in a list of sequences that interest you. Store all the extensions in a hash and look in that hash:

my( $extension ) = $filename =~ m/\.([^.]+)$/;
if( exists $hash{$extension} ) { ... }

You don't need to build up a regular expression, and you don't need to go through several possible regex alternations to check every extension you have to examine.
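
A small self-contained sketch of that lookup follows. The hash name %wanted, the helper is_wanted, and the lower-casing of the extension are assumptions added for illustration. It also guards against file names with no extension, where the match leaves $extension undefined.

```perl
use strict;
use warnings;

# Extensions of interest, stored as hash keys for constant-time lookup.
my %wanted = map { $_ => 1 } qw(avi flv mp3 mp4 wmv);

# Hypothetical helper: true if the name ends in a wanted extension.
sub is_wanted {
    my ($filename) = @_;
    my ($extension) = $filename =~ m/\.([^.]+)$/;
    return 0 unless defined $extension;          # no extension at all
    return exists $wanted{ lc($extension) } ? 1 : 0;
}

for my $name (qw(clip.MP4 song.mp3 README notes.txt)) {
    print "$name: ", (is_wanted($name) ? 'wanted' : 'ignored'), "\n";
}
```

Adding or removing an extension then means changing a hash key, with no pattern to rebuild.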

Comments
