
I have a directory with nearly 1,200 files. I need a Perl script that goes through each file in turn and replaces any occurrences of 66 strings, so all 66 substitutions must run on every file. My replacement strings are in Thai, so I cannot use the shell; it must be a .pl file or similar so that I can "use utf8;". I am just not familiar with how to open all the files in a directory one by one and perform actions on them. Here is a sample of my substitutions:

s/psa0*(\d+)/เพลงสดุดี\1/g;

Thanks for any help.

  • What OS do you have? On some kind of *nix it's simpler to use something like: for f in *; do perl_script -i "$f"; done Commented Apr 16, 2012 at 4:04
  • Windows with Strawberry Perl. Commented Apr 16, 2012 at 5:16

3 Answers

use utf8;
use strict;
use warnings;

use File::Glob qw( bsd_glob );

@ARGV = map bsd_glob($_), @ARGV;

while (<>) {    
   s/psa0*(?=\d)/เพลงสดุดี/g;
   print;
}

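Run the script like this; the -i.bak switch edits each file in place and keeps the original as a .bak backup:

perl -i.bak script.pl *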

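I used File::Glob's bsd_glob because glob won't handle spaces "correctly": glob splits its argument on whitespace and treats each piece as a separate pattern, whereas bsd_glob treats the whole string as a single pattern. They are actually the same function, but it behaves differently depending on how it's called. The script expands the wildcards itself because the Windows command shell passes * to the program unexpanded.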
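As a rough illustration of that difference, here is a minimal sketch that runs both functions on a pattern containing a space. The scratch directory and the file names ("my report.html", "other.html") are made up for the example and are not from the question.

```perl
use strict;
use warnings;
use Cwd        qw( getcwd );
use File::Glob qw( bsd_glob );
use File::Temp qw( tempdir );

# Hypothetical scratch directory holding two sample files.
my $start = getcwd();
my $dir   = tempdir( CLEANUP => 1 );
chdir $dir or die "Cannot chdir to $dir: $!";
for my $name ('my report.html', 'other.html') {
    open(my $fh, '>', $name) or die "Cannot create $name: $!";
    close $fh;
}

# glob splits the string on whitespace and expands "my" and "*.html" separately.
my @core = glob('my *.html');

# bsd_glob keeps the space, so this is the single pattern "my *.html".
my @bsd = bsd_glob('my *.html');

print "glob:     $_\n" for @core;
print "bsd_glob: $_\n" for @bsd;

# Leave the scratch directory so it can be cleaned up at exit.
chdir $start or die "Cannot chdir back to $start: $!";
```

With glob, "other.html" ends up in the result even though it does not start with "my ", which is the behaviour the answer works around.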


By the way, using \1 in the replacement expression (i.e. outside a regular expression) makes no sense. \1 is a regex pattern that means "match what the first capture captured". So

s/psa0*(\d+)/เพลงสดุดี\1/g;

should be

s/psa0*(\d+)/เพลงสดุดี$1/g;

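The following is a faster alternative. The lookahead (?=\d) only checks that a digit follows, so the digits are never captured and re-inserted: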

s/psa0*(?=\d)/เพลงสดุดี/g;
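To check that the two forms agree, here is a small sketch that applies both substitutions to a few sample strings. The sample inputs are invented for illustration.

```perl
use strict;
use warnings;
use utf8;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical sample inputs, not taken from the real files.
my @samples = ('psa001', 'psa23 and psa100', 'psa0');
my $mismatches = 0;

for my $sample (@samples) {
    # Capture the digits and put them back with $1.
    (my $captured = $sample) =~ s/psa0*(\d+)/เพลงสดุดี$1/g;

    # Only assert that a digit follows; the digits are left untouched.
    (my $lookahead = $sample) =~ s/psa0*(?=\d)/เพลงสดุดี/g;

    $mismatches++ if $captured ne $lookahead;
    print "$sample -> $lookahead\n";
}

print "mismatches: $mismatches\n";
```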

1 Comment

Thanks. Actually, in the final script I changed it to $1 because Perl complained. Either way it worked.

See opendir/readdir/closedir for functions that can iterate through all the filenames in a directory (much like you would use open/readline/close to iterate through all the lines in a file).
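A minimal sketch of the opendir/readdir/closedir loop might look like this. The scratch directory and file names are assumptions made for the example; in a real script $dir would be the directory holding the files.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw( tempdir );

# Hypothetical scratch directory with a few sample files.
my $dir = tempdir( CLEANUP => 1 );
for my $name ('a.html', 'b.html', 'notes.txt') {
    open(my $fh, '>', File::Spec->catfile($dir, $name))
        or die "Cannot create $name: $!";
    close $fh;
}

opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
my @html;
while (defined(my $entry = readdir $dh)) {
    my $path = File::Spec->catfile($dir, $entry);
    next unless -f $path;              # skips '.', '..' and subdirectories
    next unless $entry =~ /\.html\z/;  # keep only .html files
    push @html, $entry;
}
closedir $dh;

# readdir returns entries in no particular order, so sort before use.
@html = sort @html;
print "$_\n" for @html;
```

Note that readdir returns bare names, so each one has to be joined with the directory before it can be tested or opened.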

Also see the glob function, which returns a list of filenames that match some pattern.

1 Comment

The glob function was the key. I will post what I actually did in a bit.

Just in case someone can use it in the future, this is what I actually did:

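use warnings;
use strict;

use utf8;

my @files = glob("*.html");

foreach my $file (@files) {
   # Read the original as UTF-8 and write the result to a new file named "$file-".
   open(my $in,  '<:encoding(UTF-8)', $file)    or die "Cannot read $file: $!";
   open(my $out, '>:encoding(UTF-8)', "$file-") or die "Cannot write $file-: $!";
   while (<$in>) {
      # /g replaces every occurrence on the line, not just the first.
      s/gen0*(\d+)/ปฐมกาล $1/g;
      s/exo0*(\d+)/อพยพ $1/g;
      s/lev0*(\d+)/เลวีนิติ $1/g;
      s/num0*(\d+)/กันดารวิถี $1/g;
      # ...etc...
      print {$out} $_;
   }
   close $in;
   close $out or die "Cannot close $file-: $!";
}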
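As a possible alternative to writing out one substitution per book, the list above could be driven from a lookup table. This is only a sketch: the %book hash holds just the four entries shown above, and translate_refs is a hypothetical helper name, not something from the original script.

```perl
use strict;
use warnings;
use utf8;

# Abbreviation => Thai book name. Only the four entries shown above;
# the remaining books would be added in the same way.
my %book = (
    gen => 'ปฐมกาล',
    exo => 'อพยพ',
    lev => 'เลวีนิติ',
    num => 'กันดารวิถี',
);

# One alternation of all abbreviations, longest first so that a longer
# abbreviation is never shadowed by a shorter one that is its prefix.
my $abbr = join '|', map { quotemeta }
           sort { length $b <=> length $a || $a cmp $b } keys %book;
my $ref_re = qr/($abbr)0*(\d+)/;

# Hypothetical helper: rewrite every reference in one line of text.
sub translate_refs {
    my ($line) = @_;
    $line =~ s/$ref_re/$book{$1} $2/g;
    return $line;
}

binmode STDOUT, ':encoding(UTF-8)';
print translate_refs("gen001 and exo12\n");
```

Inside the file loop, the block of s/// lines would then become a single call such as print {$out} translate_refs($_);.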
