I have put together a Perl script that goes through a directory, matches various keys in the source files, and outputs the results to a text file. The match operation works well; however, the end goal is to perform a replace operation. The Perl script is as follows:

  #!/usr/bin/perl
  #use strict;
  use warnings;

  #use File::Slurp;

  #declare variables
  my $file = '';
  my $verbose = 0;
  my $logfile;

  my @files = grep {/[.](pas|cmm|ptd|pro)$/i} glob 'C:\users\perry_m\desktop\epic_test\pascal_code\*.*';

  #iterate through the files in input directory
  foreach $file (@files) {

     print "$file\n";

     #read the file into a single string
     open FILEHANDLE, $file or die $!;
     my $string = do { local $/; <FILEHANDLE> };

     #perform REGEX on this string

     ########################################################
     #fix the include formats to conform to normal PASCAL
     $count = 0;
     while ($string =~ m/%INCLUDE/g)
     {
        #%include
        $count++;
     }
     if ($count > 0)
     {
        print " $count %INCLUDE\n";
     }
     $count = 0;
     while ($string =~ m/INCLUDE/g)
     {
        #%INCLUDE;
        $count++;
     }
     if ($count > 0)
     {
        print " $count INCLUDE\n";
     }
     $count = 0;
     while ($string =~ m/(%include\s+')[A-Za-z0-9]+:([A-Za-z0-9]+.[A-Za-z]+')/g)
     {
        #$1$2;
        $count++;
     }
     if ($count > 0)
     {
        print " $count XXXX:include \n";
     }        
  }

This produces output as desired, an example is below:

  C:\users\perry_m\desktop\epic_test\pascal_code\BRTINIT.PAS
   1 INCLUDE
   2 XXXX:include 
   39 external and readonly

However, if I change the regex operations to implement a replace, using the replacements shown in the commented lines above, the script hangs and never returns. I imagine it is somehow related to memory, but I am new to Perl. I was also trying to avoid parsing the file line by line if possible.

Example:

  while ($string =~ s/%INCLUDE/%include/g)
  {
     #%include
     $count++;
  }

and

  while ($string =~ s/(%include\s+')[A-Za-z0-9]+:([A-Za-z0-9]+.[A-Za-z]+')/$1$2;/g)
  {
     #$1$2;
     $count++;
  }

Edit: simplified the examples

Comments

  • You should never comment out use strict to get things to work. Fix the problems that it reveals instead. Commented Oct 15, 2012 at 18:54
  • Please provide a minimal demonstration of your problem. Commented Oct 15, 2012 at 18:54
  • @Borodin I will fix those problems in the end, I am on a time crunch right now. Commented Oct 15, 2012 at 18:59
  • @ikegami I will simplify things, I got a little carried away. This is certainly the first time I have been asked to present less information lol Commented Oct 15, 2012 at 19:00
  • @dislexicmofo: removing use strict is a false economy. It is not an extravagance like, say, comments, but is there to help you write working code more quickly. You could easily spend hours searching for a bug that use strict would reveal in a moment. Commented Oct 15, 2012 at 19:13

2 Answers

The problem is with your while loops. A loop like

while ($string =~ m/INCLUDE/g) { ... }

will execute once for each occurrence of INCLUDE in the target string, but a substitution like

$string =~ s/INCLUDE/%INCLUDE;/g

will make all of the replacements in one go and return the number of replacements made. So a loop

while ($string =~ s/INCLUDE/%INCLUDE;/g) { ... }

will endlessly add more and more percentage signs before and semicolons after every INCLUDE.
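
A bounded version of that loop makes the growth visible. This is only a sketch: the sample string and the three-pass safety cap are assumptions added for the demonstration, and without the cap the loop would not terminate.

```perl
use strict;
use warnings;

# Sample input, made up for this demonstration.
my $string = "INCLUDE 'a.pas'\nINCLUDE 'b.pas'\n";
my $passes = 0;

# Each pass rewrites every INCLUDE. The result still contains INCLUDE,
# so the loop condition is true again on the next pass.
while ($string =~ s/INCLUDE/%INCLUDE;/g) {
    $passes++;
    last if $passes >= 3;    # safety cap, present only in this demo
}

print "passes: $passes\n";
print $string;
```

After three passes every INCLUDE has gained three percent signs and three semicolons, and the string would keep growing for as long as the loop was allowed to run.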

To find the number of replacements made, change all your loops like this to just

$count = $string =~ s/INCLUDE/%INCLUDE;/g;
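
As a minimal sketch of that counting form, using the %INCLUDE rule from the question: the sample string and the `|| 0` normalisation are assumptions for illustration. The `|| 0` is there because s///g returns an empty string rather than 0 when nothing matched.

```perl
use strict;
use warnings;

# Sample input, made up for this demonstration.
my $string = "%INCLUDE 'a.pas'\n%INCLUDE 'b.pas'\n";

# s///g returns the number of substitutions made, or an empty string
# (false) when nothing matched, so "|| 0" normalises the no-match case.
my $count = ($string =~ s/%INCLUDE/%include/g) || 0;
print " $count %INCLUDE\n" if $count > 0;

# A pattern that does not occur leaves the string alone and counts as 0.
my $none = ($string =~ s/NOSUCHKEY/x/g) || 0;
print "unmatched count: $none\n";
```

No loop is involved: one statement performs every replacement and reports how many were made.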

6 Comments

Ah! I see. So is there a way to tell it to skip the "regex cursor" to the end of the original match after the replace? I would also prefer to have the replace operation done inside the while loop if I could. The end goal is to have the match strings and replace strings in two arrays and iterate through them with a for loop. That is why I formatted them the way that I did. Thanks for the help btw!
That's what it does already: each string search starts after the end of the previous one. It just does them all at once. If you put your match and replacement strings as pairs in a hash then you can write a single substitution to replace all of them at once. You will have to describe your problem better if you want more help - I don't yet see why you need to put a loop around your substitutions
The loop was used so I could count the matches and print some debug information as I developed the script. I have used regex before, but not with perl, as a matter of fact I have never used perl so the debug output was important. The problem is rather simple, I would like to perform all of the defined replace operations on the input file and then have some way of knowing what replacements were performed, and in what files. The replace values are in comments in the while loops in the code above.
@Borodin thanks for all of your help. I will look into using hashes as that is similar to what I had in mind, but I would like to keep the individual counts for output. Do you mind removing your down vote as well? I think I made the question much clearer as we discussed the solution.
@dislexicmofo: I think you would be better off processing your file line by line. Slurping it all into memory seems to be causing a problem here, and is certainly unnecessary. And I didn't downvote your question
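
The goal discussed in these comments, a table of match and replacement pairs applied in turn with a count kept for each, could be sketched roughly as follows. Everything here is an assumption for illustration: the rule layout, the labels, the callback style, and the sample file names are made up. An array is used rather than a hash because the second rule relies on the first having already lowercased %INCLUDE, and the dot in the file name pattern is escaped here, unlike in the question.

```perl
use strict;
use warnings;

# Each rule: [ label, compiled pattern, callback building the replacement ].
# The callback receives the first two capture groups (undef if unused).
my @rules = (
    [ '%INCLUDE', qr/%INCLUDE/, sub { '%include' } ],
    [ 'XXXX:include',
      qr/(%include\s+')[A-Za-z0-9]+:([A-Za-z0-9]+\.[A-Za-z]+')/,
      sub { "$_[0]$_[1];" } ],
);

# Sample input with hypothetical file names.
my $string = "%INCLUDE 'LIB:BRTDEFS.PAS'\n%INCLUDE 'LIB:BRTVARS.PAS'\n";

my %count_for;
for my $rule (@rules) {
    my ($label, $pattern, $replace) = @$rule;

    # One s///ge call per rule replaces every match and returns the count.
    my $count = ($string =~ s/$pattern/$replace->($1, $2)/ge) || 0;
    $count_for{$label} = $count;
    print " $count $label\n" if $count;
}

print $string;
```

Each rule runs once over the whole string, so successive substitutions need no surrounding while loop, and the per-rule counts remain available for the debug output.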

The pattern in s/INCLUDE/%INCLUDE/g also matches its own replacement, so if you run it as the condition of a while loop it will run forever (until you run out of memory).

s///g replaces all matches in a single shot, so you will very rarely need to put it in a loop. The same goes for m//g: in list context it returns every match at once, so counting them takes a single step.
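
A small sketch of counting with m//g in list context, assuming a made-up sample string. The empty-list assignment is a common Perl idiom and is one way, among others, to get the count without storing the matches.

```perl
use strict;
use warnings;

# Sample input, made up for this demonstration.
my $string = "INCLUDE a\nINCLUDE b\nINCLUDE c\n";

# Assigning to the empty list () puts the match in list context;
# evaluating that list assignment as a scalar gives the number of matches.
my $count = () = $string =~ /INCLUDE/g;

print "$count INCLUDE\n";
```

The match leaves $string untouched, so this form suits the counting-only stage of the script.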

5 Comments

What do you mean by "m//g ... will do the counting"? There is no way to get it to return a count, and the match operator m//g isn't being used in list context here. That's why it works and the substitution doesn't.
so the while loop is not needed then. $string =~ s/(%include\s+')[A-Za-z0-9]+:([A-Za-z0-9]+.[A-Za-z]+')/$1$2;/g would replace them all in one run? Would I still be able to make successive calls without having them in a while loop structure? Or is the position value global? Thanks!
The count was really just a debug tool. If I can simply perform the replaces as described in my above comment that will be fine. I was unaware that you could call them that way.
@Borodin - in list context, m//g will return all the matches. To count how many matches there are, you just need to get the length of that list. i.e., scalar(my @matches=($str=~m/expr/g)) will be the number of times 'expr' appears in '$str'
Thanks for your reply, but Borodin beat you to it.
