Perl Regex match works, but replace does not

Question

I have put together a Perl script to go through a directory and match various keys in the source and output the results to a text file. The match operation works well, however the end goal is to perform a replace operation. The Perl script is as follows:

  #!/usr/bin/perl
  #use strict;
  use warnings;

  #use File::Slurp;

  #declare variables
  my $file = '';
  my $verbose = 0;
  my $logfile;

  my @files = grep {/[.](pas|cmm|ptd|pro)$/i} glob 'C:\users\perry_m\desktop\epic_test\pascal_code\*.*';

  #iterate through the files in input directory
  foreach $file (@files) {

     print "$file\n";

     #read the file into a single string
     open FILEHANDLE, $file or die $!;
     my $string = do { local $/; <FILEHANDLE> };

     #perform REGEX on this string

     ########################################################
     #fix the include formats to conform to normal PASCAL
     $count = 0;
     while ($string =~ m/%INCLUDE/g)
     {
        #%include
        $count++;
     }
     if ($count > 0)
     {
        print " $count %INCLUDE\n";
     }
     $count = 0;
     while ($string =~ m/INCLUDE/g)
     {
        #%INCLUDE;
        $count++;
     }
     if ($count > 0)
     {
        print " $count INCLUDE\n";
     }
     $count = 0;
     while ($string =~ m/(%include\s+')[A-Za-z0-9]+:([A-Za-z0-9]+.[A-Za-z]+')/g)
     {
        #$1$2;
        $count++;
     }
     if ($count > 0)
     {
        print " $count XXXX:include \n";
     }        
  }

This produces the desired output; an example is below:

  C:\users\perry_m\desktop\epic_test\pascal_code\BRTINIT.PAS
   1 INCLUDE
   2 XXXX:include 
   39 external and readonly

However, if I change the regex operations to try to implement a replace, using the replacement operation shown in the commented lines above, the script hangs and never returns. I imagine it is somehow related to memory, but I am new to Perl. I was also trying to avoid parsing the file line by line if possible.

Example:

  while ($string =~ s/%INCLUDE/%include/g)
  {
     #%include
     $count++;
  }

and

  while ($string =~ s/(%include\s+')[A-Za-z0-9]+:([A-Za-z0-9]+.[A-Za-z]+')/$1$2;/g)
  {
     #$1$2;
     $count++;
  }

Edit: simplified the examples

You should never comment out use strict to get things to work. Fix the problems that it reveals instead.
– Borodin, Commented Oct 15, 2012 at 18:54
@Borodin I will fix those problems in the end, I am on a time crunch right now.
– bobthearsonist, Commented Oct 15, 2012 at 18:59
@ikegami I will simplify things, I got a little carried away. This is certainly the first time I have been asked to present less information lol
– bobthearsonist, Commented Oct 15, 2012 at 19:00
@dislexicmofo: removing use strict is a false economy. It is not an extravagance like, say, comments, but is there to help you write working code more quickly. You could easily spend hours searching for a bug that use strict would reveal in a moment.
– Borodin, Commented Oct 15, 2012 at 19:13

Borodin · Accepted Answer · 2012-10-15 20:14:14Z

4

The problem is with your while loops. A loop like

  while ($string =~ m/INCLUDE/g) { ... }

will execute once for each occurrence of INCLUDE in the target string, because in scalar context m//g finds one match per call and resumes from where the previous match ended. A substitution like

  $string =~ s/INCLUDE/%INCLUDE;/g

behaves differently: it makes all of the replacements in one go and returns the number of replacements made. So a loop

  while ($string =~ s/INCLUDE/%INCLUDE;/g) { ... }

never terminates: the replacement text still contains INCLUDE, so every pass succeeds and adds another percent sign before and another semicolon after every INCLUDE.

To find the number of replacements made, change all of your loops like this to just

  $count = $string =~ s/INCLUDE/%INCLUDE;/g;
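
As a minimal sketch of the point above, assuming a made-up sample string in place of a slurped source file, a single s///g call both performs the replacements and reports how many it made, so no loop is needed:

```perl
use strict;
use warnings;

# Hypothetical sample input, standing in for the contents of one file.
my $string = "%INCLUDE 'a.pas'\n%INCLUDE 'b.pas'\nbegin end.\n";

# s///g returns the number of substitutions made, or a false value
# when nothing matched, so "|| 0" normalises a miss to zero.
my $count = ( $string =~ s/%INCLUDE/%include/g ) || 0;

print " $count %INCLUDE\n" if $count > 0;
print $string;
```

Because the pattern here is case-sensitive, the lower-case replacement text cannot match it again, which also means running the same statement twice would simply report zero the second time.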

edited Oct 15, 2012 at 20:14

answered Oct 15, 2012 at 19:07

6 Comments

bobthearsonist Over a year ago

Ah! I see. So is there a way to tell it to skip the "regex cursor" to the end of the original match after the replace? I would also prefer to have the replace operation done inside the while loop if I could. The end goal is to have the match strings and replace strings in two arrays and iterate through them with a for loop. That is why I formatted them the way that I did. Thanks for the help btw!

Borodin Over a year ago

That's what it does already: each string search starts after the end of the previous one. It just does them all at once. If you put your match and replacement strings as pairs in a hash then you can write a single substitution to replace all of them at once. You will have to describe your problem better if you want more help - I don't yet see why you need to put a loop around your substitutions.

bobthearsonist Over a year ago

The loop was used so I could count the matches and print some debug information as I developed the script. I have used regex before, but not with Perl. As a matter of fact I have never used Perl, so the debug output was important. The problem is rather simple: I would like to perform all of the defined replace operations on the input file and then have some way of knowing what replacements were performed, and in what files. The replace values are in comments in the while loops in the code above.

bobthearsonist Over a year ago

@Borodin thanks for all of your help. I will look into using hashes as that is similar to what I had in mind, but I would like to keep the individual counts for output. Do you mind removing your down vote as well? I think I made the question much clearer as we discussed the solution.

Borodin Over a year ago

@dislexicmofo: I think you would be better off processing your file line by line. Slurping it all into memory seems to be causing a problem here, and is certainly unnecessary. And I didn't downvote your question.
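
One possible reading of the hash suggestion in the comments above, sketched under assumptions: the sample text and the INTEGER pair are invented for illustration, and the approach as written only handles literal search strings, so a pattern with capture groups (like the XXXX:include one in the question) would need separate handling. The /e modifier runs the replacement as code, which lets a second hash keep the per-key counts the asker wanted:

```perl
use strict;
use warnings;

# Hypothetical sample input, standing in for the contents of one file.
my $string = "%INCLUDE 'defs.pas'\nVAR x : INTEGER;\n%INCLUDE 'io.pas'\n";

# Literal search strings mapped to their replacements.
# The INTEGER entry is made up purely to show a second pair.
my %replace = (
    '%INCLUDE' => '%include',
    'INTEGER'  => 'integer',
);

# Build one alternation, longest key first, escaping regex metacharacters.
my $re = join '|',
    map  { quotemeta }
    sort { length $b <=> length $a } keys %replace;

# With /e the replacement side is evaluated as code: count the key that
# matched, then yield its replacement text.
my %count;
$string =~ s{($re)}{ $count{$1}++; $replace{$1} }ge;

print " $count{$_} $_\n" for sort keys %count;
print $string;
```

The string is scanned once from left to right, so text produced by one replacement is never rescanned by another pair.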

evil otto · Answer · 2012-10-15 19:06:46Z

0

The pattern in s/INCLUDE/%INCLUDE/g will also match its own replacement text, so if you run it in a while loop it will run forever (or until you run out of memory).

s///g will replace all matches in a single shot so you very rarely will need to put it in a loop. Same goes for m//g, it will do the counting in a single step if you put it in list context.
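
A minimal sketch of counting with m//g in list context, assuming a made-up sample string: assigning the list of matches to the empty list and then to a scalar yields the number of matches without a loop.

```perl
use strict;
use warnings;

# Hypothetical sample input.
my $string = "INCLUDE one\nINCLUDE two\nINCLUDE three\n";

# In list context m//g returns every match; a list assignment evaluated
# in scalar context gives the number of elements on its right-hand side.
my $count = () = $string =~ /INCLUDE/g;

print " $count INCLUDE\n";
```

Unlike s///g, this leaves the string unchanged, so it suits the reporting step only.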

answered Oct 15, 2012 at 19:06

5 Comments

Borodin Over a year ago

What do you mean by "m//g ... will do the counting"? There is no way to get it to return a count, and the match operator m//g isn't being used in list context here. That's why it works and the substitution doesn't.

bobthearsonist Over a year ago

So the while loop is not needed then. $string =~ s/(%include\s+')[A-Za-z0-9]+:([A-Za-z0-9]+.[A-Za-z]+')/$1$2;/g would replace them all in one run? Would I still be able to make successive calls without having them in a while loop structure? Or is the position value global? Thanks!

bobthearsonist Over a year ago

The count was really just a debug tool. If I can simply perform the replaces as described in my above comment that will be fine. I was unaware that you could call them that way.

evil otto Over a year ago

@Borodin - in list context, m//g will return all the matches. To count how many matches there are, you just need to get the length of that list. i.e., scalar(my @matches=($str=~m/expr/g)) will be the number of times 'expr' appears in '$str'

bobthearsonist Over a year ago

Thanks for your reply, but Borodin beat you to it.
