I am a computer enthusiast, but I never studied programming.

I am trying to learn Perl. I found it interesting after learning a little about Perl-flavored regular expressions: I needed to replace words in certain parts of strings, and that's how I found Perl.

I don't know anything about programming, so I would like simple examples of how to use regular expressions from the shell (terminal) or in basic scripts.

For example, if I have a text document called input.txt in a folder, how can I perform the following regex?

Text to match:

text text text
text text text

What I want: change the second occurrence of the word text to the word changed.

(\A.*?tex.*?)text(.*?)$

Replace with: \1changed\2

Expected result:

text changed text
text changed text

In a text editor, that would mean using the multi-line and global modifiers. Now, how can I process this from the shell? cd to the path and then what? Or a script? What should it contain to make it work?

Please consider that I don't know anything about Perl, only its regexp syntax.

2 Answers

The regular expression part is easy.

 s/\btext\b.*?\K\btext\b/changed/;

However, how to apply it if you're learning Perl... that's the hard part. One could demonstrate a one-liner, but that's not very helpful.

perl -i -pe 's/\btext\b.*?\K\btext\b/changed/;' file.txt

So instead, I'd recommend looking at perlfaq5, "How do I change, delete, or insert a line in a file, or append to the beginning of a file?". Ultimately, what you need to learn is how to open a file for reading and iterate over its lines, and how to open a file for writing. With these two tools, you can do a lot.

use strict;
use warnings;
use autodie;

my $file = 'blah.txt';
my $newfile = 'new_blah.txt';

open my $infh, '<', $file;
open my $outfh, '>', $newfile;

while (my $line = <$infh>) {
    # Any manipulation to $line here, such as that regex:
    # $line =~ s/\btext\b.*?\K\btext\b/changed/;

    print $outfh $line;
}

close $infh;
close $outfh;
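
To run a script like the one above from the shell, save it to a file and pass that file to perl. The following is only a sketch under a few assumptions: the file name replace.pl and the scratch directory are made up for illustration, the sample input is the two lines from the question, and the regex line is uncommented. Note that the substitution is its own statement inside the loop body, not part of the while (...) condition.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Sample input: the two lines from the question.
printf 'text text text\ntext text text\n' > blah.txt

# Write the script (hypothetical name replace.pl).
# The quoted 'EOF' stops the shell from expanding $line and friends.
cat > replace.pl <<'EOF'
use strict;
use warnings;
use autodie;

open my $infh,  '<', 'blah.txt';
open my $outfh, '>', 'new_blah.txt';

while (my $line = <$infh>) {
    # Replace the second whole-word "text" on each line.
    $line =~ s/\btext\b.*?\K\btext\b/changed/;
    print $outfh $line;
}

close $infh;
close $outfh;
EOF

# Run the script, then read back what it wrote.
perl replace.pl
result=$(cat new_blah.txt)
printf '%s\n' "$result"
```

The original blah.txt is left untouched; the changed lines end up in new_blah.txt.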

Update to explain regex

s{
    \btext\b      # Find the first 'text' not embedded in another word
    .*?           # Non-greedily skip characters
    \K            # Keep the stuff left of the \K, don't include it in replacement
    \btext\b      # Match 2nd 'text' not embedded in another word
}{changed}x;      # Replace with 'changed'  /x modifier to allow whitespace in LHS.  

Comments

Trying to learn Perl is hard; the link you provided seems to assume that people have previous knowledge of Perl. Thanks for the answer. Why do you use boundaries? Are ^ or \A and $ or \r\n considered word boundaries as well?
Ultimately, your knowledge of a programming language is only going to be as good as the quality of your reference material and how familiar you are with it. Honestly, there are too many details and edge cases for most people to remember everything about a language, so it's actually MUCH more important to know where to look for information.
I'd suggest getting a book such as Learning Perl or Programming Perl and taking that formal approach. Eventually, you will become more familiar with perldoc and know where to look there, but I agree that it's not as good for the beginner. Perhaps someone else will suggest another resource.
Thanks. I pasted the regexp into the script, named it 1.pl, and ran it, but it returns an error. I added this: while (my $line = 's/\btext\b.*?\K\btext\b/changed/;' <$infh>) { and it returns: syntax error at 1.pl line 11, near "$infh>" Global symbol "$line" requires explicit package name at 1.pl line 14. Execution of 1.pl aborted due to compilation errors.
I have a document called input and the script saved as 1.pl; running it from the shell gave me that result. Can you help me, please?

Although this question is quite old, it is still very relevant. See the "Alternates discarded" section below for context.


Running Perl regular expressions from a file:

Now, for the task at hand:

  1. Syntax:
    The syntax for replacement is s/.../.../gm, where s stands for substitute, and g and m are the global and multi-line flags. The search pattern, i.e. (\A.*?tex.*?)text(.*?)$, goes in the first slot and the replacement goes in the second, giving:

    s/(\A.*?tex.*?)text(.*?)$/$1changed$2/gm
    
  2. Test if it works:
    The -p switch runs the -e expression in a loop over each line of input and prints the result.

    echo -e 'text text text\ntext text text' |
    perl -pe 's/(\A.*?tex.*?)text(.*?)$/$1changed$2/gm'
    

    Yes it does 😃.

  3. Use files:
    Save the substitution in a file ending in .pl, say script.pl. The input file goes at the end, so the CLI command becomes:

    perl -p script.pl input.txt
    
  4. Make it scalable:
    Let's make the file more manageable and suitable for scaling up. The whole following section is dedicated to that.
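
Before the cleanup, steps 1 to 3 can be checked end to end. This is only a sketch: the scratch directory from mktemp is an assumption, and the sample input is the two lines from the question.

```shell
# Scratch directory so nothing existing is touched.
workdir=$(mktemp -d)

# Sample input: the two lines from the question.
printf 'text text text\ntext text text\n' > "$workdir/input.txt"

# Step 3: save the substitution in script.pl.
# printf '%s\n' writes the argument literally, backslashes included.
printf '%s\n' 's/(\A.*?tex.*?)text(.*?)$/$1changed$2/gm' > "$workdir/script.pl"

# -p wraps the script in a read-and-print loop over the input file.
output=$(perl -p "$workdir/script.pl" "$workdir/input.txt")
printf '%s\n' "$output"
```

Without -i, the result only goes to standard output and input.txt itself stays unchanged, which makes this a safe way to try a pattern.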


Script file cleanup

  1. Currently, it should be looking like this:

    # script.pl
    s/(\A.*?tex.*?)text(.*?)$/$1changed$2/gm
    
  2. Store the search pattern in a variable using qr/.../. The replacement can't be stored as easily. Terminate each statement with a semicolon ;.

  3. Add comments starting with # to help you recall the purpose of the line of code that follows.

    # Pattern definition
    my $pattern = qr/(\A.*?tex.*?)text(.*?)$/;
    
    # Call to substitute command
    s/$pattern/$1changed$2/gm;
    
  4. Add these lines to the top of the file to turn on strict checks and warnings.

    use strict;
    use warnings;
    
  5. Enclose the substitution call in a function/subroutine using sub { ... }, and call it with $...->(). This encapsulates the code, which helps readability, scalability, and ease of use.

    my $substitute_fn = sub {
        s/$pattern/$1changed$2/gm;
        # ... Other lines in future
    };
    
    $substitute_fn->();
    

    After this, re-run the test from "Test if it works" above, then continue with the next step.

  6. The CLI invocation so far has been perl -p script.pl input.txt. To run the file directly, with that boilerplate applied automatically, use the shebang #! construct on the first line of the file.

    The #!/usr/bin/perl won't work due to multiple CLI args, hence use env -S to split the string.

  7. Add -i.bak in shebang to "edit files in place (& make backup)". The shebang line becomes:

    #!/usr/bin/env -vS perl -i.bak -p
    
  8. Allow the script file to be executable with chmod -v u+x script.pl, and run that on CLI with:

    ./script.pl input.txt
    

The script file in its full glory should now look like this:

#!/usr/bin/env -vS perl -i.bak -p

use strict;
use warnings;


# Pattern to match string belonging to XYZ field or of type ABC
my $pattern = qr/(\A.*?tex.*?)text(.*?)$/;


# Replace uncaptured part in $pattern (defined above) with "changed"
my $substitute_fn = sub {
    s/$pattern/$1changed$2/gm;
    # ... Other lines in this subroutine
};


# ... Other subroutines' definition


$substitute_fn->();
# ... Call to other subroutines

Alternates discarded

  1. explicit boilerplate in perl (loop over lines, or file handling etc)
    No thanks. I use Perl for elegant regex substitutions, and I'm not interested in needless side quests. I don't know whether this solution was possible in the Perl versions of that time. My Perl is v5.38.2.

    Ref:

  2. non-dedicated or non-FOSS GUI tools:

    Besides the specific reasons given for each, the following tools are also non-reproducible, non-shareable, prone-to-rot, hard-to-document GUI tools involving multiple different widgets.

    • regex101.com: awesome, I've used it a lot, but I don't want to stay dependent on a third-party online tool for something I need frequently.
    • VSCodium editor v1.93.1:
      • the in-editor widget doesn't show a live preview
      • the search side panel doesn't support maximising the view
      • the search editor doesn't support replacement mode
    • JetBrains IDE suite:
      non-FOSS, heavy, and plain overkill if used just for this

Comments

Looking back at it, other than the "Alternates discarded" part, it looks sooo much like a chatbot-generated answer. Earlier, I was of the opinion that one can easily tell those apart. But now that even my own answer, hand-written over 2 days, feels chatbot-generated to me, I stand corrected.
