Cannot get multi-line regex to match string

Question

I'm reading an HTML file and trying to extract some information from it. I've tried HTML parsers, but couldn't figure out how to use them to pull out the key text. The original script reads the HTML file; the version below is a minimal working example for Stack Overflow.

#!/usr/bin/env perl

use 5.036;
use warnings FATAL => 'all';
use autodie ':default';
use Devel::Confess 'color';

sub regex_test ( $string, $regex ) {
    if ($string =~ m/$regex/s) {
        say "$string matches $regex";
    } else {
        say "$string doesn't match $regex";
    }
}
# the HTML text is $s
my $s = '      rs577952184 was merged into
      
        <a target="_blank"
           href="rs59222162">rs59222162</a>
      
';

regex_test ( $s, 'rs\d+ was merged into.*\<a target="_blank".+href="rs(\d+)/');

However, this doesn't match.

I think that the problem is the newline after "merged into" isn't matching.

How can I alter the above regex to match $s?

@Barmar but the original HTML, which contains the string, cannot be modified. I'm only trying to figure out how to change $regex.
– con, Commented Nov 4, 2022 at 21:21

Dan Bonachea · Accepted Answer · 2022-11-04 21:28:29Z

2

The problem is the trailing / character in $regex: the pattern ends with href="rs(\d+)/, but in $s the digits are followed by ", not /. The trailing / should either be omitted or changed to ".
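
As a minimal sketch of that fix, the script below reuses the question's $s and pattern and only swaps the trailing / for a closing double quote. The printed message and the use of $1 are illustrative additions, not part of the original post.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

# Same sample text as in the question.
my $s = '      rs577952184 was merged into
      
        <a target="_blank"
           href="rs59222162">rs59222162</a>
      
';

# The question's pattern, with the trailing "/" replaced by a closing quote.
my $regex = 'rs\d+ was merged into.*\<a target="_blank".+href="rs(\d+)"';

# The /s modifier lets "." match newlines, so ".*" and ".+" can span lines.
if ( $s =~ m/$regex/s ) {
    say "matched, captured id: $1";
}
else {
    say "no match";
}
```

Because the pattern is interpolated from a plain string, the /s modifier on the match applies to it, so the newline after "merged into" is not the obstacle.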

answered Nov 4, 2022 at 21:28


Polar Bear · Accepted Answer · 2022-11-04 21:37:51Z

2

use strict;
use warnings;
use feature 'say';

my $s = '      rs577952184 was merged into
      
        <a target="_blank"
           href="rs59222162">rs59222162</a>
      
';

# \s+ matches any run of whitespace, including the newlines between the pieces
my $re = qr/rs\d+ was merged into\s+<a target="_blank"\s+href="rs(\d+)">rs\d+<\/a>/;

regex_test($s,$re);

exit 0;

sub regex_test {
    my $string = shift;
    my $regex  = shift;
    
    say $string =~ m/$regex/s 
        ? "$string matches $regex"
        : "$string doesn't match $regex";
}

Output

      rs577952184 was merged into

        <a target="_blank"
           href="rs59222162">rs59222162</a>

 matches (?^:rs\d+ was merged into\s+<a target="_blank"\s+href="rs(\d+)">rs\d+</a>)

edited Nov 4, 2022 at 21:37

answered Nov 4, 2022 at 21:32
