How would I match variable multiline perl regex with distinct rules

Question

The parser API (which I am not allowed to modify) gives me a string of this form:

    var1    var2  \
var2continued var2continued   \\\
var2continued
var3
var3continued \

var3continued

I want to split this string using regex such that:

$1 = "var1";
$2 = "var2  
var2continued var2continued   \\
var2continued"
$3 = "var3
var3continued \

var3continued"

Basically, the first variable is the first non-space word after one or more spaces, and it ends at the next space.

The second variable starts at the first non-space character after the first variable and runs to the end of the line. If the last character of the line is "\", the next line is added to the second variable (the whitespace between the last character on the current line and the "\" is not trimmed). A doubled "\\" at the end of a line does not pull in the next line, and both backslashes are kept (no unescaping). Trailing whitespace is trimmed only on the last line.

The third variable is everything after the second variable.

So far I have only been able to come up with this regex, which works only when var2 and var3 each fit on one line:

my $my_re = qr/\s+(\S+)\s+(\S+)\s+[\n](.*)/;

$text =~ /$my_re/;

Given the comments under the answers, the description in the question omits important facts. (It is also a little hard to follow.) I suggest being extra careful in formulating questions, as this text is the only thing we have to go on. Please remember that people who read this generally have absolutely no clue about your problem.
– zdim, Commented Jan 8, 2020 at 3:50

zdim · Accepted Answer · 2020-01-07 07:37:56Z

3

Match the first word, then everything up to a newline that is immediately preceded by a non-backslash character, then everything else:

/\s+ (\S+) \s+ (.*?[^\\]) \n (.*)/xs;

The /s modifier makes . match a newline as well, which is critical here (normally it doesn't). The /x modifier makes the regex ignore literal whitespace in the pattern, so we can space it out for readability.
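To make the effect of the two modifiers concrete, here is a small sketch on a made-up two-line string (the string and variable names are illustrative only, not from the question):

```perl
use strict;
use warnings;

my $text = "first line\nsecond line";

# Without /s, "." stops at the newline, so only the first line is captured.
my ($without_s) = $text =~ /(.*)/;

# With /s, "." also matches "\n", so the capture spans both lines.
my ($with_s) = $text =~ /(.*)/s;

# With /x, whitespace inside the pattern is ignored;
# a real space has to be written as \s (or [ ]).
my ($word) = $text =~ / (\S+) \s /x;

print "without /s: [$without_s]\n";   # [first line]
print "with /s:    [$with_s]\n";      # both lines
print "first word: [$word]\n";        # [first]
```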

An example program

use warnings;
use strict;
use feature 'say';

my $v = 
q(    var1    var2  \
var2continued var2continued   \\\
var2continued
var3
var3continued \

var3continued);

$v =~ /\s+ (\S+) \s+ (.*?[^\\]) \n (.*)/xs;

say "\"$1\"";  say '---';
say "\"$2\"";  say '---';
say "\"$3\"";

prints

"var1"
---
"var2  \
var2continued var2continued   \\
var2continued"
---
"var3
var3continued \

var3continued"
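One thing worth checking in the example above is how q(...) treats backslashes: a pair \\ is read as a single backslash, so the \\\ in the source ends up as two backslashes in $v, which is what the printed output shows. A minimal sketch of the difference, assuming you want the test input kept exactly as typed (the variable names are only illustrative), compares q() with a single-quoted heredoc:

```perl
use strict;
use warnings;

# In q(...), the pair \\ becomes one backslash; a lone backslash stays as is.
my $from_q = q(a \\\ b);

# In a single-quoted heredoc nothing is interpreted, so all three survive.
my $from_heredoc = <<'END';
a \\\ b
END
chomp $from_heredoc;

# Count the backslashes in each string.
my $count_q       = () = $from_q       =~ /\\/g;
my $count_heredoc = () = $from_heredoc =~ /\\/g;

print "q():     $count_q backslashes\n";        # 2
print "heredoc: $count_heredoc backslashes\n";  # 3
```

Reading the input from a file or from __DATA__ also keeps the backslashes untouched.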

edited Jan 7, 2020 at 7:37

answered Jan 7, 2020 at 7:30

zdim

7 Comments

Topa Over a year ago

Your backslash are escaping the next backslash. The \\\ is actually \\ when printed. My string is actually after escaping. Also I need to delete one backslash if odd for $2. I believe from googling that you can't skip characters in captured groups. So I had to loop through each line.

zdim Over a year ago

@Topa "backslash are escaping the next backslash" -- hum? Do you mean the [^\\] ...? Yes, it's escaping it so that we get a "not-backslash" character class. You can't simply place a backslash in a character class (like [\]) because it would actually escape the ], making it into a syntax error (because then the opening [ wouldn't get closed)

zdim Over a year ago

@Topa "My string is actually after escaping." -- heh ... your string is exactly what you showed in the question. People who read a question can't guess what you mean, and that "after escaping" thing isn't mentioned in the question (and I don't really know what you mean by that?)

zdim Over a year ago

@Topa "Also I need to delete one backslash if odd ..." -- again, you never mention that in the question. (The output you show indeed has two backslashes, and my answer reproduces that.)

zdim Over a year ago

@Topa "can't skip characters in captured groups" --- I don't understand what you mean by that. The regex in the answer works (and implements your description)

Polar Bear · Accepted Answer · 2020-01-07 07:49:55Z

1

Try the following piece of code (my take on the problem):

use strict;
use warnings;

my $str = do { local $/; <DATA> };

print "INPUT:\n[$str]\n";

$str =~ /(\w+)\s+(.*?\\\\\\\s*\w+)\n(.+)/s;
#$str =~ /(\w+)\s+((?:.*?)\\\\\\\s+(?:\w+)?)\n(.+)/s;

print "\n1: [$1]";
print "\n2: [$2]";
print "\n3: [$3]";

__DATA__
    var1    var2  \
var2continued var2continued   \\\
var2continued
var3
var3continued \

var3continued

output

INPUT:
[    var1    var2  \
var2continued var2continued   \\\
var2continued
var3
var3continued \

var3continued
]

1: [var1]
2: [var2  \
var2continued var2continued   \\\
var2continued]
3: [var3
var3continued \

var3continued
]

edited Jan 7, 2020 at 7:49

answered Jan 7, 2020 at 7:35

Polar Bear

1 Comment

Topa Over a year ago

It doesn't delete one backslash when the number of backslashes is odd. I believe it can't be done in one regex.
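As a rough sketch of one way the odd-backslash rule might be handled: match first, then clean up the second capture with a separate substitution. This assumes all three parts are present and that nothing follows a continuation backslash on its line; the variable names are illustrative and not from the question.

```perl
use strict;
use warnings;

# A single-quoted heredoc keeps every backslash exactly as typed.
my $text = <<'END';
    var1    var2  \
var2continued var2continued   \\\
var2continued
var3
var3continued \

var3continued
END
chomp $text;

# Group 2 consumes ordinary characters, or a backslash together with whatever
# follows it (another backslash or a newline). It therefore runs past a
# newline only when an odd number of backslashes precedes that newline.
my ($first, $second, $third) =
    $text =~ /\A \s* (\S+) \s+ ( (?: [^\\\n] | \\. )* ) \n? (.*) \z/xs
    or die "no match";

# Drop the single continuation backslash at the end of each line;
# the lookbehind and the pair group leave escaped pairs (\\) untouched.
$second =~ s/(?<!\\)((?:\\\\)*)\\$/$1/mg;

# Trim trailing whitespace on the last line only.
$second =~ s/\s+\z//;

print "1: [$first]\n2: [$second]\n3: [$third]\n";
```

With the sample input this yields the three values listed in the question. Whitespace after a continuation backslash, or a missing second or third part, would need extra handling.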

Topa · Accepted Answer · 2020-01-07 23:03:21Z

0

None of the answers worked for all cases (2 and 3 are optional). I also had a small issue where the parser was adding a space after the backslash.

I ended up splitting the text into an array of lines, then splitting that into two parts (1 and 2 together, and 3 by itself), and then splitting the first part on its own. My actual code is split into multiple functions, but I simplified it below:

my $empty_re      = qr/^\s*$/;
my $def_re        = qr/(.*?)((?:\\{2})*)(\\?)\s*$/;
my $dual_token_re = qr/\s*(\S+)\s*(.*)/s;
my $text  = "place text here";
my @lines = split /\n/, $text;
my $i;
my $j;
my $def = "";
my $other;
# Get start capture: skip leading empty lines
for ($i = 0; $i <= $#lines; $i++) {
    last if $lines[$i] !~ /$empty_re/;
}

# Start definition capture
for ($j = $i; $j <= $#lines; $j++) {
    $lines[$j] =~ s/$def_re/$1$2/; # remove the trailing backslash if the count is odd
    last if !$3; # stop if the number of trailing backslashes is even
}
$def = join "\n", @lines[$i..$j];
$j++;

# Get remaining text
if ($j <= $#lines) {
    $other = join "\n", (splice @lines, $j);
}

# $def has 1 and 2, $other has 3

$def =~ /$dual_token_re/;
# now $1 and $2 hold 1 and 2, $other has 3

edited Jan 7, 2020 at 23:03

answered Jan 7, 2020 at 22:39

Topa

1 Comment

zdim Over a year ago

"... cases (2 and 3 are optional)" --- but the question doesn't give even a hint of that? On the contrary, it specifically makes statements about groups 2 and 3.
