This can be done with a replacement callback.
In Perl, that is typically done with the s///e form, which evaluates the replacement as code.
The main regex captures each block of rows that share a label.
Capture group 1 is the first row, and capture group 3 holds the remaining rows with the same label.
Both are passed to the merge sub.
The merge sub strips the labels from the remaining rows with a second regex,
then appends what is left to the first row.
The result is returned as the replacement text.
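As a minimal sketch of the s///e callback idea on its own (the `bracket` sub and the sample text are hypothetical and only for illustration, they are not part of the solution):

```perl
use strict;
use warnings;

# Hypothetical helper for illustration: upper-cases a word and wraps it in brackets.
sub bracket {
    my ($word) = @_;
    return "[" . uc($word) . "]";
}

my $text = "merge these rows";

# With /e the replacement side is evaluated as Perl code for every match,
# so the sub's return value becomes the replacement text.
(my $result = $text) =~ s/(\w+)/ bracket($1) /eg;

print "$result\n";   # [MERGE] [THESE] [ROWS]
```

The whitespace around `bracket($1)` is harmless because the replacement is code, not a string.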
Perl code:
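use strict;
use warnings;
# Slurp mode: read all of DATA as one string
$/ = undef;
my $input = <DATA>;
# Strip the label from each remaining row and append the rest to the first row
sub mergeRows {
    my ($first_row, $other_rows) = @_;
    $other_rows =~ s/(?m)\s*^\w+\s*(.*)(?<!\s)\s*/$1 /g;
    return $first_row . " " . $other_rows . "\n";
}
# /e evaluates the replacement as code, so mergeRows() supplies the new text
$input =~ s/(?m)(^(\w+).*)(?<!\s)\s+((?:\s*^\2.*)+)/ mergeRows($1,$3) /eg;
print $input, "\n";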
__DATA__
row1 multiline 1
row1 multiline 2
row1 multiline 3
row2 multiline 1
row2 multiline 2
Output:
row1 multiline 1 multiline 2 multiline 3
row2 multiline 1 multiline 2
Main regex:
(?m)                  # Multi-line mode
(                     # (1 start), first row of the block
  ^
  ( \w+ )             # (2), row label shared by the block
  .*
)                     # (1 end)
(?<! \s )             # Force trim of trailing spaces
\s+                   # Consume the newline and any following whitespace
(                     # (3 start), remaining rows with the same label
  (?:
    \s* ^ \2 .*
  )+
)                     # (3 end)
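To see what the groups hold, here is a small sketch that runs the same main regex in a match loop instead of a substitution and records the captures. The `@blocks` array and its hash keys are names made up for this illustration, and the input is the sample data written as a string.

```perl
use strict;
use warnings;

# Sample data from above, inlined as a string for this sketch.
my $input = "row1 multiline 1\nrow1 multiline 2\nrow1 multiline 3\n"
          . "row2 multiline 1\nrow2 multiline 2\n";

# Collect the captures of every block the main regex finds.
my @blocks;
while ($input =~ /(?m)(^(\w+).*)(?<!\s)\s+((?:\s*^\2.*)+)/g) {
    push @blocks, { first => $1, label => $2, rest => $3 };
}

for my $block (@blocks) {
    # Show embedded newlines as a literal \n so each block prints on one line.
    (my $rest = $block->{rest}) =~ s/\n/\\n/g;
    print "label=$block->{label} first=[$block->{first}] rest=[$rest]\n";
}
```

For the sample data this finds two blocks, one labelled `row1` and one labelled `row2`; group 3 of the first block still contains both remaining rows, labels included.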
Merge sub regex:
(?m)                  # Multi-line mode
\s*                   # remove
^ \w+ \s*             # remove
( .* )                # (1), What will be saved
(?<! \s )             # remove, force trim of trailing spaces
\s*                   # remove, possibly many newlines (whitespace)
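The merge regex can also be tried in isolation. This sketch applies the same substitution to a string of the kind capture group 3 holds for the sample data; the variable `$trimmed` is a name introduced here for illustration.

```perl
use strict;
use warnings;

# The kind of value capture group 3 holds for the row1 block of the sample data.
my $other_rows = "row1 multiline 2\nrow1 multiline 3";

# Same substitution as in the merge sub: drop each row's label and the
# surrounding whitespace, keep the row content, and append one space.
(my $trimmed = $other_rows) =~ s/(?m)\s*^\w+\s*(.*)(?<!\s)\s*/$1 /g;

# Brackets make the trailing space visible.
print "[$trimmed]\n";   # [multiline 2 multiline 3 ]
```

Note that every kept row is followed by a space, so the merged string ends with one trailing space before the merge sub appends its newline.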