I'm matching multiple patterns in a string to populate an array. The input file looks like this:
I love cat [chats;chaton;chatterie] and rabbit [lapins;lapereau] # J'aime les chats et les lapins # 2.8
My father [père;parent;papa] lives in New-York # Mon père vit à New-York # 1.8
I use this code:
use strict;
use warnings;
use Data::Dump;

open(my $fh, "<", $ARGV[0])
    or die "cannot open < $ARGV[0]: $!";

while (my $text = <$fh>) {
    my @lines = split /\n/, $text;
    foreach my $line (@lines) {
        # The regex expects three tab-separated fields:
        # English sentence, French sentence, score
        if ($line =~ /(^(.+)\t(.+)\t(.+)$)/) {
            my $english_sentence = $2;
            my $french_sentence  = $3;
            my $score            = $4;
            print $english_sentence."#".$french_sentence."";
            # Collect each [a;b;c] group as an array of its alternatives
            my @data = map [ split /;/ ], $line =~ / \[ ( [^\[\]]+ ) \] /xg;
            dd \@data;
        }
        print "\n";
    }
}
close $fh;
Here is the output:
I love cat [chats;chaton;chatterie] and rabbit [lapins;lapereau] # J'aime les chats et les lapins
Array==>[["chats", "chaton", "chatterie"], ["lapins", "lapereau"]]
My father [père;parent;papa] lives in New-York # Mon père vit à New-York
Array==>[["père", "parent", "papa"]]
I need to remove from the array every string that does not appear in the French sentence, keeping only the strings that match part of it. In the end, I'd like to get this result:
I love cat [chats;chaton;chatterie] and rabbit [lapins;lapereau] # J'aime les chats et les lapins
[["chats"], ["lapins"]]
My father [père;parent;papa] lives in New-York # Mon père vit à New-York
[["père"]]
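
To make the expected behaviour concrete, here is a minimal sketch of the filtering, under a few assumptions: the fields are tab-separated (as in the regex above), "matching" means a plain substring test against the French sentence, and the helper name filter_alternatives is hypothetical and not part of my script.

```perl
use strict;
use warnings;

# Hypothetical helper: for each [a;b;c] group found in the English field,
# keep only the alternatives that occur as a substring of the French sentence.
sub filter_alternatives {
    my ($english, $french) = @_;

    # One array reference per bracketed group, holding its alternatives
    my @groups = map { [ split /;/ ] } $english =~ /\[([^\[\]]+)\]/g;

    # index() returns -1 when the alternative is absent from the sentence
    return map {
        my $group = $_;
        [ grep { index($french, $_) >= 0 } @$group ]
    } @groups;
}

# Usage on the first sample line (fields assumed to be tab-separated)
my $line = "I love cat [chats;chaton;chatterie] and rabbit [lapins;lapereau]"
         . "\tJ'aime les chats et les lapins\t2.8";
my ($english, $french, $score) = split /\t/, $line;

my @kept = filter_alternatives($english, $french);

my @parts;
push @parts, '[' . join(';', @$_) . ']' for @kept;
print join(', ', @parts), "\n";    # prints: [chats], [lapins]
```

A substring test also accepts a candidate that sits inside a longer word (for example "chat" inside "chats"), so a word-boundary regex such as /\b\Q$_\E\b/ may be a better fit if whole words are required.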