I'm trying to parse a text file using Perl regular expressions. Here's an example data set:

"Field1", "Field2", "Field3", "Field4", "Field5"
"val1-1", "\\path\to\val1-2.txt", "val1-3", "\\path\to\val1-4.ini", "val1-5.txt"
"val2-1", "val2-2", "\\path\to\val2-3.txt", "\\path\to\val2-4.ini", "val2-5.txt"
"\\path\to\val3-1.txt", "val3-2", "val3-3", "\\path\to\val3-4.ini", "val3-5.txt"

For each line of text, I'm trying to match the first .txt file name, without its path. In the data set above those are val1-2.txt, val2-3.txt and val3-1.txt.

I thought this would work:

while(<INFILE>) {
    if(m/\\(.*?\.txt)"/) {
        print "$1\n";
    }
}

Output:

\path\to\val1-2.txt
\path\to\val2-3.txt
\path\to\val3-1.txt

But it doesn't, because it matches the complete path, not just the file name.

Now this works:

while(<INFILE>) {
    if(my @matches = $_ =~ m/(.*?)"/g) {
        foreach (@matches) {
            print "$1\n" if(m/.*\\(.*?\.txt)/);
        }
    }
}

Output:

val1-2.txt
val2-3.txt
val3-1.txt

But I suppose there must be a way to do this with a single match expression?

2 Answers

How about:

use feature 'say';  # say needs Perl 5.10 or later

my $re = qr~\\([^\\"]+)"~;
while(<DATA>) {
    chomp;
    if(my @m = /$re/g) {
        say "@m";
    }
}

__DATA__
"Field1", "Field2", "Field3", "Field4", "Field5"
"val1-1", "\\path\to\val1-2.txt", "val1-3", "\\path\to\val1-4.ini", "val1-5.txt"
"val2-1", "val2-2", "\\path\to\val2-3.txt", "\\path\to\val2-4.ini", "val2-5.txt"
"\\path\to\val3-1.txt", "val3-2", "val3-3", "\\path\to\val3-4.ini", "val3-5.txt"

output:

val1-2.txt val1-4.ini
val2-3.txt val2-4.ini
val3-1.txt val3-4.ini

If you only want the first .txt, do:

my $re = qr~\\([^\\"]+\.txt)~;
while(<DATA>) {
    chomp;
    /$re/ && say $1;
}
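
The same pattern can be written with the /x flag so that each piece carries its own comment. This is only a sketch: the helper name first_txt is made up for illustration, and the sample data is inlined in a heredoc instead of being read from a file.

```perl
use strict;
use warnings;

# Same pattern as above, spread out with /x so each part can be annotated.
my $re = qr{
    \\            # a literal backslash (the last path separator)
    (             # open the capture group
      [^\\"]+     # one or more characters, none a backslash or double quote
      \.txt       # a literal .txt extension
    )             # close the capture group
}x;

# first_txt is a hypothetical helper name, not part of the original answer.
# It returns the first captured file name, or undef if the line has none.
sub first_txt {
    my ($line) = @_;
    return $line =~ $re ? $1 : undef;
}

# A single-quoted heredoc keeps every backslash exactly as typed.
my @lines = split /\n/, <<'END';
"Field1", "Field2", "Field3", "Field4", "Field5"
"val1-1", "\\path\to\val1-2.txt", "val1-3", "\\path\to\val1-4.ini", "val1-5.txt"
"val2-1", "val2-2", "\\path\to\val2-3.txt", "\\path\to\val2-4.ini", "val2-5.txt"
"\\path\to\val3-1.txt", "val3-2", "val3-3", "\\path\to\val3-4.ini", "val3-5.txt"
END

for my $line (@lines) {
    my $name = first_txt($line);
    print "$name\n" if defined $name;   # the header line has no match
}
```

Because the character class excludes the backslash, the match can only start after the last path separator, and because it excludes the double quote, it cannot run past the end of a field.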

2 Comments

OP said he wants the first .txt, so no need for the .ini I guess. The quoted regex is a good idea though. :)
@simbabque is correct: no need for .ini. Here's how I read your regex, in case it helps someone else: [^\\"] matches any single character that is NOT \ or ", and + repeats that one or more times. So [^\\"]+\.txt matches a run of at least one character that contains no \ or ", followed by .txt.

Try this one:

while (<DATA>) {
    if(m/([^\\]+\.txt)"/) {
        print "$1\n";
    }
}

Output:

val1-2.txt
val2-3.txt
val3-1.txt

You don't need the \ outside your capture group. Instead of matching everything, match everything that is not a backslash. Since the file must have a name in front of the .txt, you want the + quantifier (one or more), not *?, which matches zero or more characters while taking as few as possible.
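
To see why the lazy quantifier alone does not help, here is a small sketch comparing the patterns on one sample line. The helper name capture and the extra test string "plain.txt" are made up for illustration. A lazy .*? only shortens the match from its starting point; the regex engine still takes the leftmost possible start, which is the first backslash of the path.

```perl
use strict;
use warnings;

# capture is a hypothetical helper for this sketch: it returns the first
# capture group of $re applied to $text, or undef when there is no match.
sub capture {
    my ($re, $text) = @_;
    return $text =~ $re ? $1 : undef;
}

# A single-quoted heredoc keeps every backslash exactly as typed.
my $line = <<'END';
"val1-1", "\\path\to\val1-2.txt", "val1-3", "\\path\to\val1-4.ini", "val1-5.txt"
END
chomp $line;

# Lazy match: starts at the first backslash, so the path is included.
my $lazy = capture(qr/\\(.*?\.txt)"/, $line);

# Negated class: cannot cross a backslash, so only the file name is left.
my $negated = capture(qr/([^\\]+\.txt)"/, $line);

print "lazy:    $lazy\n";      # \path\to\val1-2.txt
print "negated: $negated\n";   # val1-2.txt

# Caveat: [^\\] still allows quotes, so a bare file name with no path
# is captured together with its opening quote.
my $bare = '"plain.txt", "x"';
print capture(qr/([^\\]+\.txt)"/,  $bare), "\n";   # "plain.txt
print capture(qr/([^\\"]+\.txt)"/, $bare), "\n";   # plain.txt
```

Whether bare file names should match at all depends on the data. The expected output in the question only lists names that follow a path, so adding " to the negated class is a choice rather than a requirement.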
