
I need to work with .txt files and filter them by the switch name and the date, both of which are stored in the file name.

So far I have the following:

my $dir = "t-files\/";
chdir($dir);
foreach $files (glob('*.txt')) {
  ($sname) = split(/_/, $files);
  #($sdate) = "still under work"
  print "\nSwitch Name: $sname - Date: still under work";
}

File example names: "s-ar-ar55g-1_20140911-09.txt" | "s-ar-ar55g-1_20141027-09.txt" | etc.

With this script I have the following output:

D:\_perl>test_01.pl

Switch Name: s-ar-ar55g-1 - Date: still under work
Switch Name: s-ar-ar55g-1 - Date: still under work
Switch Name: s-ar-ar55g-1 - Date: still under work
Switch Name: s-ar-ar55g-1 - Date: still under work
Switch Name: s-ar-ar55g-1 - Date: still under work
Switch Name: s-ar-ar55g-1 - Date: still under work
Switch Name: s-ar-ar55g-1 - Date: still under work
Switch Name: s-ar-ar55g-1 - Date: still under work
Switch Name: s-ar-ar55g-1 - Date: still under work
D:\_perl>

My intention is to extract the date string "20140911" from the file name and store it in a new variable, "sdate".

That way I will have two variables, so I can compare by name and by date.

Is it possible to extract the year, month and day, like "20140911", directly from the name of the .txt file?


2 Answers


You can always parse a string like this with a simple regex:

my $file = 's-ar-ar55g-1_20140911-09.txt';

my ($sname, $date) = $file =~ /( [^_]+ ) _ ( [0-9]{8} )/x;

The /x modifier makes the regex engine ignore literal whitespace (including newlines) in the pattern and allows comments introduced with #, so that we can make the pattern more readable. As for the pattern itself, [^_] is a negated character class (the ^ inside [] negates it) which matches any character other than _, and the following + means that there must be at least one such character. So that matches a string of characters up to the first _.

This is captured because of the surrounding (), and so is the pattern for a digit which must repeat exactly 8 times, [0-9]{8}. In list context the match returns the two captures, which are assigned to $sname and $date. See the perlretut tutorial for starters, or your favorite good Perl book.

Note that I declare $sname with my, as I do all other variables when they are introduced. This can be enforced with the strict pragma, and you should always enable warnings as well.
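When the regex runs in a loop over many files, it helps to check that the match succeeded before using the captures. The following is only a minimal sketch under some assumptions: the @files list stands in for the result of glob('*.txt'), and both the 'notes.txt' entry and the $since cutoff date are hypothetical, added just to show the filtering.

```perl
use strict;
use warnings;

# Hypothetical file list; in the real script this would come from glob('*.txt')
my @files = (
    's-ar-ar55g-1_20140911-09.txt',
    's-ar-ar55g-1_20141027-09.txt',
    'notes.txt',                      # does not follow the naming scheme
);

my $since = '20141001';   # hypothetical cutoff date, same YYYYMMDD layout

my @matched;
for my $file (@files) {
    # A failed match returns an empty list, so the list assignment
    # evaluates to 0 (false) and the file is skipped
    my ($sname, $date) = $file =~ /( [^_]+ ) _ ( [0-9]{8} )/x
        or next;

    # Fixed-width YYYYMMDD strings sort chronologically as plain strings
    next if $date lt $since;

    push @matched, "$sname $date";
}

print "$_\n" for @matched;
```

Because the date is a fixed-width YYYYMMDD string, a plain string comparison with lt or ge is enough to order the dates; no date module is needed for this kind of filter.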


The split you use is a great tool to reach for, but there is a little more to do with it here

my ($sname, $date) = split /_/, $file;  
# Now need to remove the trailing `-09.txt` from $date
($date) = split /-/, $date, 2;
# or, with a regex
# $date =~ s/[^-]+\K.*//;  # remove the first - and all after it

That third argument in the second split, the 2, tells split to return at most two elements. So that'll be what's before the first - and then a string with everything after it.

We need () around $date to impose list context; otherwise the assignment would be in scalar context and $date would be assigned the number of elements in the returned list (2).

Clearly a bit more work and consideration than the basic regex used first.
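The difference between the two contexts can be seen side by side. This is a small sketch, assuming Perl 5.12 or later (older versions had split in scalar context write into @_ instead); the variable names $rest and $count are only for illustration.

```perl
use strict;
use warnings;

my $rest = '20140911-09.txt';

# Scalar context: split returns the number of fields it produced
my $count = split /-/, $rest, 2;

# List context: the first field is assigned, the second is discarded
my ($date) = split /-/, $rest, 2;

print "count: $count\n";
print "date:  $date\n";
```

Here $count ends up as 2 and $date as 20140911, which is why the parentheses on the left-hand side matter.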

Another way, to push this argument further, would be to split on either _ or - and then assemble parts as needed

my @parts = split /[_-]/, $file;
my ($sname, $date) = ( join('-', @parts[0..3]), $parts[4] );

Now we also have that @parts variable floating around, supposedly unneeded, so let's avoid that namespace pollution

my ($sname, $date) = do {
    my @parts = split /[_-]/, $file;
    join('-', @parts[0..3]), $parts[4];
};

(Now @parts, being declared as lexical my inside that do block, does not exist outside of it.)

This is a standard way to work with a string when parts of it need analysis and processing, but it is clearly overkill here in comparison with that simple regex.
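The split on [_-] above relies on the switch name having exactly four dash-separated parts. If that is not guaranteed, one option is to count from the end instead. This is only a sketch, and it assumes that every name ends in _YYYYMMDD-NN.txt and that the switch name itself contains no underscore; $date_idx is a helper name introduced here.

```perl
use strict;
use warnings;

my $file = 's-ar-ar55g-1_20140911-09.txt';

my ($sname, $date) = do {
    my @parts = split /[_-]/, $file;

    # The last two fields are the date and "09.txt"; everything before
    # them belongs to the switch name, however many dashes it contains
    my $date_idx = $#parts - 1;

    ( join('-', @parts[0 .. $date_idx - 1]), $parts[$date_idx] );
};

print "$sname $date\n";
```

The extra bookkeeping is another illustration of why the single regex is the simpler tool for this job.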


2 Comments

Hello zdim! Thanks so much, this is very handy for the work that I'm doing
@seltika Glad to hear it helps :) Let me know if more explanation, and/or references, would be useful

The following code snippet uses a regex to capture four parts of a filename: anything before the underscore, the year (first 4 digits), the month (next 2 digits) and the day of the month (next 2 digits). As a sanity check, it also expects a dash followed by 2 digits, a dot, and txt as the file extension.

The output joins the date parts with / for demonstration purposes only.

Note: replace while( <DATA> ) { with for ( glob('s-ar-*.txt') ) { to get a list of text files matching file mask in filesystem.

use strict;
use warnings;
use feature 'say';

while( <DATA> ) {
    # Skip lines that do not match, so stale $1..$4 are never used
    next unless /([^_]*)_(\d{4})(\d{2})(\d{2})-\d{2}\.txt/;
    my($switch,$year,$month,$mday) = ($1,$2,$3,$4);
    say "Switch name: $switch - Date: " . join('/',$year,$month,$mday);
}


__DATA__
s-ar-ar55g-1_20140911-09.txt
s-ar-ar55g-1_20141027-09.txt

Output

Switch name: s-ar-ar55g-1 - Date: 2014/09/11
Switch name: s-ar-ar55g-1 - Date: 2014/10/27
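Once the date has been extracted, it can be turned into a date object if real date comparisons are needed. The sketch below uses the core Time::Piece module, which this answer does not otherwise rely on; the @names list and the $cutoff date are hypothetical, and the 8 digits are captured as one group so they can be passed to strptime.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical input; in practice this could come from glob('s-ar-*.txt')
my @names = (
    's-ar-ar55g-1_20140911-09.txt',
    's-ar-ar55g-1_20141027-09.txt',
);

# Hypothetical cutoff: keep only files dated on or after 2014-10-01
my $cutoff = Time::Piece->strptime('20141001', '%Y%m%d');

my @recent;
for my $name (@names) {
    # Skip names that do not follow the expected layout
    my ($switch, $ymd) = $name =~ /^([^_]+)_(\d{8})-\d{2}\.txt$/ or next;

    # Parse the 8-digit date; Time::Piece objects compare with <, >=, etc.
    my $when = Time::Piece->strptime($ymd, '%Y%m%d');

    push @recent, $switch . ' ' . $when->ymd if $when >= $cutoff;
}

print "$_\n" for @recent;
```

With the two sample names, only the file dated 2014-10-27 passes the cutoff.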

Reference: Perl regular expression

1 Comment

Hello Polar Bear! Thanks so much, your answer is also very handy!!
