PERL - work with txt files, and extracting the data in different variables

Question

I need to work with .txt files and filter them by the name and date stored in the file name.

So far I have achieved the following:

my $dir = "t-files\/";
chdir($dir);
foreach $files (glob('*.txt')) {
  ($sname) = split(/_/, $files);
  #($sdate) = "still under work"
  print "\nSwitch Name: $sname - Date: still under work";
}

File example names: "s-ar-ar55g-1_20140911-09.txt" | "s-ar-ar55g-1_20141027-09.txt" | etc.

With this script I have the following output:

D:\_perl>test_01.pl

Switch Name: s-ar-ar55g-1 - Date: still under work
Switch Name: s-ar-ar55g-1 - Date: still under work
Switch Name: s-ar-ar55g-1 - Date: still under work
Switch Name: s-ar-ar55g-1 - Date: still under work
Switch Name: s-ar-ar55g-1 - Date: still under work
Switch Name: s-ar-ar55g-1 - Date: still under work
Switch Name: s-ar-ar55g-1 - Date: still under work
Switch Name: s-ar-ar55g-1 - Date: still under work
Switch Name: s-ar-ar55g-1 - Date: still under work
D:\_perl>

My intention is to extract the date string "20140911" from the file name and store it in a new variable, "sdate".

That way I would have two variables, so I would be able to make comparisons by name and date.

Is it possible to extract the year, month and day, like "20140911", directly from the name of the txt file?

zdim · Accepted Answer · 2021-09-27 16:09:35Z

You can always parse a string like this with a simple regex

my $file = 's-ar-ar55g-1_20140911-09.txt';

my ($sname, $date) = $file =~ /( [^_]+ ) _ ( [0-9]{8} )/x;

The /x modifier makes it ignore spaces (and newlines, and honors comments with #) in patterns, so that we can make it more readable. As for patterns, I use negation (^) in the character class [] with [^_], which matches any character other than _, and the following + means that there must be at least one such character. So that matches a string of characters up to the first _.

This is captured, because of surrounding (), and so is the pattern for a number which must repeat 8 times, [0-9]{8}. The two captured patterns are returned, and assigned to $sname and $date. See tutorial perlretut for starters, or your favorite good Perl book.

Note that I declare my $sname, and all other variables, as they get introduced. This can be enforced with the strict pragma, and you should always enable warnings as well, of course.
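For illustration, here is a minimal sketch of how the regex could sit in a loop under strict and warnings. The helper name parse_name and the extra 'notes.txt' entry are my own assumptions, added only to show what happens when a name does not match; the ^ anchor is also an addition.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer above): returns the switch
# name and the 8-digit date, or an empty list when the name does not match.
sub parse_name {
    my ($file) = @_;
    return $file =~ /^ ( [^_]+ ) _ ( [0-9]{8} ) /x;
}

my @files = (
    's-ar-ar55g-1_20140911-09.txt',
    's-ar-ar55g-1_20141027-09.txt',
    'notes.txt',                      # deliberately does not match
);

for my $file (@files) {
    my ($sname, $date) = parse_name($file);
    unless (defined $sname) {
        warn "Skipping unexpected name: $file\n";
        next;
    }
    print "Switch Name: $sname - Date: $date\n";
}
```

In list context a failed match returns an empty list, so $sname stays undefined and the loop can skip that file instead of printing stale or empty values.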

The split you use is a great tool to reach for, but there is a little more to do with it here

my ($sname, $date) = split /_/, $file;  
# Now need to remove the trailing `-09.txt` from $date
($date) = split /-/, $date, 2;
# or, with a regex
# $date =~ s/[^-]+\K.*//;  # remove the first - and all after it

That third argument in the second split, the 2, tells split to return at most two elements altogether. So that'll be what's before the first - and then a string with everything after it.

We need () around $date to enforce list context. Otherwise the assignment would impose scalar context on split, and $date would get assigned the number of elements of the returned list (2).
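A small sketch of that context difference, assuming a reasonably modern Perl (5.12 or later, where split in scalar context simply returns the field count). The variable names are only for this illustration.

```perl
use strict;
use warnings;

my $rest = '20140911-09.txt';   # what is left after the first split on _

my ($in_list)  = split /-/, $rest, 2;   # list context: takes the first field
my $in_scalar  = split /-/, $rest, 2;   # scalar context: takes the field count

print "list context:   $in_list\n";     # 20140911
print "scalar context: $in_scalar\n";   # 2
```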

Clearly a bit more work and consideration than the basic regex used first.

Another way, to push this argument further, would be to split on either _ or - and then assemble parts as needed

my @parts = split /[_-]/, $file;
my ($sname, $date) = ( join('-', @parts[0..3]), $parts[4] );

Now we also have that @parts variable floating around, supposedly unneeded, so let's avoid that namespace pollution

my ($sname, $date) = do {
    my @parts = split /[_-]/, $file;
    join('-', @parts[0..3]), $parts[4];
};

(Now @parts, being declared as lexical my inside that do block, does not exist outside of it.)

This is a standard way to work with a string when parts of it need analysis and processing, but it is clearly overkill here in comparison with that simple regex.
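To tie this back to the goal of comparing by name and date, here is a rough sketch of one way the two captured values might be used. Since the date is a fixed-width YYYYMMDD string, plain string comparison (ge, le) orders it chronologically. The criteria $want_name, $from and $to, and the third file name, are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical filter criteria, chosen only for this illustration.
my $want_name   = 's-ar-ar55g-1';
my ($from, $to) = ('20140901', '20140930');

my @files = (
    's-ar-ar55g-1_20140911-09.txt',
    's-ar-ar55g-1_20141027-09.txt',
    's-ar-ar55g-2_20140915-09.txt',   # invented name for a second switch
);

# Keep only the files for the wanted switch whose date falls in the range.
my @selected = grep {
    my ($sname, $date) = /^ ( [^_]+ ) _ ( [0-9]{8} ) /x;
    defined $sname
        and $sname eq $want_name
        and $date ge $from
        and $date le $to;
} @files;

print "$_\n" for @selected;
```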

Hello zdim! Thanks so much, this is very handy for the work that I'm doing
@seltika Glad to hear it helps :) Let me know if more explanation, and/or references, would be useful

Polar Bear · Accepted Answer · 2021-09-27 07:38:59Z

The following code snippet uses a regex to capture four parts of a filename: anything before the underscore, the year (first 4 digits), the month (next 2 digits) and the day of the month (next 2 digits). As a sanity check, it also expects a dash followed by 2 digits, a dot, and txt as the file's extension.

The output joins date parts with / for demonstration purpose only.

Note: replace while( <DATA> ) { with for ( glob('s-ar-*.txt') ) { to get a list of text files matching file mask in filesystem.

use strict;
use warnings;
use feature 'say';

while( <DATA> ) {
    # skip any line that does not look like <switch>_<YYYYMMDD>-<NN>.txt
    next unless /([^_]*)_(\d{4})(\d{2})(\d{2})-\d{2}\.txt/;
    my($switch,$year,$month,$mday) = ($1,$2,$3,$4);
    say "Switch name: $switch - Date: " . join('/',$year,$month,$mday);
}


__DATA__
s-ar-ar55g-1_20140911-09.txt
s-ar-ar55g-1_20141027-09.txt

Output

Switch name: s-ar-ar55g-1 - Date: 2014/09/11
Switch name: s-ar-ar55g-1 - Date: 2014/10/27
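The glob variant mentioned in the note might look roughly like this. So that the sketch runs anywhere, it first creates two empty sample files in a scratch directory (via File::Temp) and changes into it; in real use you would chdir into your own directory instead. The anchors ^ and \z are an addition of mine.

```perl
use strict;
use warnings;
use feature 'say';
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Scratch directory with empty sample files, only for this illustration.
my $start = getcwd();
my $dir   = tempdir(CLEANUP => 1);
chdir $dir or die "Cannot chdir to $dir: $!";
for my $name ('s-ar-ar55g-1_20140911-09.txt', 's-ar-ar55g-1_20141027-09.txt') {
    open my $fh, '>', $name or die "Cannot create $name: $!";
    close $fh;
}

my @lines;
for my $file ( glob('s-ar-*.txt') ) {
    # the list assignment yields 0 in scalar context when the match fails
    my ($switch, $year, $month, $mday) =
        $file =~ /^([^_]*)_(\d{4})(\d{2})(\d{2})-\d{2}\.txt\z/
        or next;
    push @lines, "Switch name: $switch - Date: " . join('/', $year, $month, $mday);
}
say for @lines;

# Leave the scratch directory so it can be cleaned up at exit.
chdir $start or die "Cannot chdir back to $start: $!";
```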

Reference: Perl regular expression


seltika Over a year ago

Hello Polar Bear! Thanks so much, your answer is also very handy!!
