There are already several good discussions of regular expressions and empty lines on SO. I'll remove this question if it is a duplicate.
Can anyone explain why this script outputs 5 3 4 5 4 3 instead of 4 3 4 4 4 3? When I run it in the debugger, $blank and $classyblank stay at "4" (which I assume is the correct value) until just before the print statement.
my ( $blank, $nonblank, $non_nonblank,
     $classyblank, $classyspace, $blanketyblank ) = 0 ;
while (<DATA>) {
    $blank++ if /\p{IsBlank}/ ;            # POSIXly blank - 4?
    $nonblank++ if /^\P{IsBlank}$/ ;       # POSIXly non-blank - 3
    $non_nonblank++ if not /\S/ ;          # perlishly not non-blank - 4
    $classyblank++ if /[[:blank:]]/ ;      # older(?) charclass blankness - 4?
    $classyspace++ if /^[[:space:]]$/ ;    # older(?) charclass whitespace - 4
    $blanketyblank++ if /^$/ ;             # perlishly *really empty* - 3
}
print join " ", $blank, $nonblank, $non_nonblank,
    $classyblank, $classyspace, $blanketyblank , "\n" ;
__DATA__
line above only has a linefeed this one is not blank because: words
this line is followed by a line with white space (you may need to add it)
then another blank line following this one
THE END :-\
Is it something to do with the __DATA__ section or am I misunderstanding POSIX regular expressions?
ps:
As noted in a comment on a timely post elsewhere, "really empty" (/^$/) can miss non-emptiness:
perl -E 'my $string = "\n" . "foo\n\n" ; say "empty" if $string =~ /^$/ ;'
perl -E 'my $string = "\n" . "bar\n\n" ; say "empty" if $string =~ /\A\z/ ;'
perl -E 'my $string = "\n" . "baz\n\n" ; say "empty" if $string =~ /\S/ ;'
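To make the "/^$/ can miss non-emptiness" point concrete, here is a minimal sketch that runs an empty string, a lone newline and a longer string against the three anchoring styles. The sample strings and the `matching_patterns` helper are my own illustration, not taken from the linked post.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the labels of the patterns that match $string.
sub matching_patterns {
    my ($string) = @_;
    my @tests = (
        [ '^$'   => qr/^$/   ],    # $ also matches just before a final newline
        [ '\A\Z' => qr/\A\Z/ ],    # \Z likewise allows one trailing newline
        [ '\A\z' => qr/\A\z/ ],    # \z matches only at the very end of the string
    );
    return map { $_->[0] } grep { $string =~ $_->[1] } @tests;
}

for my $sample ( "", "\n", "\nfoo\n\n" ) {
    ( my $shown = $sample ) =~ s/\n/\\n/g;    # make newlines visible in the output
    printf "%-12s matches: %s\n", qq{"$shown"},
        join( ", ", matching_patterns($sample) ) || "(none)";
}
```

A lone "\n" is matched by `/^$/` and `/\A\Z/` but not by `/\A\z/`, so only the last one distinguishes a truly empty string from a string holding a single newline.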
if /\A\Z/ and if /\A\z/... which are pretty consistent across different languages except Python but that's OK.
This is perl 5, version 22, subversion 0 (v5.22.0) built for amd64-freebsd
my $string = "\n", "foo\n\n" assigns a single newline to $string. The rest is thrown away because of the comma operator.
$, which will match the end of a string or before the newline if it is the last character.
\p{IsBlank}, [[:blank:]] are simple character classes and you can check what they do from perldoc perluniprops, perlrecharclass
by lining them up with their well known perl equivalents (such as /\S/) and/or related "idioms". I was getting results I couldn't explain: specifically how \v, \s and \h interact with \n and " ". I think I have it figured out now and will add a separate answer if one doesn't appear.
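On how \v, \s and \h interact with \n and " ": a small table makes it easier to see. This is only a sketch, assuming plain ASCII input; the `class_matches` helper and the choice of sample characters are mine.

```perl
use strict;
use warnings;

# Character classes under test (the labels are only for display).
my @classes = (
    [ '\h'          => qr/\h/ ],
    [ '\v'          => qr/\v/ ],
    [ '\s'          => qr/\s/ ],
    [ '[[:blank:]]' => qr/[[:blank:]]/ ],
    [ '[[:space:]]' => qr/[[:space:]]/ ],
    [ '\p{IsBlank}' => qr/\p{IsBlank}/ ],
);

# Sample single characters: newline, space, tab.
my @samples = ( [ '\n' => "\n" ], [ 'space' => " " ], [ '\t' => "\t" ] );

# Hypothetical helper: 1 if the string matches the class, 0 otherwise.
sub class_matches { my ( $re, $char ) = @_; return $char =~ $re ? 1 : 0 }

printf "%-13s %s\n", 'class', join " ", map { sprintf "%-5s", $_->[0] } @samples;
for my $class (@classes) {
    printf "%-13s %s\n", $class->[0],
        join " ", map { sprintf "%-5s", class_matches( $class->[1], $_->[1] ) ? "yes" : "no" } @samples;
}
```

The horizontal classes (`\h`, `[[:blank:]]`, `\p{IsBlank}`) match a space or tab but not "\n", `\v` matches "\n" but not a space, and `\s` and `[[:space:]]` match all three. Since lines read from `<DATA>` keep their trailing newline unless chomped, an unanchored `/[[:blank:]]/` matches any line with a space or tab anywhere in it, not just whitespace-only lines.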