Perl grep not returning expected value

Question

I have the following code:

#!/usr/bin/perl
# splits.pl

use strict;
use warnings;
use diagnostics;

my $pivotfile = "myPath/Internal_Splits_Pivot.txt";

open PIVOTFILE, $pivotfile or die $!;

while (<PIVOTFILE>) { # loop through each line in file

    next if ($. == 1); # skip first line (contains business segment code)
    next if ($. == 2); # skip second line (contains transaction amount text)

    my @fields = split('\t',$_);  # split fields for line into an array     

    print scalar(grep $_, @fields), "\n"; 

}

Given that the data in the text file is this:

    4   G   I   M   N   U   X
    Transaction Amount  Transaction Amount  Transaction Amount  Transaction Amount  Transaction Amount  Transaction Amount  Transaction Amount
0000-13-I21             600         
0001-8V-034BLA              2,172   2,172       
0001-8V-191GYG                  13,125      4,375
0001-9W-GH5B2A  -2,967.09       2,967.09    25.00

I would expect the output from the Perl script to be 2 3 3 4, given the number of non-empty elements in each line. The file is a tab-delimited text file with 8 columns.

Instead I get 3 4 3 4 and I have no idea why!

For background, I am using the question "Counting array elements in Perl" as the basis for my development, as I am trying to count the number of elements in each line to know whether I need to skip that line or not.

ysth · Accepted Answer · 2012-11-20 19:33:40Z

2

I suspect you have spaces mixed with the tabs in some places, and your grep test will consider " " true.

What does:

use Data::Dumper;
$Data::Dumper::Useqq=1;
print Dumper [<PIVOTFILE>];

show?
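As a minimal sketch of what that check can reveal, here is the same idea applied to a single in-memory line instead of the real file. The sample line is hypothetical (it is not taken from the asker's file); it just mixes a stray space and a trailing newline in with the tabs.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Useqq makes Dumper print strings in double quotes with escapes such as
# \t and \n, so tabs, spaces and line endings can be told apart.
$Data::Dumper::Useqq = 1;

# Hypothetical sample line: a stray space between two tabs, and a
# trailing tab followed by the newline.
my $line = "0000-13-I21\t \t600\t\n";

# Returns the dumped form of the fields that split /\t/ produces.
sub dump_fields {
    my ($text) = @_;
    my @fields = split /\t/, $text;
    return Dumper(\@fields);
}

print dump_fields($line);
```

In the dump, the field holding a lone space and the field holding only the newline both show up as separate entries, and both are true values as far as grep $_ is concerned.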

answered Nov 20, 2012 at 19:33

ysth


1 Comment

Scott Holtzman Over a year ago

I 2nd that +1. It's been a while since I've written in perl and I forgot about this valuable resource.

memowe · Accepted Answer · 2012-11-20 19:37:09Z

2

The problem should be in this line:

my @fields = split('\t',$_);  # split fields for line into an array

The tab character doesn't get interpolated. And your file doesn't seem to be tab-only separated, at least here on SO. I changed the split regex to match arbitrary whitespace, ran the code on my machine and got the "right" result:

my @fields = split(/\s+/,$_);  # split fields for line into an array

Result:

edited Nov 20, 2012 at 19:37

answered Nov 20, 2012 at 19:27

memowe


5 Comments

Scott Holtzman Over a year ago

thanks for the help, but no dice. I made the change, still got same results.

ysth Over a year ago

With '\t', since single quotes don't interpret backslashes except for \\ and \', the string passed to the regex compiler is actually a literal backslash and t, but the regex compiler itself handles backslashes and properly generates a tab regex. But you are correct that /\t/ is much better form.

ysth Over a year ago

there do indeed have to be tabs in the original data or the results reported wouldn't occur, but I suspect /\s+/ will indeed fix the real "problem" (though it won't preserve correct information as to which data is in which tab-separated column)
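The two points made in these comments can be checked with a short sketch. The row below is hypothetical (8 tab-separated columns with amounts in the fifth and sixth); it is only meant to show that '\t' and /\t/ split identically, and that /\s+/ collapses the empty columns so the position of each amount is lost.

```perl
use strict;
use warnings;

# Hypothetical row: 8 tab-separated columns, amounts in columns 5 and 6.
my $row = "0001-8V-034BLA\t\t\t\t2,172\t2,172\t\t";

my @by_str   = split '\t',  $row;   # string pattern, compiled as the regex \t
my @by_tab   = split /\t/,  $row;   # explicit regex, same behaviour
my @by_space = split /\s+/, $row;   # runs of whitespace collapse into one separator

# With the tab split, empty columns keep their place, so index 4 is
# still the fifth column. (Trailing empty fields are dropped by split.)
print "tab split:        '$by_tab[4]' is at index 4 of ", scalar(@by_tab), " fields\n";

# With the whitespace split, only the non-empty values remain.
print "whitespace split: ", scalar(@by_space), " fields: @by_space\n";
```

If the column position matters later on, splitting on /\t/ and ignoring blank fields when counting keeps that information; splitting on /\s+/ is only safe when a plain count is all that is needed.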

memowe Over a year ago

@ysth the OP didn't provide any information about the structure his program tries to understand, and his program just counts the split (true) values.

Scott Holtzman Over a year ago

Thanks. The answer above got me there first, so I gave him the answer mark, however, I appreciate your quick reply and +1 for edit that solved the problem as well.

Community · Accepted Answer · 2017-05-23 11:56:11Z

As a side note:

For background, I am using Counting array elements in Perl as the basis for my development, as I am trying to count the number of elements in the line to know if I need to skip that line or not.

Now I understand why you use grep to count array elements. That's important when your array contains undefined values like here:

my @a;
$a[1] = 42;      # @a contains the list (undef, 42)
say scalar @a;   # 2

or when you manually deleted entries:

my @a = split /,/ => 'foo,bar';    # @a contains the list ('foo', 'bar')
delete $a[0];                      # @a contains the list (undef, 'bar')
say scalar @a;                     # 2

But in many cases, especially when you're using arrays just to store lists without operating on single array elements, scalar @a works perfectly fine.

my @a = (1 .. 17, 1 .. 25);        # (1, 2, ..., 17, 1, 2, .., 25)
say scalar @a;                     # 42

It's important to understand what grep does! In your case

print scalar(grep $_, @fields), "\n";

grep returns the list of true values of @fields and then you print how many you have. But sometimes this isn't what you want/expect:

my @things = (17, 42, 'foo', '', 0);  # even '' and 0 are things
say scalar grep $_ => @things;        # 3!

Because the empty string and the number 0 are false values in Perl, they won't get counted with that idiom. So if you want to know how long an array is, just use

say scalar @array; # number of array entries

If you want to count true values, use this

say scalar grep $_ => @array; # number of true values

But if you want to count defined values, use this

say scalar grep defined($_) => @array; # number of defined values

I'm pretty sure you already know this from the other answers on the linked page. In hashes, the situation is a little bit more complex because setting something to undef is not the same as deleting it:

my %h = (a => 0, b => 42, c => 17, d => 666);
$h{c} = undef;   # still there, but undefined
delete $h{d};    # BAM! $h{d} is gone!

What happens when we try to count values?

say scalar grep $_ => values %h;   # 1

because 42 is the only true value in %h.

say scalar grep defined $_ => values %h;   # 2

because 0 is defined although it's false.

say scalar grep exists $h{$_} => qw(a b c d);   # 3

because undefined values can exist. Conclusion:

know what you're doing instead of copy'n'pasting code snippets :)

+1 for taking the effort to write out this incredible explanation.

Borodin · Accepted Answer · 2012-11-20 23:13:37Z

2

There are not only tabs, but there are spaces as well.

Splitting on whitespace works; see below.

#!/usr/bin/perl
# splits.pl

use strict;
use warnings;
use diagnostics;



while (<DATA>) { # loop through each line in file

    next if ($. == 1); # skip first line (contains business segment code)
    next if ($. == 2); # skip second line (contains transaction amount text)


    my @fields = split(" ",$_);  # split fields on runs of whitespace (leading whitespace is skipped)

    print scalar(@fields), "\n"; 

}

__DATA__
    4   G   I   M   N   U   X
    Transaction Amount  Transaction Amount  Transaction Amount  Transaction Amount  Transaction Amount  Transaction Amount  Transaction Amount
0000-13-I21             600         
0001-8V-034BLA              2,172   2,172       
0001-8V-191GYG                  13,125      4,375
0001-9W-GH5B2A  -2,967.09       2,967.09    25.00

Output

edited Nov 20, 2012 at 23:13

Borodin


answered Nov 20, 2012 at 19:35

Amey


3 Comments

ysth Over a year ago

there do indeed have to be tabs in the original data or the results reported wouldn't occur.

Scott Holtzman Over a year ago

+1 Thanks! Since this was the first post that got to the crux of my problem, I am marking it as the answer. The issue was the extra space and changing the split to divide by spaces worked!

Borodin Over a year ago

split(" ",$_) is best written as just split, since " " and $_ are its default arguments.
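A small sketch of that equivalence, and of how the special " " pattern differs from a regex that matches one literal space. The sample line and the field_counts helper are made up for illustration.

```perl
use strict;
use warnings;

# Returns the number of fields produced by three ways of splitting
# the same text (field_counts is a hypothetical helper name).
sub field_counts {
    my ($text) = @_;
    local $_ = $text;                 # bare split works on $_
    my @bare    = split;              # defaults: pattern ' ', string $_
    my @special = split ' ', $_;      # awk-style: leading whitespace skipped,
                                      # runs of whitespace act as one separator
    my @literal = split / /, $_;      # every single space is a separator
    return (scalar @bare, scalar @special, scalar @literal);
}

# Hypothetical line with leading spaces and a trailing newline.
my @counts = field_counts("    4   G   I\n");
print "bare: $counts[0], ' ': $counts[1], / /: $counts[2]\n";
```

The first two counts agree, while the / / form yields extra empty fields for the leading and repeated spaces, which is why the " " form (or a bare split) is the usual choice for whitespace-separated input.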

jpalecek · Accepted Answer · 2012-11-20 19:36:53Z

1

Your code works for me. The problem may be that the input file contains some "hidden" whitespace fields (e.g. whitespace other than tabs). For instance

A<tab><space><CR> gives two fields, A and <space><CR>
A<tab>B<tab><CR> gives three, A, B, <CR> (remember, the end of line is part of the input!)

I suggest you chomp every line you use; other than that, you will have to clean the array of whitespace-only fields. E.g.

scalar(grep /\S/, @fields)

should do it.
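To make the effect concrete, here is a minimal sketch using a hypothetical 8-column row whose last column is empty, so the newline ends up as a field of its own. The count_nonblank helper is an illustrative name, not something from the question.

```perl
use strict;
use warnings;

# Counts the fields of a tab-separated line that contain at least one
# non-whitespace character. The trailing newline is chomped first.
sub count_nonblank {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\t/, $line;
    return scalar grep { /\S/ } @fields;
}

# Hypothetical row: key, three empty columns, one amount, three empty columns.
my $row = "0000-13-I21\t\t\t\t600\t\t\t\n";

# The original idiom: the final field is "\n", which is a true value.
my $naive = grep { $_ } split /\t/, $row;

print "naive: $naive, non-blank: ", count_nonblank($row), "\n";
# prints "naive: 3, non-blank: 2"
```

This would also be consistent with the reported 3 4 3 4: rows whose last column is empty gain an extra "\n" field, while rows whose last column holds a value simply carry the newline on that value. Note that /\S/ also counts a field containing "0", which grep $_ would skip.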

answered Nov 20, 2012 at 19:36

jpalecek


2 Comments

Scott Holtzman Over a year ago

+1 for helping me remember to chomp! and also for providing the answer :)

Scott Holtzman Over a year ago

changing my answer because ultimately it was this that helped me the most! :) I posted an answer below that got the solution working for me.

Scott Holtzman · Accepted Answer · 2012-11-20 22:48:21Z

0

A lot of great help on this question, and quickly too!

After a long, drawn-out learning process, this is what I came up with that worked quite well, with intended results.

#!/usr/bin/perl
# splits.pl

use strict;
use warnings;
use diagnostics;

my $pivotfile = "myPath/Internal_Splits_Pivot.txt";

open PIVOTFILE, $pivotfile or die $!;

while (<PIVOTFILE>) { # loop through each line in file

    next if ($. == 1); # skip first line (contains business segment code)
    next if ($. == 2); # skip second line (contains transaction amount text)

    chomp $_; # remove the trailing newline (chomp does not strip other white space)

    my @fields = split(/\t/,$_);  # split fields for line into an array     

    print scalar(grep $_, @fields), "\n"; 

}

answered Nov 20, 2012 at 22:48

Scott Holtzman


2 Comments

memowe Over a year ago

So you still want to count true values only? :)

Scott Holtzman Over a year ago

@memowe -> yes, I just want true values in the logic I am working with. I do, however, appreciate you laying out that excellent explanation. It was very clear, and helped me understand all the concepts a bit more than the original link I was working with. The generosity of SO users never ceases to amaze me!
