1

I have the following code:

#!/usr/bin/perl
# splits.pl

use strict;
use warnings;
use diagnostics;

my $pivotfile = "myPath/Internal_Splits_Pivot.txt";

open PIVOTFILE, $pivotfile or die $!;

while (<PIVOTFILE>) { # loop through each line in file

    next if ($. == 1); # skip first line (contains business segment code)
    next if ($. == 2); # skip second line (contains transaction amount text)

    my @fields = split('\t',$_);  # split fields for line into an array     

    print scalar(grep $_, @fields), "\n"; 

}

Given that the data in the text file is this:

    4   G   I   M   N   U   X
    Transaction Amount  Transaction Amount  Transaction Amount  Transaction Amount  Transaction Amount  Transaction Amount  Transaction Amount
0000-13-I21             600         
0001-8V-034BLA              2,172   2,172       
0001-8V-191GYG                  13,125      4,375
0001-9W-GH5B2A  -2,967.09       2,967.09    25.00           

I would expect the output from the Perl script to be 2 3 3 4, given the number of defined elements in each line. The file is a tab-delimited text file with 8 columns.

Instead I get 3 4 3 4 and I have no idea why!

For background, I am using Counting array elements in Perl as the basis for my development, as I am trying to count the number of elements in the line to know if I need to skip that line or not.

6 Answers

2

I suspect you have spaces mixed with the tabs in some places, and your grep test will consider " " true.

What does:

use Data::Dumper;
$Data::Dumper::Useqq=1;
print Dumper [<PIVOTFILE>];

show?
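
To illustrate why this check helps, here is a minimal sketch on a made-up line (the sample string is an assumption, not the asker's data). With `$Data::Dumper::Useqq` set, Data::Dumper prints strings double-quoted with tabs and newlines escaped, so a field that holds only a space is easy to spot.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Useqq = 1;   # double-quoted output, with \t and \n shown as escapes

# Hypothetical sample line: a stray space sits between the two tabs.
my $line = "0000-13-I21\t \t600\n";

my @fields = split /\t/, $line;

# The second element is dumped as " " and the last one keeps its "\n".
print Dumper(\@fields);
```

A field containing a single space is true in Perl, which is why `grep $_` would count it.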

1 Comment

I 2nd that +1. It's been a while since I've written in perl and I forgot about this valuable resource.
2

The problem should be in this line:

my @fields = split('\t',$_);  # split fields for line into an array

The tab character doesn't get interpolated. And your file doesn't seem to be tab-only separated, at least here on SO. I changed the split regex to match arbitrary whitespace, ran the code on my machine and got the "right" result:

my @fields = split(/\s+/,$_);  # split fields for line into an array

Result:

2
3
3
4
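
As a rough comparison of the split patterns discussed here, the sketch below runs three of them on a made-up line (the sample data is an assumption). It suggests that `'\t'` and `/\t/` split the same way, while `/\s+/` collapses empty columns, so column positions are lost.

```perl
use strict;
use warnings;

# Hypothetical line: a key, two empty columns, then a value.
my $line = "KEY\t\t\t600";

my @quoted  = split '\t',  $line;   # pattern given as a single-quoted string
my @literal = split /\t/,  $line;   # pattern given as a regex literal
my @spaces  = split /\s+/, $line;   # a run of whitespace counts as one separator

print scalar(@quoted),  "\n";   # 4
print scalar(@literal), "\n";   # 4
print scalar(@spaces),  "\n";   # 2
```

If the position of a value within the row matters, splitting on `/\t/` keeps the empty columns in place.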

5 Comments

thanks for the help, but no dice. I made the change, still got same results.
With '\t', single quotes don't interpret backslashes except for `\\` and `\'`, so the string passed to the regex compiler is actually a literal backslash and t; the regex compiler itself handles backslashes and properly generates a tab regex. But you are correct that /\t/ is much better form.
there do indeed have to be tabs in the original data or the results reported wouldn't occur, but I suspect /\s+/ will indeed fix the real "problem" (though it won't preserve correct information as to which data is in which tab-separated column)
@ysth the OP didn't provide any information about the structure his program tries to understand, and his program just counts the split (true) values.
Thanks. The answer above got me there first, so I gave him the answer mark, however, I appreciate your quick reply and +1 for edit that solved the problem as well.
2

As a side note:

For background, I am using Counting array elements in Perl as the basis for my development, as I am trying to count the number of elements in the line to know if I need to skip that line or not.

Now I understand why you use grep to count array elements. That's important when your array contains undefined values like here:

my @a;
$a[1] = 42;      # @a contains the list (undef, 42)
say scalar @a;   # 2

or when you manually deleted entries:

my @a = split /,/ => 'foo,bar';    # @a contains the list ('foo', 'bar')
delete $a[0];                      # @a contains the list (undef, 'bar')
say scalar @a;                     # 2

But in many cases, especially when you're using arrays just to store a list without operating on single array elements, scalar @a works perfectly fine.

my @a = (1 .. 17, 1 .. 25);        # (1, 2, ..., 17, 1, 2, .., 25)
say scalar @a;                     # 42

It's important to understand what grep does! In your case

print scalar(grep $_, @fields), "\n";

grep returns the list of true values of @fields and then you print how many you have. But sometimes this isn't what you want/expect:

my @things = (17, 42, 'foo', '', 0);  # even '' and 0 are things
say scalar grep $_ => @things;        # 3!

Because the empty string and the number 0 are false values in Perl, they won't get counted with that idiom. So if you want to know how long an array is, just use

say scalar @array; # number of array entries

If you want to count true values, use this

say scalar grep $_ => @array; # number of true values

But if you want to count defined values, use this

say scalar grep defined($_) => @array; # number of defined values

I'm pretty sure you already know this from the other answers on the linked page. In hashes, the situation is a little bit more complex because setting something to undef is not the same as deleting it:

my %h = (a => 0, b => 42, c => 17, d => 666);
$h{c} = undef;   # still there, but undefined
delete $h{d};    # BAM! $h{d} is gone!

What happens when we try to count values?

say scalar grep $_ => values %h;   # 1

because 42 is the only true value in %h.

say scalar grep defined $_ => values %h;   # 2

because 0 is defined although it's false.

say scalar grep exists $h{$_} => qw(a b c d);   # 3

because undefined values can exist. Conclusion:

know what you're doing instead of copy'n'pasting code snippets :)

1 Comment

+1 for taking the effort to write out this incredible explanation.
2

There are not only tabs, but spaces as well.

Splitting on whitespace works; see below.

#!/usr/bin/perl
# splits.pl

use strict;
use warnings;
use diagnostics;



while (<DATA>) { # loop through each line in file

    next if ($. == 1); # skip first line (contains business segment code)
    next if ($. == 2); # skip second line (contains transaction amount text)


    my @fields = split(" ",$_);  # split on any run of whitespace (tabs included)

    print scalar(@fields), "\n"; 

}

__DATA__
    4   G   I   M   N   U   X
    Transaction Amount  Transaction Amount  Transaction Amount  Transaction Amount  Transaction Amount  Transaction Amount  Transaction Amount
0000-13-I21             600         
0001-8V-034BLA              2,172   2,172       
0001-8V-191GYG                  13,125      4,375
0001-9W-GH5B2A  -2,967.09       2,967.09    25.00 

Output

2
3
3
4
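
A short sketch of why `split " "` behaves this way, using a made-up line (an assumption, not the data above): the single-space string is a special case that splits on any run of whitespace and skips leading whitespace, which is different from the regex `/ /`.

```perl
use strict;
use warnings;

# Hypothetical line with a leading space, a tab, and a double space.
my $line = " a\tb  c\n";

my @awk    = split ' ', $line;   # special case: any whitespace run, leading whitespace skipped
my @single = split / /, $line;   # splits on each literal space character only

print scalar(@awk),    "\n";   # 3
print scalar(@single), "\n";   # 4
```

Because every whitespace run is treated as one separator, this form counts the non-empty values but cannot tell which column each value came from.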

3 Comments

there do indeed have to be tabs in the original data or the results reported wouldn't occur.
+1 Thanks! Since this was the first post that got to the crux of my problem, I am marking it as the answer. The issue was the extra space and changing the split to divide by spaces worked!
split(" ",$_) is best written as split
1

Your code works for me. The problem may be that the input file contains some "hidden" whitespace fields (e.g. whitespace other than tabs). For instance:

  • A<tab><space><CR> gives two fields, A and <space><CR>
  • A<tab>B<tab><CR> gives three, A, B, <CR> (remember, the end of line is part of the input!)

I suggest you chomp every line you read; beyond that, you will have to filter whitespace-only fields out of the array. E.g.

scalar(grep /\S/, @fields)

should do it.
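
Put together, the suggestion might look like the sketch below. The helper name `count_filled` and the sample lines are hypothetical, chosen only to mirror the cases listed above.

```perl
use strict;
use warnings;

# Hypothetical helper: count the fields of a tab-separated line that
# contain at least one non-whitespace character.
sub count_filled {
    my ($line) = @_;
    chomp $line;                        # drop the trailing newline
    my @fields = split /\t/, $line;     # keep empty columns in position
    return scalar grep { /\S/ } @fields;
}

print count_filled("A\t \n"),   "\n";   # 1
print count_filled("A\tB\t\n"), "\n";   # 2
print count_filled("A\t\t0\n"), "\n";   # 2
```

Note that a field holding `0` is counted here, whereas `grep $_` would skip it because `0` is false.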

2 Comments

+1 for helping me remember to chomp! and also for providing the answer :)
changing my answer because ultimately it was this that helped me the most! :) I posted an answer below that got the solution working for me.
0

A lot of great help on this question, and quickly too!

After a long, drawn-out learning process, this is what I came up with that worked quite well, with intended results.

#!/usr/bin/perl
# splits.pl

use strict;
use warnings;
use diagnostics;

my $pivotfile = "myPath/Internal_Splits_Pivot.txt";

open PIVOTFILE, $pivotfile or die $!;

while (<PIVOTFILE>) { # loop through each line in file

    next if ($. == 1); # skip first line (contains business segment code)
    next if ($. == 2); # skip second line (contains transaction amount text)

    chomp $_; # remove the trailing newline (chomp does not strip other white space)

    my @fields = split(/\t/,$_);  # split fields for line into an array     

    print scalar(grep $_, @fields), "\n"; 

}
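
For comparison, here is a sketch of the same loop using a three-argument open and a lexical filehandle. The in-memory sample data is made up so that the snippet runs without the real pivot file; with a real file, the path would replace `\$data`.

```perl
use strict;
use warnings;

# Made-up stand-in for the pivot file: two header lines, then data rows.
my $data = "\t4\tG\n"
         . "\tTransaction Amount\tTransaction Amount\n"
         . "0000-13-I21\t\t600\n"
         . "0001-9W-GH5B2A\t-2,967.09\t25.00\n";

# Three-argument open with a lexical handle; \$data reads from the string.
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

my @counts;
while (my $line = <$fh>) {
    next if $. <= 2;                    # skip the two header lines
    chomp $line;                        # removes the trailing newline only
    my @fields = split /\t/, $line;
    push @counts, scalar grep { $_ } @fields;   # count true fields, as above
}
close $fh;

print "$_\n" for @counts;   # 2, then 3
```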

2 Comments

So you still want to count true values only? :)
@memowe -> yes, I just want true values in the logic I am working with. I do, however, appreciate you laying out that excellent explanation. It was very clear, and helped me understand all the concepts a bit more than the original link I was working with. The generosity of SO users never ceases to amaze me!
