
The problem is to read a file that has one record per line. The content of the file looks like this:

3ssdwyeim3,3ssdwyeic9,2017-03-16,09:10:35.372,0.476,EndInbound
3ssdwyeim3,3ssdwyfyyn,2017-03-16,09:10:35.369,0.421,EndOutbound
3ssdwyfxc0,3ssdwyfxfi,2017-03-16,09:10:35.456,0.509,EndInbound
3ssdwyfxc0,3ssdwyhg0v,2017-03-16,09:10:35.453,0.436,EndOutbound

The string before the first comma is the key, and the string between the second-to-last and last commas is the value.

For example, for the first line, 3ssdwyeim3 is the key and 0.476 is the value.

As we loop over the lines, if the key already exists, we have to append the new value to the existing ones, separated by commas.

So for the second line, where the key 3ssdwyeim3 already exists, the key stays the same but the value is updated to 0.476,0.421.

Finally, we have to print the keys and values to a file.

I have written the following code to do this.

sub findbreakdown {
    my ( $out ) = @_;

    my %timeLogger;

    open READ, "out.txt" or die "Cannot open out.txt for read :$!";

    open OUTBD, ">$out\_breakdown.csv" or die "Cannot open $out\_breakdown.csv for write :$!";

    while ( <READ> ) {

        if ( /(.*),.*,.*,.*,(.*),.*/ ) {

            $btxnId = $1;
            $time   = $2;

            if ( !$timeLogger{$btxnId} ) {
                $timeLogger{$btxnId} = $time;
            }
            else {
                $previousValue       = $timeLogger{$btxnId};
                $newValue            = join ",", $previousValue, $time;
                $timeLogger{$btxnId} = $newValue;
            }
        }

        foreach ( sort keys %timeLogger ) {
            print OUTBD "$_ ,$timeLogger{$_}\n";
        }
    }

    close OUTBD;
    close READ;
}

However, something is going wrong, and it prints this:

3ssdwyeim3,0.476
3ssdwyeim3,0.476,0.421
3ssdwyeim3,0.476,0.421
3ssdwyfxc0,0.509
3ssdwyeim3,0.476,0.421
3ssdwyfxc0,0.509,0.436
3ssdwyeim3,0.476,0.421
3ssdwyfxc0,0.509,0.436

Whereas the expected output is:

3ssdwyeim3,0.476,0.421
3ssdwyfxc0,0.509,0.436

2 Answers


Your program is behaving correctly, but you are printing the current state of the entire hash after you process each line.

Therefore you are printing hash keys before they have the complete set of values, and you have many duplicated lines.

If you move the foreach loop that prints to the end of your program (or simply use the debugger to inspect the variables) you will find that the final state of the hash is exactly what you expect.
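As a rough sketch of that change, the version below builds the hash in one loop and prints only after the loop has finished. It uses the question's regular expression and sample lines, but reads them from an in-memory list (`@lines`) rather than out.txt and collects the output in a hypothetical `@report` array so it can run on its own. The `exists` test and the `my` declarations are substitutions made here for illustration, not part of the original code.

```perl
use strict;
use warnings;

# Sample lines from the question, standing in for the contents of out.txt.
my @lines = (
    '3ssdwyeim3,3ssdwyeic9,2017-03-16,09:10:35.372,0.476,EndInbound',
    '3ssdwyeim3,3ssdwyfyyn,2017-03-16,09:10:35.369,0.421,EndOutbound',
    '3ssdwyfxc0,3ssdwyfxfi,2017-03-16,09:10:35.456,0.509,EndInbound',
    '3ssdwyfxc0,3ssdwyhg0v,2017-03-16,09:10:35.453,0.436,EndOutbound',
);

my %timeLogger;

# First pass: only accumulate values, do not print anything yet.
for (@lines) {
    if ( /(.*),.*,.*,.*,(.*),.*/ ) {
        my ( $btxnId, $time ) = ( $1, $2 );
        if ( !exists $timeLogger{$btxnId} ) {
            $timeLogger{$btxnId} = $time;
        }
        else {
            $timeLogger{$btxnId} .= ",$time";
        }
    }
}

# Second step: the hash is complete, so each key is printed exactly once.
my @report = map { "$_,$timeLogger{$_}" } sort keys %timeLogger;
print "$_\n" for @report;
```

In the original subroutine, the equivalent change would be to move the closing brace of the while loop so that it comes before the foreach loop.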


Edit: I previously thought the problem was the following, but that was because I misread the sample data in your question.

This regular expression is not ideal:

if (/(.*),.*,.*,.*,(.*),.*/) {

The .* is greedy and will match as much as possible (including some content with commas). So if any line contains more than six comma-separated items, more than one item will be included in the first matching group. This may not be a problem in your actual data, but it's not an ideal way to write the code. The expression is more ambiguous than necessary.

It would be better written like this:

if (/^([^,]*),[^,]*,[^,]*,[^,]*,([^,]*),[^,]*$/) {

This matches only lines with exactly six items.
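To illustrate the difference, here is a small comparison of the two patterns. The seven-field line is made up for this example (the question's data has only six-field lines), and the variable names are hypothetical.

```perl
use strict;
use warnings;

my $greedy   = qr/(.*),.*,.*,.*,(.*),.*/;
my $anchored = qr/^([^,]*),[^,]*,[^,]*,[^,]*,([^,]*),[^,]*$/;

# A normal six-field line from the question.
my $six = '3ssdwyeim3,3ssdwyeic9,2017-03-16,09:10:35.372,0.476,EndInbound';

# Hypothetical malformed line: one extra field at the front.
my $seven = "extra,$six";

# The greedy pattern still matches, and the first group absorbs two fields.
my ($greedy_key) = $seven =~ $greedy;
print "greedy key on seven fields: $greedy_key\n";

# The anchored pattern rejects the malformed line outright.
my $anchored_matches_seven = ( $seven =~ $anchored ) ? 1 : 0;
print "anchored matches seven fields: $anchored_matches_seven\n";

# On a well-formed line, the anchored pattern captures key and value.
my ( $key, $value ) = $six =~ $anchored;
print "anchored on six fields: $key => $value\n";
```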

Or consider using split on the input line, which would be a cleaner solution.


1 Comment

The regex provided yielded exactly the same results and didn't work.

This is much simpler than you have made it. You can just split each line into fields and use push to add the value to the list corresponding to the key.

I trust you can modify this to read from an external file instead of the DATA file handle?

use strict;
use warnings 'all';

my %data;

while ( <DATA> ) {
    my @fields = split /,/;
    push @{ $data{$fields[0]} }, $fields[-2];
}

for my $key ( sort keys %data ) {
    print join(',', $key, @{ $data{$key} }), "\n";
}

__DATA__
3ssdwyeim3,3ssdwyeic9,2017-03-16,09:10:35.372,0.476,EndInbound
3ssdwyeim3,3ssdwyfyyn,2017-03-16,09:10:35.369,0.421,EndOutbound
3ssdwyfxc0,3ssdwyfxfi,2017-03-16,09:10:35.456,0.509,EndInbound
3ssdwyfxc0,3ssdwyhg0v,2017-03-16,09:10:35.453,0.436,EndOutbound

output

3ssdwyeim3,0.476,0.421
3ssdwyfxc0,0.509,0.436
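For reading from an external file instead of the DATA handle, one possible adaptation is sketched below. So that it runs on its own, it first writes the sample lines to a temporary file with File::Temp; in real use `$path` would simply be the name of the input file (out.txt in the question). The `@rows` array and the check that skips lines with fewer than two fields are additions made for this sketch.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for the real input file: write the sample lines to a temp file.
my ( $tmp_fh, $path ) = tempfile( UNLINK => 1 );
print {$tmp_fh} <<'END';
3ssdwyeim3,3ssdwyeic9,2017-03-16,09:10:35.372,0.476,EndInbound
3ssdwyeim3,3ssdwyfyyn,2017-03-16,09:10:35.369,0.421,EndOutbound
3ssdwyfxc0,3ssdwyfxfi,2017-03-16,09:10:35.456,0.509,EndInbound
3ssdwyfxc0,3ssdwyhg0v,2017-03-16,09:10:35.453,0.436,EndOutbound
END
close $tmp_fh or die "Cannot close $path: $!";

# Three-argument open with a lexical file handle.
open my $in, '<', $path or die "Cannot open $path for read: $!";

my %data;
while ( my $line = <$in> ) {
    chomp $line;
    my @fields = split /,/, $line;
    next if @fields < 2;    # skip blank or malformed lines
    push @{ $data{ $fields[0] } }, $fields[-2];
}
close $in;

# One output row per key, values in the order they were read.
my @rows = map { join ',', $_, @{ $data{$_} } } sort keys %data;
print "$_\n" for @rows;
```

Writing to a CSV file rather than standard output would follow the same pattern: open a second lexical handle with '>' and print the rows to it.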

2 Comments

This worked for me. However, I had to remove strict.
@KaushikBose: You must never remove use strict. It is there to tell you about mistakes in your code, and you should fix those mistakes rather than remove the advice it's giving you. You probably just need to declare a variable at the appropriate scope.
