How do I read a file into an array, but correctly handle duplicates?
I have a file consisting of two columns, a name and a number. Eventually the names will repeat, with a number that may or may not be different.
Rita,13
Sue,11
Bob,01
Too,05
Rita,13
Sue,07
Bob,02
Too,05
I need to read these lines into an array, which is not a problem, but then repopulate them so that any duplicate name has its value appended to the correct line, which is trickier (for me, at least).
So the above should create something like
Rita,13,13
Sue,11,07
Bob,01,02
Too,05,05
There are about 3000 names, and about 600,000 lines to process. (The idea is to highlight which names are stable and which have changing values).
Speed does not matter too much. This will be run about once a week.
Because each line will end up with multiple entries, and the 2nd entry on each line does not matter much (I only need to read it and append it to the new list), I am thinking I do not need a hash here and should just iterate through the input file with some form of "if exists" check (or not). Am I right, or would a hash be beneficial?
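For what it's worth, a hash of arrays keyed by name is probably the simplest fit for this shape of data, and 3000 keys is tiny for a hash. A minimal sketch of that idea (the function name `collapse_lines` and the file name are my own inventions, not anything from an existing library):

```perl
use strict;
use warnings;

# Collapse duplicate names: takes a list of "name,number" lines and
# returns one "name,n1,n2,..." line per name, in first-seen order.
sub collapse_lines {
    my @lines = @_;
    my %values;    # name => array ref of every number seen for that name
    my @order;     # names in first-seen order, so output stays stable

    for my $line (@lines) {
        chomp $line;
        my ($name, $num) = split /,/, $line;
        push @order, $name unless exists $values{$name};  # record order once
        push @{ $values{$name} }, $num;                   # collect each value
    }
    return map { join ',', $_, @{ $values{$_} } } @order;
}

# Typical use, assuming a hypothetical input file name:
# open my $in, '<', 'names.csv' or die "Cannot open: $!";
# print "$_\n" for collapse_lines(<$in>);
```

With the sample input above, this yields `Rita,13,13`, `Sue,11,07`, `Bob,01,02`, `Too,05,05`. An `exists` check on a plain array would mean scanning the array for every one of the 600,000 lines; the hash makes each lookup constant-time.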
I am using Strawberry Perl V5.32.1 on Windows.
EDIT - thanks for the samples, all worked great.
Due to a change in the upstream output, the input file now has extra columns, which must remain.
So now it looks like
12,Rita,1,4,13
2,Sue,0,1,11
5,Bob,12,5,01
7,Too,1,4,05
12,Rita,1,4,13
2,Sue,0,1,07
5,Bob,12,5,02
7,Too,1,4,05
and the output would be similar, in that only the last column changes
12,Rita,1,4,13,13
2,Sue,0,1,11,07
5,Bob,12,5,01,02
7,Too,1,4,05,05
The extra columns will not change, but they must be there. Does it still make sense to use an array and pluck the 2nd and 5th columns, or should I change the delimiter for the first four columns so that the first column becomes the unique key? That feels dirty, but it would work...
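There is no need for the delimiter trick: keying the hash on the name (2nd column) still works, and the unchanging prefix columns can simply be remembered from the first occurrence of each name. A sketch under that assumption (the name `collapse_rows` is hypothetical):

```perl
use strict;
use warnings;

# Collapse duplicate rows of "c1,name,c3,c4,number": keep the first four
# columns from the first occurrence of each name, append every number.
sub collapse_rows {
    my @lines = @_;
    my %prefix;    # name => array ref of the unchanging leading columns
    my %values;    # name => array ref of every last-column value seen
    my @order;     # names in first-seen order

    for my $line (@lines) {
        chomp $line;
        my @cols = split /,/, $line;
        my $num  = pop @cols;       # last column carries the value
        my $name = $cols[1];        # 2nd column is still the key
        unless (exists $values{$name}) {
            push @order, $name;
            $prefix{$name} = \@cols;   # store the stable columns once
        }
        push @{ $values{$name} }, $num;
    }
    return map { join ',', @{ $prefix{$_} }, @{ $values{$_} } } @order;
}
```

With the new sample input this produces `12,Rita,1,4,13,13` and so on. Because the row is split into a list anyway, moving the key to a different column later is a one-line change.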