I have two files and I need to compare them to find the matching and non-matching data. I have two problems:

Question 1: my hashes only keep the last 'num' row for each name. I tried to use

push @{hash1{name1}},$x1,$y1,$x2,$y2

but I still only get the second 'num' row.

File1:

name    foo
num     111 222 333 444
name    jack
num     999 111 222 333
num     333 444 555 777

File2:

name    jack
num     999 111 222 333
num     333 444 555 777
name    foo
num     666 222 333 444

This is my code:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $input1=$ARGV[0];
my $input2=$ARGV[1];

my %hash1;
my %hash2;
my $name1;
my $name2;
my $x1;
my $x2;
my $y2;
my $y1;

open my $fh1,'<', $input1 or die "Cannot open file : $!\n";
while (<$fh1>)
{   
    chomp;
    if(/^name\s+(\S+)/)
    {   
        $name1 = $1; 
    }   
    if(/^num\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)/)
    {   
        $x1 = $1; 
        $y1 = $2; 
        $x2 = $3; 
        $y2 = $4; 
    }
    $hash1{$name1}=[$x1,$y1,$x2,$y2];
}   
close $fh1;
print Dumper (\%hash1);

open my $fh2,'<', $input2 or die "Cannot open file : $!\n";
while (<$fh2>)
{   
    chomp;
    if(/^name\s+(\S+)/)
    {
        $name2 = $1; 
    }
    if(/^num\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)/)
    {
        $x1 = $1; 
        $y1 = $2; 
        $x2 = $3;
        $y2 = $4;
    }

    $hash2{$name2}=[$x1,$y1,$x2,$y2];

}
close $fh2;
print Dumper (\%hash2);

My output:

$VAR1 = {
          'jack' => [
                      '333',
                      '444',
                      '555',
                      '777'
                    ],
          'foo' => [
                     '111',
                     '222',
                     '333',
                     '444'
                   ]
        };
$VAR1 = {
          'jack' => [
                      '333',
                      '444',
                      '555',
                      '777'
                    ],
          'foo' => [
                     '666',
                     '222',
                     '333',
                     '444'
                   ]
        };

My expected output:

$VAR1 = {
          'jack' => [ 
                      '999',
                      '111',
                      '222',
                      '333',
                      '333',
                      '444',
                      '555',
                      '777'
                    ],
          'foo' => [
                     '111',
                     '222',
                     '333',
                     '444'
                   ]
        };
$VAR1 = {
          'jack' => [ 
                      '999',
                      '111',
                      '222',
                      '333',
                      '333',
                      '444',
                      '555',
                      '777'
                    ],
          'foo' => [
                     '666',
                     '222',
                     '333',
                     '444'
                   ]
        };

Question 2: I tried to use this foreach loop to match the keys and values and print the result in table format:

print "Name\tx1\tX1\tY1\tX2\tY2\n"
foreach my $k1(keys %hash1)
{
    foreach my  $k2 (keys %hash2)
    {
        if($hash1{$name1} == $hash2{$name2})
        {
            print "$name1,$x1,$y1,$x2,$y2"
        }
    }
}

but I'm getting:

"my" variable %hash2 masks earlier declaration in same scope at script.pl line 67.
"my" variable %hash1 masks earlier declaration in same scope at script.pl line 69.
"my" variable $name1 masks earlier declaration in same scope at script.pl line 69.
"my" variable %hash2 masks earlier declaration in same statement at script.pl line 69.
"my" variable $name2 masks earlier declaration in same scope at script.pl line 69.
syntax error at script.pl line 65, near "$k1("
Execution of script.pl aborted due to compilation errors.

My desired output for the matching data:

Name     x1   y1   x2   y2
jack     999  111  222  333
         333  444  555  777
  • You're missing a semicolon in that last snippet. And use eq to compare strings, not ==. Commented Oct 18, 2018 at 4:09
  • Thanks for the reply. I've added the semicolon and changed == to eq, but I'm still getting the errors. :( Commented Oct 18, 2018 at 4:56
  • The warning is telling you that you declared (all) variables at multiple places, and tells you on what lines the other declarations are. Find them and clean up. Extracting meaningful code units into subroutines is important, and helps a lot with variable naming as well. Commented Oct 18, 2018 at 6:34
  • Only now did I see that you tried push @{hash1{name1}},$x1,$y1,$x2,$y2 (I'm assuming with the $ in front of hash1). That should work and I can't readily see why it didn't. Another note: you also write to the hashes when the name line is matched; at that point the x and y variables still carry values from the previous name. That would give wrong output. Commented Oct 19, 2018 at 6:22
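Applying the fixes suggested in these comments, the Question 2 loop might look roughly like the sketch below: each hash is declared once, statements end with semicolons, the key is looked up with exists instead of a nested loop, and the rows are compared with eq after joining them into strings. It assumes both hashes already hold one flat arrayref of numbers per name (filled in by hand here with the sample data); %matching is a hypothetical name used only for illustration.

```perl
use strict;
use warnings;

# Sample data in the shape the question expects: one flat list of numbers per name
my %hash1 = (foo => [111, 222, 333, 444], jack => [999, 111, 222, 333, 333, 444, 555, 777]);
my %hash2 = (jack => [999, 111, 222, 333, 333, 444, 555, 777], foo => [666, 222, 333, 444]);

my %matching;    # hypothetical: names whose number lists are identical in both hashes
foreach my $k1 (sort keys %hash1) {
    next unless exists $hash2{$k1};    # no inner loop needed: look the key up directly
    # Comparing $hash1{...} == $hash2{...} would compare references, not contents,
    # so compare the joined lists as strings instead
    if (join(' ', @{ $hash1{$k1} }) eq join(' ', @{ $hash2{$k1} })) {
        $matching{$k1} = 1;
        print "$k1 matches\n";
    }
}
```

With the sample data only jack is reported, since foo differs in its first number.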

1 Answer

The one direct error is that you assign to a hash element with $hash2{$name2}=[...], which overwrites whatever was stored at that key before. That is why your output shows only the second set of numbers for jack. You need to push onto that arrayref instead. Some comments on the code are below.
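To illustrate that fix inside the question's own loop, here is a minimal sketch that pushes each 'num' row onto the arrayref for the current name, and does so only in the 'num' branch, so the 'name' line no longer stores leftover values. It reads from an in-memory string (an assumption standing in for File1) instead of $ARGV[0], so it runs on its own.

```perl
use strict;
use warnings;

# In-memory stand-in for File1 (assumption: same layout as in the question)
my $text = "name    foo\nnum     111 222 333 444\n"
         . "name    jack\nnum     999 111 222 333\nnum     333 444 555 777\n";

my (%hash1, $name1);
open my $fh1, '<', \$text or die "Cannot open in-memory file: $!\n";
while (<$fh1>) {
    chomp;
    if (/^name\s+(\S+)/) {
        $name1 = $1;    # just remember the name; nothing is stored yet
    }
    elsif (/^num\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)/) {
        # push appends to the array for this name; "= [...]" would replace it
        push @{ $hash1{$name1} }, $1, $2, $3, $4;
    }
}
close $fh1;
```

After the loop, $hash1{jack} holds all eight numbers, matching the expected Dumper output in the question.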

Here is rudimentary (but working) code. Please note and implement the omitted checks.

use warnings;
use strict;
use feature 'say';

my ($f1, $f2) = @ARGV;
die "Usage: $0 file1 file2\n"  if not $f1 or not $f2;

my $ds1 = read_file($f1);
my $ds2 = read_file($f2);

compare_data($ds1, $ds2);

sub compare_data {
    my ($ds1, $ds2) = @_;    
    # Add: check whether one has more keys; work with the longer one
    foreach my $k (sort keys %$ds1) {
        if (not exists $ds2->{$k}) {
            say "key $k does not exist in dataset 2";
            next;
        }   
        # Add tests: do both datasets have the same "ref" type here?
        # If those are arrayrefs, as expected, are they the same size?

        my @data = @{$ds1->{$k}};
        foreach my $i (0..$#data) {
            if ($data[$i] ne $ds2->{$k}->[$i]) {
                say "differ for $k: $data[$i] vs $ds2->{$k}->[$i]";
            }
        }   
    }
}

sub read_file {
    my ($file) = @_; 
    open my $fh, '<', $file or die "Can't open $file: $!";
    my (%data, $name);
    while (<$fh>) {
        my @fields = split;
        next if not @fields;    # skip blank lines, so $fields[0] is defined below
        if ($fields[0] eq 'name') {
            $name = $fields[1];
            next;
        }
        elsif ($fields[0] eq 'num') {
            push @{$data{$name}}, @fields[1..$#fields];
        }
    }   
    return \%data;
}

I'm leaving it as an exercise to code the desired format of the printout. The above prints:

differ for foo: 111 vs 666
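One possible way to do that exercise, as a sketch: format_matching is a hypothetical helper that takes two datasets in the shape returned by read_file, keeps only names whose lists are identical, and prints them four numbers per row with the name on the first row only. It assumes every list length is a multiple of four, as in the sample files; a shorter last row would trigger a "Missing argument" warning from sprintf.

```perl
use strict;
use warnings;

# Data in the shape returned by read_file() (one flat list of numbers per name)
my $ds1 = { foo => [111, 222, 333, 444], jack => [999, 111, 222, 333, 333, 444, 555, 777] };
my $ds2 = { jack => [999, 111, 222, 333, 333, 444, 555, 777], foo => [666, 222, 333, 444] };

# Hypothetical helper: build the table of names whose data is identical in both datasets
sub format_matching {
    my ($ds1, $ds2) = @_;
    my $fmt = "%-9s%-5s%-5s%-5s%s\n";    # name column, then four number columns
    my $out = sprintf $fmt, 'Name', 'x1', 'y1', 'x2', 'y2';
    foreach my $k (sort keys %$ds1) {
        next unless exists $ds2->{$k};
        next unless "@{$ds1->{$k}}" eq "@{$ds2->{$k}}";    # whole lists must match
        my @nums  = @{ $ds1->{$k} };    # copy, so splice does not alter the dataset
        my $label = $k;                 # print the name on the first row only
        while (my @row = splice @nums, 0, 4) {
            $out .= sprintf $fmt, $label, @row;
            $label = '';
        }
    }
    return $out;
}

print format_matching($ds1, $ds2);
```

Building the table as a string rather than printing inside the loop keeps the helper easy to check or reuse.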

Note the comments in the code about tests to add. As you descend into data structures to compare them, you need to check whether they carry the same type of data at each level (see ref) and whether they are the same size (so that you don't try to read past the end of an array). Once you get this kind of work under your belt, search for modules that do it.
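A sketch of what those checks could look like, written as a variant of compare_data named compare_checked (a hypothetical name). It walks the union of keys so extras on either side are reported, uses ref to confirm both values are arrayrefs, and compares sizes before looping over elements. It returns messages instead of printing them, which is a design choice for easier testing, not something the answer prescribes.

```perl
use strict;
use warnings;

# Hypothetical variant of compare_data() with the checks left as an exercise
sub compare_checked {
    my ($ds1, $ds2) = @_;
    my @msgs;

    # Walk the union of both key sets, so a key missing on either side is reported
    my %all;
    $all{$_} = 1 for keys %$ds1, keys %$ds2;

    foreach my $k (sort keys %all) {
        if (not exists $ds1->{$k} or not exists $ds2->{$k}) {
            push @msgs, "key $k exists in only one dataset";
            next;
        }
        # Same type at this level? Both should be arrayrefs (ref returns 'ARRAY')
        my ($r1, $r2) = (ref $ds1->{$k}, ref $ds2->{$k});
        if ($r1 ne 'ARRAY' or $r2 ne 'ARRAY') {
            push @msgs, "key $k: expected two arrayrefs, got '$r1' and '$r2'";
            next;
        }
        # Same size? Otherwise the element loop would read past the shorter array
        my ($a1, $a2) = ($ds1->{$k}, $ds2->{$k});
        if (@$a1 != @$a2) {
            push @msgs, sprintf "key %s: sizes differ (%d vs %d)", $k, scalar @$a1, scalar @$a2;
            next;
        }
        foreach my $i (0 .. $#$a1) {
            push @msgs, "differ for $k: $a1->[$i] vs $a2->[$i]" if $a1->[$i] ne $a2->[$i];
        }
    }
    return @msgs;
}

print "$_\n" for compare_checked({ foo => [111, 222, 333, 444] }, { foo => [666, 222, 333, 444] });
```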

I use eq to compare the data (in the arrayrefs) since it's not stated firmly that they are numbers. But if they are, as appears to be the case, change eq to ==.
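A small sketch of why the choice matters: eq compares strings character by character, while == converts both sides to numbers first. The values '10' and '10.0' are made up for illustration. Note also that == on non-numeric strings produces an "isn't numeric" warning under use warnings, which is one reason eq is the safer default when the data type is not certain.

```perl
use strict;
use warnings;

my ($p, $q) = ('10', '10.0');
my $as_strings = ($p eq $q) ? 'same' : 'different';    # character comparison
my $as_numbers = ($p == $q) ? 'same' : 'different';    # numeric comparison
print "eq: $as_strings, ==: $as_numbers\n";
```

Here eq reports the values as different while == reports them as the same.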

Doing a code review would take us too far, but here are a few remarks:

  • When you catch yourself needing such a long list of variables, think "collections" and reconsider your choice of data structures for the problem. Note that in the example above I didn't need a single scalar variable for the data (I used one only for temporary storage of the name).

  • Picking strings apart with a regex is part and parcel of text analysis, when suitable. Familiarize yourself with other approaches as well; for this problem, see split.
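To make the last two remarks concrete, here is a sketch comparing the question's four-capture regex with two alternatives that fill one array directly: split ' ' (which splits on runs of whitespace and ignores leading whitespace) and a global match collecting every number. The sample line is taken from File1; unlike the fixed regex, split and the global match handle any number of fields per line.

```perl
use strict;
use warnings;

my $line = "num     999 111 222 333";

# Regex with one capture per field: the pattern fixes the field count at four
my @by_regex = $line =~ /^num\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)/;

# split ' ' splits on whitespace; the first field is the tag, the rest are numbers
my ($tag, @by_split) = split ' ', $line;

# Global match in list context returns every run of digits on the line
my @all_nums = $line =~ /(\d+)/g;

print "@by_regex | $tag: @by_split | @all_nums\n";
```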
