3

The code is getting messy somewhere in the loop! Please help me to solve it.

Details

Replicate and/or reassign most of the array elements in the multi-dimensional array, using reference elements.

  • File-1: List of array indices & the elements that needs to be maintained in the original array.

  • File-2: The original multi-dimensional array that needs to be rewritten with the above info. Except the elements from the above, the rest of all elements have to be reassigned.

  • File-3: Expected output (reassigned array elements)

Note: Addition to the array indices from file1, rest of all the indices will be replaced with the reference line. Reference line is usually present in the first line of the array.
In the modified array, the reference line is not needed.

File-1:

ID1    2    E1,E4
ID2    5    E6,E7,E9
ID3    1    E3

File-2:

ID1.txt

Ref K L M N O P A B C D
E1 S H G U S K R E K K
E2 S L G N O P A B C D
E3 S L G N O P A B C D
E4 U L G G O P A B C D
E5 U L M G O P A J C D
E6 U L M G O P A J C D
E7 U L M G O P A J C D
E8 U L M G O P A J C D
E9 S L M N O P A J C D
E10 S L M N O P A J C D
.
.
.

File-3: Expected output

new_ID1.txt

E1    K L G N O P A B C D
E2    K L M N O P A B C D
E3    K L M N O P A B C D
E4    K L G N O P A B C D
E5    K L M N O P A B C D
E6    K L M N O P A B C D
E7    K L M N O P A B C D
E8    K L M N O P A B C D
E9    K L M N O P A B C D
E10    K L M N O P A B C D
.
.
.

In the expected output, (new_ID1.txt), second index of the array for "E1" and "E4" is maintained from the original array. Everything else is replaced by the reference line in "E2,E3,E5...".

Code

#!/usr/bin/perl 

use strict;
use warnings;

my %HoHoA = ();

open(IN,"ids.txt");
my @ids = <IN>; chomp @ids; close IN;

open(IN2,"indices_and_values.txt");

while(my $l = <IN2>)
{
    chomp $l;
my @tmp = split "\t", $l;
my $lid = $tmp[0];
my $pos = $tmp[1];
my @gps = @tmp[2..$#tmp];

    foreach my $g (@gps)
    {
        push @{$HoHoA{$lid}{$g}}, $pos;
    }
}
close IN2;


foreach my $outer (sort keys %HoHoA)
{
open(IN3,"$outer.txt");
my @rS = <IN3>; chomp @rS; close IN3;

    my @orgArr = (); my @refArr = (); my @newArr = ();
    foreach my $unk (@rS) 
    { 
        @orgArr = split "\t", $unk;
        if($unk =~ /^Ref/)
        { 
            @refArr = split "\t", $unk;
            next;
        }
    foreach my $inner (sort keys %{$HoHoA{$outer}})
    {
        if($inner =~ /^$orgArr[0]/)
        {
            foreach my $ele (sort {$a <=> $b} @{$HoHoA{$outer}{$inner}})
            {
                $refArr[$ele] = $orgArr[$ele];
            }
        }
        #else
        #{
        #}
    }
    print ">$orgArr[0]\t";
    print join("\t",@refArr[1..$#refArr]);
    print "\n";
}
    @rS = ();
    print "\n";

}

3
  • I don't understand. It says that E1 is kept (at least the "second index"...) but E1 row in ID1.txt file is S H G U... while in the desired output file it is K L G N... --- completely different. What do you mean by "array E1" -- the row that starts with E1? Commented Oct 23, 2018 at 17:09
  • @zdim Only the given array indices from file 1 have to be kept NOT the entire elements. For instance, E1[2] should be maintained. E1[2] is "G". Every other element should be replaced by the Ref elements (the first line of the array). So it is K L G N O P A B C D. I (assumed) mentioned array E1, E2 and so on because of multiple lines (rows/columns). Would be helpful to know the other way too. File-1 format is as follows, "id index_value arrays" Commented Oct 23, 2018 at 17:29
  • Ah, thank you -- yes, it says all that in the question. Let me know if I still misunderstood something (see answer) Commented Oct 23, 2018 at 18:24

2 Answers 2

2

The shown code is well-meant but a bit too complicated; you may have lost your way in the maneuvers over the nested data structure. Here's another, simpler, approach.

Parse the information from the "reference" file (File-1) into a hash (E1 => [2, ...], ..). I put indices for data to be kept in an arrayref to allow for multiple indices for a row. Then go line by line, replacing data at these indices for rows that have a key, and print output as you go.

use warnings;
use strict;
use feature 'say';

my ($ref_file, $data_file) = @ARGV;
die "Usage: $0 ref-file data-file\n" if not $ref_file or not $data_file;

open my $fh, '<', $ref_file or die "Can't open $ref_file: $!";
my %rows;
while (<$fh>) {
    my (undef, $idx, $row_id) = split;
    for (split /,/, $row_id) {
        push @{$rows{$_}}, $idx;        # elem => [ indices ]
    }
}

my $outfile = 'new_' . $data_file;
open    $fh,     '<', $data_file  or die "Can't open $data_file: $!";
open my $fh_out, '>', $outfile    or die "Can't open $outfile: $!";

my @ref = split ' ', <$fh>;
shift @ref;                  # toss the first field

while (<$fh>) {
    my ($row_id, @data) = split;

    if (exists $rows{$row_id}) {              # this row needs attention
        my @new_row = @ref;
        foreach my $idx (@{$rows{$row_id}}) { # keep data at these indices
            $new_row[$idx] = $data[$idx];
        }
        say $fh_out join "\t", $row_id, @new_row;
    }
    else {                                    # use whole reference line
        say $fh_out join "\t", $row_id, @ref;
    }
}

The new file (shown with two spaces instead of the actual tabs, for readability)

E1  K  L  G  N  O  P  A  B  C  D
E2  K  L  M  N  O  P  A  B  C  D
E3  K  L  M  N  O  P  A  B  C  D
E4  K  L  G  N  O  P  A  B  C  D
E5  K  L  M  N  O  P  A  B  C  D
E6  K  L  M  N  O  P  A  B  C  D
E7  K  L  M  N  O  P  A  B  C  D
E8  K  L  M  N  O  P  A  B  C  D
E9  K  L  M  N  O  P  A  B  C  D
E10  K  L  M  N  O  P  A  B  C  D

Note that the given input file happens to have the same entries as the reference line to use in replacement at many indices of interest -- so we can't see those "changes" in the above output. (I tested by changing the input file so to be able to see.)

Sign up to request clarification or add additional context in comments.

11 Comments

unfortunately it gives the same input as output still!
@perlbeginner Hum? It prints the output as shown, which is your desired output from the question ... ? It uses the input file that you give in the question. I also tested for all fields that should be changed, as explained, and it correctly keeps data at those indices (while replacing all else by that reference line).
@perlbeginner I don't understand: copy-paste the program and run it with files that you show in the question (File-1 and ID1.txt) -- it has to print what is shown in the answer (I copy-pasted it from my console, when printed with two spaces instead of tabs). This is precisely the desired output.
I copy-paste and tried exactly the same. The output you showed here is the desired output. But when I run it, it gives the similar to the file2 (ID1.txt) not new_ID1.txt
@perlbeginner I just checked: I copy-pasted my code above into a new file, and ran script.pl File-1 ID1.txt. It produced new_ID1.txt with content as shown above. Are you sure you are checking the right file? Do you have other files around which may confuse matters?
|
1

This is one way to do it, if I understood your problem statement correctly:

#!/usr/bin/perl

use strict;
use warnings;

my %keep_idx;

open FILE, "file-1" or die "Couldn't open file-1";
while(<FILE>) {
    my (undef, $idx, $id_str) = split /\s+/;
    my @ids = split /,/, $id_str;
    foreach my $id (@ids) {
        $keep_idx{$id}{$idx} = 1;
    }
}
close FILE;

open FILE, "file-2" or die "Couldn't open file-2";
open OUTFILE, ">file-3" or die "Couldn't open file-3";
my (undef, @ref) = split /\s+/, <FILE>;
while(<FILE>) {
    my ($id, @src) = split /\s+/;
    my $line = "$id";
    for (my $i = 0; $i <= $#src; $i++) {
        my $e = $keep_idx{$id}{$i} ? $src[$i] : $ref[$i];
        $line .= " $e";
    }
    print OUTFILE "$line\n";
}
close OUTFILE;
close FILE;

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.