I have an AoA (array of arrays) with four columns and many rows. The following is an example of the input data.

DQ556929    103480190   103480214   154943
DQ540839    103325247   103325275   2484
DQ566549    103322763   103322792   99
DQ699634    103322664   103322694   0
DQ544472    103322664   103322692   373
DQ709105    103322291   103322318   46
DQ705937    103322245   103322273   486
DQ699398    103321759   103321788   1211
DQ710151    103320548   103320577   692251
DQ548430    102628297   102628326   1
DQ558403    102628296   102628321   855795
DQ692476    101772501   101772529   481463
DQ544274    101291038   101291068   484047
DQ723982    100806991   100807020   1
DQ709023    100806990   100807020   3
DQ712307    100806987   100807014   0
DQ709654    100806987   100807012   571051
DQ707370    100235936   100235962   1481849

I want to group the rows and write all of them to a file, in their original order. The conditions are as follows. If the column-four value is less than 1000 and at least two such rows are next to each other, group them as a cluster. If the value is less than 1000 but the row lies between rows whose values are more than 1000, treat it as a singlet and append it separately to the same file. Rows whose values are more than 1000 are also written as a block. None of this should affect the order of the second and third columns.

This file is the output of my previous program. I have tried to implement the grouping myself but I am getting some weird results. Here is my code, which does not work. I just need help checking whether I am implementing my logic correctly. As a beginner I am open to any comments, so please correct me anywhere.

my @dataf= sort{ $a->[1]<=> $b->[1]} @data;
@dataf=reverse @dataf;
for(my $i>=0;$i<=$#Start;$i++)
{
    print "$sortStart[$i]\n";
    my $diff = $sortStart[$i] - $sortStart[$i+1];
    $dataf[$i][3]= $diff;
#   $IDdiff{$ID[$i]}=$diff;
}

#print Dumper(@dataf);

open (CLUST, ">> ./clustTest.txt" );
for (my $k=0;$k<=$#Start;$k++)
{   

    for (my $l=0;$l<=3;$l++)
    {
#       my $tempdataf = shift $dataf[$k][$l];
#       print $tempdataf;       

        if ($dataf[$k][3]<=1000)
        {
            $flag = 1;
            do
            {
                print CLUST"----- Cluster $clustNo -----\n";
                print CLUST"$dataf[$k][$l]\t";
                if ($dataf[$k][3]<=1000)
                {
                    $flag1 = 1;
                }else {$flag1=0;}

            $clustNo++;
            }until($flag1==0 && $data[$k][3] > 1000);

            if($flag1==0 && $data[$k][3] > 1000)
            {
                print CLUST"Singlet \n";
                print CLUST"$dataf[$k][$l]\t";
                next;
            }
        #print CLUST"$dataf[$k][$l]\t";     #@IDdiff

        }

    print CLUST"\n";
    }
}

Expected output in file:

Singlets DQ556929 103480190 103480214 154943 DQ540839 103325247 103325275 2484

Cluster1 DQ566549 103322763 103322792 99 DQ699634 103322664 103322694 0 DQ544472 103322664 103322692 373 DQ709105 103322291 103322318 46 DQ705937 103322245 103322273 486

Singlets DQ699398 103321759 103321788 1211 DQ710151 103320548 103320577 692251 DQ548430 102628297 102628326 1 DQ558403 102628296 102628321 855795 DQ692476 101772501 101772529 481463 DQ544274 101291038 101291068 484047

Cluster2 DQ723982 100806991 100807020 1 DQ709023 100806990 100807020 3 DQ712307 100806987 100807014 0

Singlets DQ709654 100806987 100807012 571051 DQ707370 100235936 100235962 1481849

  • Could you add the expected output for the given input sample? The description of the logic is fuzzy. Commented Sep 11, 2015 at 9:13
  • Don't use bareword filehandles, don't use 2-arg open, don't ignore open errors. Commented Sep 11, 2015 at 9:22
  • Always use strict; use warnings;. I'm pretty sure that would have found my $i>=0;, for example. Commented Sep 11, 2015 at 9:22
  • @melpomene, yes I got a warning at $i>=0, but don't know why it is giving that. its numerical equating isn't it? Commented Sep 11, 2015 at 9:26
  • @Kanhu my $i >= 0 defines a new variable $i containing undef. Then it compares undef to 0 (numerically). Then it throws the result of the comparison away. It's otherwise equivalent to my $i (also, why would you want to compare a variable you only just created?). What you probably meant to write is my $i = 0;. Commented Sep 11, 2015 at 9:27
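
The advice in the comments above can be put together in a few lines. This is only a minimal sketch, not the asker's program: the @rows array holds two rows copied from the sample input, and the file name clustTest_sketch.txt is a hypothetical name chosen for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two rows copied from the question's sample input, for illustration only.
my @rows = (
    [ 'DQ556929', 103480190, 103480214, 154943 ],
    [ 'DQ540839', 103325247, 103325275, 2484 ],
);

# Hypothetical output path, chosen only for this sketch.
my $path = 'clustTest_sketch.txt';

# Lexical filehandle, 3-argument open, and an explicit error check.
open my $out, '>', $path or die "Cannot open '$path' for writing: $!";

# "my $i = 0" initialises the counter; "my $i >= 0" would only
# compare an undefined value to 0 and discard the result.
for (my $i = 0; $i <= $#rows; $i++) {
    print {$out} join("\t", @{ $rows[$i] }), "\n";
}

close $out or die "Cannot close '$path': $!";
```

The sketch truncates the file with '>'. To append, as the code in the question does, the mode would be '>>' instead.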

1 Answer

This seems to produce the expected output. I'm not sure I understood the specification correctly, so there might be errors and edge cases.

How it works: the script remembers what kind of section it is currently outputting ($section, either Singlet or Cluster). It accumulates lines in the @cluster array as long as they belong together; when an incompatible line arrives, the accumulated cluster is printed and a new one is started. If the cluster to print has only one member, it's treated as a singlet.

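#!/usr/bin/perl
use warnings;
use strict;

my $section = q();          # kind of header printed last: 'Singlet' or 'Cluster'
my @cluster;                # input lines accumulated for the current group
my $cluster_count = 1;      # number used in the next "ClusterN" header

# Print the accumulated group with a suitable header, then reset it.
sub output {
    if (@cluster > 1) {
        print "Cluster$cluster_count\n";
        $cluster_count++;

    } elsif (1 == @cluster) {
        # Print the "Singlets" header only once per run of singlets.
        print $section = 'Singlet', "s\n" unless 'Singlet' eq $section;
    }

    print for @cluster;
    @cluster = ();
}

# No previous row yet: infinity forces the first row to start a new group.
my $last = 'INF';
while (<>) {
    my ($id, $from, $to, $value) = split;
    # Start a new group if the value is large or the row is far from the previous one.
    if ($value > 1000 || 1000 < abs($last - $from)) {
        output();

    } else {
        $section = 'Cluster';
    }

    push @cluster, $_;
    $last = $to;
}
# Flush whatever is still accumulated after the last input line.
output();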
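
The grouping rule described above can also be separated from the printing, which makes it easier to check on a few rows. The following is a sketch under stated assumptions, not the code of the answer: group_rows is a hypothetical helper, the limit of 1000 is passed as a parameter, and the sample rows are copied from the question's input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split rows into groups. A row starts a new group
# when its value exceeds $limit or when its start is more than $limit
# away from the end of the previous row.
sub group_rows {
    my ($limit, @rows) = @_;
    my (@groups, @current);
    my $last_to;
    for my $row (@rows) {
        my ($id, $from, $to, $value) = @$row;
        my $breaks = !defined($last_to)
                  || $value > $limit
                  || abs($last_to - $from) > $limit;
        if ($breaks && @current) {
            push @groups, [@current];
            @current = ();
        }
        push @current, $row;
        $last_to = $to;
    }
    push @groups, [@current] if @current;
    return @groups;
}

# Seven consecutive rows copied from the question's sample input.
my @rows = (
    [ 'DQ540839', 103325247, 103325275, 2484 ],
    [ 'DQ566549', 103322763, 103322792, 99 ],
    [ 'DQ699634', 103322664, 103322694, 0 ],
    [ 'DQ544472', 103322664, 103322692, 373 ],
    [ 'DQ709105', 103322291, 103322318, 46 ],
    [ 'DQ705937', 103322245, 103322273, 486 ],
    [ 'DQ699398', 103321759, 103321788, 1211 ],
);

# Groups with more than one row are clusters; the rest are singlets.
my $n = 0;
for my $group (group_rows(1000, @rows)) {
    my $label = @$group > 1 ? 'Cluster' . ++$n : 'Singlet';
    print "$label\n";
    print join("\t", @$_), "\n" for @$group;
}
```

For these seven rows the helper returns three groups of one, five and one rows. Unlike the script above, this sketch prints a Singlet header for every single-row group instead of one Singlets header per run of singlets.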