I'm writing a piece of code that extracts some numbers from an input file, which holds information for two conditions. The code therefore extracts two numbers for each line, and compares them against each other. The snippet below works fine, but I'm having trouble understanding which of the below approaches is 'correct', and why:

Input:

gi|63100484|gb|BC094950.1|_Xenopus_tropicalis_cDNA_clone_IMAGE:7022272  C1:XLOC_017431_0.110169:4.99086,_Change:5.5015,_p:0.00265,_q:0.847141 [95.08]   C2:XLOC_020690_0.050681:9.12527,_Change:7.49228,_p:0.0196,_q:0.967194 [95.08]
gi|6572468|emb|AJ251750.1|_Xenopus_laevis_mRNA_for_frizzled_4_protein_(fz4_gene)        C1:XLOC_027664_1.61212:4.37413,_Change:1.44003,_p:0.00515,_q:0.999592 [99.40]   C2:XLOC_032999_2.94775:14.2322,_Change:2.27147,_p:5e-05,_q:0.0438548 [99.40]
gi|68533737|gb|BC098974.1|_Xenopus_laevis_RDC1_like_protein,_mRNA_(cDNA_clone_MGC:114801_IMAGE:4632706),_complete_cds   C1:XLOC_036220_0.565861:6.52476,_Change:3.52741,_p:0.00015,_q:0.21728 [99.95]   C2:XLOC_043165_0.157752:2.52129,_Change:3.99843,_p:0.02115,_q:0.99976 [99.95]
gi|70672087|gb|DQ096846.1|_Xenopus_laevis_degr03_mRNA,_complete_sequence        C1:XLOC_031048_0.998437:4.20942,_Change:2.07588,_p:0.01365,_q:0.999592 [99.87]  C2:XLOC_037051_1.1335:4.36819,_Change:1.94624,_p:0.01905,_q:0.9452 [99.87]
gi|70672102|gb|DQ096861.1|_Xenopus_laevis_rexp44_mRNA,_complete_sequence        C1:XLOC_049520_12.3353:6.30193,_Change:-0.968926,_p:0.04935,_q:0.999592 [92.90] C2:XLOC_058958_13.0419:5.10275,_Change:-1.35381,_p:0.0373,_q:0.99976 [92.90]
gi|7110523|gb|AF231711.1|_Xenopus_laevis_7-transmembrane_receptor_frizzled-1_mRNA,_complete_cds C1:XLOC_038309_0.784476:2.37536,_Change:1.59835,_p:0.0079,_q:0.999592 [99.94]   C2:XLOC_045678_0.692883:3.52599,_Change:2.34735,_p:0.00125,_q:0.341583 [99.94]


#!/usr/bin/perl 
use strict;
use warnings;
use File::Slurp;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;

my @intersect = read_file('text.txt');

my (@q1, @q2, @change_q, @q_values, @q_value1, @q_value2);
foreach (@intersect) {
    chomp;
    @q_value1 = ($_ =~ /C1:.*?q:(\d+\.\d+)/);
    @q_value2 = ($_ =~ /C2:.*?q:(\d+\.\d+)/);
    push @q_values, "C1:@q_value1\tC2:@q_value2";
        if (abs $q_value1[@_] < abs $q_value2[@_]) {
            push @change_q, $q_value1[@_];
        }
        elsif (abs $q_value2[@_] < abs $q_value1[@_]) {
            push @change_q, $q_value2[@_];
        }
}

print Dumper (\@q_values);
print Dumper (\@change_q);

Output:

$VAR1 = [
          'C1:0.847141  C2:0.967194',
          'C1:0.999592  C2:0.0438548',
          'C1:0.21728   C2:0.99976',
          'C1:0.999592  C2:0.9452',
          'C1:0.999592  C2:0.99976',
          'C1:0.999592  C2:0.341583'
        ];
$VAR1 = [
          '0.847141',
          '0.0438548',
          '0.21728',
          '0.9452',
          '0.999592',
          '0.341583'
        ];

This works perfectly, outputting the smaller 'q-value' of the two conditions for each line. However, replacing @_ with $#_ also works.

As does this approach:

foreach (@intersect) {
    chomp;
    @q_value1 = ($_ =~ /C1:.*?q:(\d+\.\d+)/);
    @q_value2 = ($_ =~ /C2:.*?q:(\d+\.\d+)/);
    push @q_values, "C1:@q_value1\tC2:@q_value2";
        my $q_value1 = $q_value1[0] // $q_value1[1];
        my $q_value2 = $q_value2[0] // $q_value2[1];
        if (abs $q_value1 < abs $q_value2) {
            push @change_q, $q_value1;
        } 
        elsif (abs $q_value2 < abs $q_value1) {
            push @change_q, $q_value2;
        }
}
print Dumper (\@q_values);
print Dumper (\@change_q);

Output:

$VAR1 = [
          'C1:0.847141  C2:0.967194',
          'C1:0.999592  C2:0.0438548',
          'C1:0.21728   C2:0.99976',
          'C1:0.999592  C2:0.9452',
          'C1:0.999592  C2:0.99976',
          'C1:0.999592  C2:0.341583'
        ];
$VAR1 = [
          '0.847141',
          '0.0438548',
          '0.21728',
          '0.9452',
          '0.999592',
          '0.341583'
        ];

1 Answer

"This works perfectly" is putting it a bit strongly; "it works by coincidence" would be a better description. You are indexing with the @_ array, with its highest index $#_, and with the number zero, and getting the same result every time. What you are not realizing is that @_ is empty here, because it is only populated when arguments are passed to a subroutine. So when you say

$foo[@_]

You are really saying

$foo[0]

And when you are saying

$foo[$#_]

You are really saying

$foo[-1]

For extra fun, -1 is also a valid array index, meaning the last element of the array. For a one-element array, which is what each of your regex matches returns, the last element is also the first, so it seems to work fine.

This happens because an array subscript is evaluated in scalar context, where an array such as @_ returns its size, which in this case is 0. $#_ returns -1 when @_ is empty, because there is no highest index.
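The behaviour described above can be checked with a small standalone script. This is only an illustrative sketch: @foo and @bar are made-up arrays, not part of your code, and the script assumes it runs at file level (outside any subroutine), so that @_ is empty.

```perl
use strict;
use warnings;

my @foo = ('only');           # one element, like a single regex capture
my @bar = ('first', 'last');  # two elements, to show where the indexes diverge

# At file level @_ is empty, so its size is 0 and its highest index is -1.
my $size = scalar @_;
my $last = $#_;
print "size=$size last=$last\n";        # size=0 last=-1

# Indexing with @_ means index 0; indexing with $#_ means index -1.
my @picked_foo = ($foo[@_], $foo[$#_]);
my @picked_bar = ($bar[@_], $bar[$#_]);
print "one element:  @picked_foo\n";    # only only
print "two elements: @picked_bar\n";    # first last
```

With one element both forms pick the same value, which is why the original code appears to work. With two elements they no longer agree.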

So, to answer your question: using @_ is wrong and only works by accident, so using fixed indexes such as 0 and 1 is the better solution.
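As a possible simplification, each regex has only one capture group, so the match can be assigned straight to a scalar and no index is needed at all. The sketch below rests on a few assumptions: the two records are abbreviated from your input (the names seq1 and seq2 are placeholders), the data is inlined instead of read with File::Slurp, and ties keep the C1 value, whereas your code pushes nothing when the two values are equal.

```perl
use strict;
use warnings;

# Two records abbreviated from the question's input; seq1/seq2 are placeholder names.
my @intersect = (
    "seq1\tC1:XLOC_017431_0.110169:4.99086,_Change:5.5015,_p:0.00265,_q:0.847141 [95.08]\tC2:XLOC_020690_0.050681:9.12527,_Change:7.49228,_p:0.0196,_q:0.967194 [95.08]",
    "seq2\tC1:XLOC_027664_1.61212:4.37413,_Change:1.44003,_p:0.00515,_q:0.999592 [99.40]\tC2:XLOC_032999_2.94775:14.2322,_Change:2.27147,_p:5e-05,_q:0.0438548 [99.40]",
);

my @change_q;
for my $line (@intersect) {
    # The parentheses force list context, so the single capture lands in the scalar.
    my ($q1) = $line =~ /C1:.*?q:(\d+\.\d+)/;
    my ($q2) = $line =~ /C2:.*?q:(\d+\.\d+)/;
    next unless defined $q1 && defined $q2;   # skip lines where either match fails

    # Keep the smaller q-value (assumption: on a tie, keep the C1 value).
    push @change_q, $q1 <= $q2 ? $q1 : $q2;
}

print "$_\n" for @change_q;   # prints 0.847141 then 0.0438548
```

The `defined` check also guards against lines where a q-value is missing or written in a form the pattern does not cover, such as exponent notation.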
