I'm writing some code that extracts numbers from an input file holding information for two conditions (C1 and C2). For each line, the code extracts two q-values, one per condition, and compares them against each other. The snippet below works fine, but I'm having trouble understanding which of the approaches below is 'correct', and why:
Input:
gi|63100484|gb|BC094950.1|_Xenopus_tropicalis_cDNA_clone_IMAGE:7022272 C1:XLOC_017431_0.110169:4.99086,_Change:5.5015,_p:0.00265,_q:0.847141 [95.08] C2:XLOC_020690_0.050681:9.12527,_Change:7.49228,_p:0.0196,_q:0.967194 [95.08]
gi|6572468|emb|AJ251750.1|_Xenopus_laevis_mRNA_for_frizzled_4_protein_(fz4_gene) C1:XLOC_027664_1.61212:4.37413,_Change:1.44003,_p:0.00515,_q:0.999592 [99.40] C2:XLOC_032999_2.94775:14.2322,_Change:2.27147,_p:5e-05,_q:0.0438548 [99.40]
gi|68533737|gb|BC098974.1|_Xenopus_laevis_RDC1_like_protein,_mRNA_(cDNA_clone_MGC:114801_IMAGE:4632706),_complete_cds C1:XLOC_036220_0.565861:6.52476,_Change:3.52741,_p:0.00015,_q:0.21728 [99.95] C2:XLOC_043165_0.157752:2.52129,_Change:3.99843,_p:0.02115,_q:0.99976 [99.95]
gi|70672087|gb|DQ096846.1|_Xenopus_laevis_degr03_mRNA,_complete_sequence C1:XLOC_031048_0.998437:4.20942,_Change:2.07588,_p:0.01365,_q:0.999592 [99.87] C2:XLOC_037051_1.1335:4.36819,_Change:1.94624,_p:0.01905,_q:0.9452 [99.87]
gi|70672102|gb|DQ096861.1|_Xenopus_laevis_rexp44_mRNA,_complete_sequence C1:XLOC_049520_12.3353:6.30193,_Change:-0.968926,_p:0.04935,_q:0.999592 [92.90] C2:XLOC_058958_13.0419:5.10275,_Change:-1.35381,_p:0.0373,_q:0.99976 [92.90]
gi|7110523|gb|AF231711.1|_Xenopus_laevis_7-transmembrane_receptor_frizzled-1_mRNA,_complete_cds C1:XLOC_038309_0.784476:2.37536,_Change:1.59835,_p:0.0079,_q:0.999592 [99.94] C2:XLOC_045678_0.692883:3.52599,_Change:2.34735,_p:0.00125,_q:0.341583 [99.94]
#!/usr/bin/perl
use strict;
use warnings;
use File::Slurp;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;
my @intersect = read_file('text.txt');
my (@q1, @q2, @change_q, @q_values, @q_value1, @q_value2);
foreach (@intersect) {
    chomp;
    @q_value1 = ($_ =~ /C1:.*?q:(\d+\.\d+)/);
    @q_value2 = ($_ =~ /C2:.*?q:(\d+\.\d+)/);
    push @q_values, "C1:@q_value1\tC2:@q_value2";
    if (abs $q_value1[@_] < abs $q_value2[@_]) {
        push @change_q, $q_value1[@_];
    }
    elsif (abs $q_value2[@_] < abs $q_value1[@_]) {
        push @change_q, $q_value2[@_];
    }
}
print Dumper (\@q_values);
print Dumper (\@change_q);
Output:
$VAR1 = [
'C1:0.847141 C2:0.967194',
'C1:0.999592 C2:0.0438548',
'C1:0.21728 C2:0.99976',
'C1:0.999592 C2:0.9452',
'C1:0.999592 C2:0.99976',
'C1:0.999592 C2:0.341583'
];
$VAR1 = [
'0.847141',
'0.0438548',
'0.21728',
'0.9452',
'0.999592',
'0.341583'
];
This works as expected, pushing the smaller of the two q-values (C1 vs. C2) for each line. However, replacing @_ with $#_ in the array subscripts also works.
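To check what those two subscripts actually evaluate to, here is a minimal sketch. It assumes the code runs at file scope, outside any subroutine, so @_ is empty (as in the script above). The names $n_args, $last_index, $via_count, $via_last and @two are made up for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: file scope, no subroutine, so @_ holds no elements.
my @q_value1 = ('0.847141');    # one capture per line, as in the loop above

my $n_args     = @_;     # scalar context: element count of @_, here 0
my $last_index = $#_;    # index of the last element of @_, here -1

# An array subscript is evaluated in scalar context, so:
my $via_count = $q_value1[@_];     # same as $q_value1[0]
my $via_last  = $q_value1[$#_];    # same as $q_value1[-1], the last element

print 'scalar(@_) = ', $n_args, ', $#_ = ', $last_index, "\n";
print "via \@_: $via_count, via \$#_: $via_last\n";

# With more than one element the two subscripts no longer agree.
my @two = ('first', 'second');
print 'two[@_] = ', $two[@_], ', two[$#_] = ', $two[$#_], "\n";
```

With a single capture, index 0 and index -1 refer to the same element, so both subscripts pick the same value here. Inside a subroutine called with arguments, @_ would not be empty and either subscript could point somewhere else, so a literal 0 seems the less surprising way to write it.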
This approach, which copies the captured value into a scalar before comparing, gives the same result:
foreach (@intersect) {
    chomp;
    @q_value1 = ($_ =~ /C1:.*?q:(\d+\.\d+)/);
    @q_value2 = ($_ =~ /C2:.*?q:(\d+\.\d+)/);
    push @q_values, "C1:@q_value1\tC2:@q_value2";
    my $q_value1 = $q_value1[0] // $q_value1[1];
    my $q_value2 = $q_value2[0] // $q_value2[1];
    if (abs $q_value1 < abs $q_value2) {
        push @change_q, $q_value1;
    }
    elsif (abs $q_value2 < abs $q_value1) {
        push @change_q, $q_value2;
    }
}
print Dumper (\@q_values);
print Dumper (\@change_q);
Output:
$VAR1 = [
'C1:0.847141 C2:0.967194',
'C1:0.999592 C2:0.0438548',
'C1:0.21728 C2:0.99976',
'C1:0.999592 C2:0.9452',
'C1:0.999592 C2:0.99976',
'C1:0.999592 C2:0.341583'
];
$VAR1 = [
'0.847141',
'0.0438548',
'0.21728',
'0.9452',
'0.999592',
'0.341583'