I would like to replace a grep | awk | perl command with a pure Perl solution to make it quicker and simpler to run.
I want to look up each ID in input.txt in data.txt and calculate the average of the values for the matched IDs.
input.txt contains one column of ID numbers:
FBgn0260798
FBgn0040007
FBgn0046692
I would like to match each ID number with its corresponding ID names and associated value. Here's an example of data.txt, where column 1 is the ID number, columns 2 and 3 are ID name1 and ID name2, and column 4 contains the values I want to average.
FBgn0260798 CG17665 CG17665 21.4497
FBgn0040007 Gprk1 CG40129 22.4236
FBgn0046692 RpL38 CG18001 1182.88
So far I have used grep and awk to produce an output file containing the data.txt lines for the matched ID numbers, and then used Perl on that output file to calculate the count and average, using the following commands:
# First part using grep | awk
# output_1.txt is appended to, so delete it before re-running
while read -r line
do
    grep -w "$line" data.txt | awk '{print $1,$2,$3,$4}' >> output_1.txt
done < input.txt
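For comparison, the first part could itself be done in Perl without starting grep once per ID. This is a minimal sketch, assuming the ID is always in column 1 of data.txt and fields are whitespace-separated: it loads the IDs into a hash, then reads data.txt once and prints the lines whose first field is in the hash. The subroutine name `filter_matching` and the script name are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: print every line of $data_fh whose first
# whitespace-separated field is one of the IDs read from $ids_fh.
sub filter_matching {
    my ($ids_fh, $data_fh, $out_fh) = @_;

    # Build a lookup hash of wanted IDs (one per line, whitespace trimmed).
    my %wanted;
    while (my $id = <$ids_fh>) {
        $id =~ s/^\s+|\s+$//g;
        $wanted{$id} = 1 if length $id;
    }

    # Single pass over the data: an exact hash lookup on column 1
    # replaces running "grep -w" once per ID.
    while (my $line = <$data_fh>) {
        my ($id) = split ' ', $line;
        print {$out_fh} $line if defined $id && $wanted{$id};
    }
}

# Usage: perl filter_ids.pl input.txt data.txt > output_1.txt
# (only runs when both filenames are given)
if (@ARGV == 2) {
    my ($ids_file, $data_file) = @ARGV;
    open my $ids_fh,  '<', $ids_file  or die "Cannot open $ids_file: $!";
    open my $data_fh, '<', $data_file or die "Cannot open $data_file: $!";
    filter_matching($ids_fh, $data_fh, \*STDOUT);
}
```

This is not an exact drop-in for `grep -w`: grep matches the ID as a whole word anywhere on the line, while the sketch checks only column 1, and the output follows data.txt order rather than input.txt order.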
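# Second part with perl
use strict;
use warnings;

## output_1.txt is written by the first part and has the same layout as data.txt
open my $input, '<', 'output_1.txt' or die "Cannot open output_1.txt: $!";
my $total = 0;
my $count = 0;
while (<$input>) {
    # column 1 = ID number, columns 2-3 = ID names, column 4 = value
    my ($id, $name1, $name2, $value) = split;
    $total += $value;
    $count += 1;
}
close $input;
print "The total is $total\n";
print "The count is $count\n";
# Guard against division by zero when nothing matched
print "The average is ", ($count ? $total / $count : 'n/a'), "\n";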
Both parts work, but I would like to simplify things by running just one script. I've been trying to find a quicker way of doing the whole lot in Perl, but after several hours of reading I am totally stuck. I've been playing around with hashes, arrays, and if/elsif statements without any success. If anyone has suggestions, that would be great.
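One way to do everything in a single script is to keep the wanted IDs in a hash and accumulate the total and count while reading data.txt once, so no intermediate file is needed. This is a sketch, not a definitive solution: it assumes whitespace-separated fields with the ID in column 1 and the value in column 4, and the names `sum_matching` and `average_ids.pl` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: sum column 4 of every data line whose ID
# (column 1) appears in the ID list. Returns (total, count).
sub sum_matching {
    my ($ids_fh, $data_fh) = @_;

    # Wanted IDs, one per line, surrounding whitespace trimmed.
    my %wanted;
    while (my $id = <$ids_fh>) {
        $id =~ s/^\s+|\s+$//g;
        $wanted{$id} = 1 if length $id;
    }

    my ($total, $count) = (0, 0);
    while (my $line = <$data_fh>) {
        my ($id, $name1, $name2, $value) = split ' ', $line;
        # Skip short/blank lines and IDs that were not requested.
        next unless defined $value && $wanted{$id};
        $total += $value;
        $count++;
    }
    return ($total, $count);
}

# Usage: perl average_ids.pl input.txt data.txt
# (only runs when both filenames are given)
if (@ARGV == 2) {
    my ($ids_file, $data_file) = @ARGV;
    open my $ids_fh,  '<', $ids_file  or die "Cannot open $ids_file: $!";
    open my $data_fh, '<', $data_file or die "Cannot open $data_file: $!";
    my ($total, $count) = sum_matching($ids_fh, $data_fh);
    print "The total is $total\n";
    print "The count is $count\n";
    # Avoid dividing by zero when no ID matched.
    print "The average is ", ($count ? $total / $count : 'n/a'), "\n";
}
```

Because the IDs are hash keys, an ID listed twice in input.txt is counted once per matching data.txt line, whereas the grep loop would append the matching line twice and count it twice.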
Thanks, Harriet
output_1.txt. Do you need any more? You seem to be asking for a pure-Perl solution to replace a grep | awk | perl command. Please explain more thoroughly, and show your complete command line.