I have data in this format:
b1 1995 1
b1 2007 0.1
b2 1974 0.1
b2 1974 0.6
b2 1975 0.3
I want to sum the values in column 3 for rows that have the same values in both column 1 and column 2.
I have written code that sums the values, but I do not know how to print the grouped values.
use strict;
use warnings;
use Data::Dumper;
my $file=shift;
open (DATA, $file);
my %score_by_year;
while ( my $line = <DATA> )
{
    my ($protein, $year, $score) = split /\s+/, $line;
    $score_by_year{$year} += $score;
    print "$protein\t$year\t$score_by_year{$year}\n";
}
close DATA;
My code gives this output:
b1 1995 1
b1 2007 0.1
b2 1974 0.1
b2 1974 0.7
b2 1975 0.3
whereas the expected output is:
b1 1995 1
b1 2007 0.1
b2 1974 0.7
b2 1975 0.3
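A minimal sketch of one way to get the expected output. It assumes whitespace-separated input and that groups should be printed in the order they first appear. The hash is keyed on both columns instead of just the year, and nothing is printed until the whole file has been read. The sub name sum_by_group is made up for illustration.

```perl
use strict;
use warnings;

# Sum column 3 for every distinct (column 1, column 2) pair read from $fh.
# Returns one "col1\tcol2\tsum" string per group, in first-seen order.
sub sum_by_group {
    my ($fh) = @_;
    my %sum;     # "protein\tyear" => running total
    my @order;   # group keys in the order they first appear
    while (my $line = <$fh>) {
        my ($protein, $year, $score) = split ' ', $line;
        next unless defined $score;    # skip blank or incomplete lines
        my $key = "$protein\t$year";   # group on BOTH columns, not just the year
        push @order, $key unless exists $sum{$key};
        $sum{$key} += $score;
    }
    return map { "$_\t$sum{$_}" } @order;
}

# Process each file named on the command line (3-arg open, lexical handle).
for my $qfn (@ARGV) {
    open(my $fh, '<', $qfn) or die("Can't open \"$qfn\": $!\n");
    print "$_\n" for sum_by_group($fh);
    close($fh);
}
```

For the sample data above, this prints the four expected lines, tab-separated like the original output. The separate @order array exists only because Perl hashes are unordered. If sorted output is acceptable, you could drop @order and loop over sort keys %sum instead.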
Don't use DATA as a filehandle name (it already has a meaning). Use lexical variables.
Don't use 2-arg open.
Check the result of open, because it's a frequent source of failure:
open(my $fh, '<', $qfn) or die("Can't open \"$qfn\": $!\n");

split ' ', $line almost always makes more sense than split /\s+/, $line. If your input is tab-separated like your output, though, split /\t/, $line would be the appropriate choice here.

Alternatively, with datamash:
datamash groupby 1,2 sum 3 < input.tsv
If your real input isn't already sorted the way your sample is, add -s. (datamash splits fields on tabs by default; for space-separated input, add -W.)
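A small demonstration of why split ' ' is usually the safer choice. The input line is made up and has leading whitespace, which the question's sample does not. That is an assumption chosen only to show the difference. It also shows that split /\t/ keeps the trailing newline unless you chomp first.

```perl
use strict;
use warnings;

# Hypothetical line with leading whitespace (not from the question's sample).
my $line = "  b2 1974 0.6\n";

my @by_regex = split /\s+/, $line;   # leading whitespace yields an empty first field
my @by_space = split ' ', $line;     # special ' ' pattern skips leading whitespace

print scalar(@by_regex), " fields: ", join('|', @by_regex), "\n";  # prints "4 fields: |b2|1974|0.6"
print scalar(@by_space), " fields: ", join('|', @by_space), "\n";  # prints "3 fields: b2|1974|0.6"

# For tab-separated input, chomp first; otherwise the last field keeps "\n".
my $tsv = "b1\t1995\t1\n";
chomp $tsv;
my @by_tab = split /\t/, $tsv;
print join('|', @by_tab), "\n";                                    # prints "b1|1995|1"
```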