I have a file of a few hundred lines, in the form:
```
1st     2n  2p  3n  3p  4n  4p
1ABJa   2   20  8   40  3   45
1ABJb   2   40  8   80  3   45
2C3Da   4   50  5   39  2   90
2D4Da   1   10  8   90  8   65
```
(The file is tab-separated.)
From this file, I want to merge all lines whose 1st-column values share the same first 4 characters (e.g. 1ABJa and 1ABJb), as follows:
- for column 1, merge both names, keeping the common characters;
- for columns 2n, 3n, 4n, ..., sum the numbers;
- for columns 2p, 3p, 4p, ..., average the numbers.
(Note that the columns can be identified by position rather than by name.) This would then yield:
```
1st     2n  2p  3n  3p  4n  4p
1ABJab  4   30  16  60  6   45
2C3Da   4   50  5   39  2   90
2D4Da   1   10  8   90  8   65
```
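The merge rules above can be sketched for a single group of rows. This is a minimal sketch under two assumptions: the columns after the name alternate n, p (so odd positions are summed and even positions are averaged), and the merged name is the shared 4-character prefix followed by each row's remaining characters (1ABJa + 1ABJb gives 1ABJab), since the exact naming rule has not been specified yet. `merge_group` is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: merge rows that share the same 4-character prefix.
# Each row is an array ref: [name, 2n, 2p, 3n, 3p, 4n, 4p, ...].
# Assumed rules: name = prefix + each row's remaining characters,
# odd positions (1, 3, 5, ...) are summed, even positions (2, 4, 6, ...) averaged.
sub merge_group {
    my @rows = @_;
    return @{ $rows[0] } if @rows == 1;    # a single row needs no merging

    my $prefix = substr($rows[0][0], 0, 4);
    my $name   = $prefix . join '', map { substr($_->[0], 4) } @rows;

    my @merged = ($name);
    for my $col (1 .. $#{ $rows[0] }) {
        my $total = sum map { $_->[$col] } @rows;
        # "n" columns keep the sum, "p" columns get the mean over the group
        push @merged, $col % 2 ? $total : $total / @rows;
    }
    return @merged;
}

my @group = (
    [qw(1ABJa 2 20 8 40 3 45)],
    [qw(1ABJb 2 40 8 80 3 45)],
);
print join("\t", merge_group(@group)), "\n";    # 1ABJab 4 30 16 60 6 45
```

Averages that do not divide evenly will print with decimals; wrap them in `sprintf` if a fixed format is needed.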
How would you solve this?
This is probably the most complicated way to do it, but here goes: I am thinking of building an array of all unique 4-character prefixes from the 1st column, then looping over that array to find all lines matching each prefix. If more than one line matches, I would collect their columns and combine them. Here is what I have so far:
```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use feature 'say';
use List::MoreUtils qw(uniq);

my $dir = 'My\\Path\\To\\Directory';
open my $in, '<', "$dir\\my file.txt" or die "Cannot open input file: $!";

my $header = <$in>;    # skip the header row
my @rows;              # all data rows, each split into columns
my @uniqarray;         # 4-character prefixes of the 1st column

# collect the rows and change the 1st-column names to 4-character prefixes
while (my $line = <$in>) {
    chomp $line;
    my @cols = split /\t/, $line;
    push @rows,      \@cols;
    push @uniqarray, substr($cols[0], 0, 4);
}
close $in;
@uniqarray = uniq @uniqarray;    # keep each prefix once, in first-seen order

# for each prefix, find all rows that start with it; if more than 1 row is found, manipulate the columns
for my $prefix (@uniqarray) {
    my @group = grep { substr($_->[0], 0, 4) eq $prefix } @rows;
    if (@group > 1) {
        # [DO STUFF]: merge the rows in @group
    }
}

say join "\n", @uniqarray;
```
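The idea of collecting the unique 4-character prefixes and then finding all rows for each one can also be done in a single pass, with a hash keyed by prefix plus an array that remembers the order in which prefixes first appear. This is a sketch, not a drop-in fix: `group_by_prefix` is a hypothetical helper, and the demo reads the sample data from an in-memory filehandle instead of the real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: group data rows by the first 4 characters of column 1 in one pass,
# so the file does not have to be read twice. Returns one array ref per
# prefix, in first-seen order, each holding the split rows for that prefix.
sub group_by_prefix {
    my ($fh) = @_;
    my (%groups, @order);
    my $header = <$fh>;                      # skip the header row
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;            # ignore blank lines
        my @cols   = split /\t/, $line;
        my $prefix = substr($cols[0], 0, 4);
        push @order, $prefix unless exists $groups{$prefix};
        push @{ $groups{$prefix} }, \@cols;
    }
    return map { $groups{$_} } @order;
}

# Demo on the sample data, using an in-memory filehandle.
my $sample = join '', map { join("\t", @$_) . "\n" } (
    [qw(1st 2n 2p 3n 3p 4n 4p)],
    [qw(1ABJa 2 20 8 40 3 45)],
    [qw(1ABJb 2 40 8 80 3 45)],
    [qw(2C3Da 4 50 5 39 2 90)],
    [qw(2D4Da 1 10 8 90 8 65)],
);
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
for my $group (group_by_prefix($fh)) {
    my @names = map { $_->[0] } @$group;
    if (@$group > 1) {
        print "merge: @names\n";    # rows to combine ([DO STUFF] goes here)
    } else {
        print "keep:  @names\n";    # single row, passed through unchanged
    }
}
close $fh;
```

On the sample this prints `merge: 1ABJa 1ABJb`, then `keep:` lines for `2C3Da` and `2D4Da`, so the unmerged rows stay in their original order next to the merged one.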
`1ABJab`? You haven't specified a rule, so it seems like it could just as easily be `1ABJa`.

`1ABJab` because it contains data from both `1ABJa` and `1ABJb`, and I want to distinguish it from the other rows. I will add the rule for this. Thanks!

yeild's looks like after the fact, the results are merged with the lines that aren't analyzed.

`1ABJb`), not a combination.

`'D:\'` is incorrect code, the backslash will escape your closing quote. Which is quite visible in the Markdown formatting above.