My current script reads data from a PDB file and stores it in arrays, which the rest of the script then works from. It runs well on a small test PDB file, but on a real PDB file it exhausts all of the computer's memory on just that one file. I have 2000 PDB files that need these calculations done on them.
Here is my full current script, with a few notes:
#!/usr/bin/perl
use warnings;
use strict;

#my $inputfile = $ARGV[0];
#my $inputfile = '8ns_emb_alt_test.pdb';
my $inputfile = '8ns_emb_alt_101.pdb';

open( my $in, '<', $inputfile ) or die "Cannot open $inputfile: $!";
my @array = <$in>;
close $in;

### Protein
my $protein = 'PROT';
my @protx;
my @proty;
my @protz;
my @prot_resid;    # residue id of each protein atom, parallel to the coordinate arrays
for my $line (@array) {
    if ( $line =~ m/\s+$protein\s+/ ) {
        chomp $line;
        my @splitline = split /\s+/, $line;
        push @protx,      $splitline[5];    # this has 2083 x-coordinates
        push @proty,      $splitline[6];    # this has 2083 y-coordinates
        push @protz,      $splitline[7];    # this has 2083 z-coordinates
        push @prot_resid, $splitline[4];    # residue id for this atom
    }
}

### Lipid
my $lipid1 = 'POPS';
my $lipid2 = 'POPC';
my @lipidx;
my @lipidy;
my @lipidz;
for my $line (@array) {
    if ( $line =~ m/\s+$lipid1\s+/ || $line =~ m/\s+$lipid2\s+/ ) {
        chomp $line;
        my @splitline = split /\s+/, $line;
        push @lipidx, $splitline[5];    # this has approximately 35,000 x-coordinates
        push @lipidy, $splitline[6];    # same as above for y
        push @lipidz, $splitline[7];    # same as above for z
    }
}

### Calculation
my @deltaX = map {
    my $diff = $_;
    map { $diff - $_ } @lipidx;
} @protx;    # so this has 2083 * 35,000 x-differences
my @squared_deltaX = map { $_ * $_ } @deltaX;
my @deltaY = map {
    my $diff = $_;
    map { $diff - $_ } @lipidy;
} @proty;
my @squared_deltaY = map { $_ * $_ } @deltaY;
my @deltaZ = map {
    my $diff = $_;
    map { $diff - $_ } @lipidz;
} @protz;
my @squared_deltaZ = map { $_ * $_ } @deltaZ;

my @distance;
# this must run over every protein-lipid pair, not over the file's line count
for my $i ( 0 .. $#squared_deltaX ) {
    push @distance, sqrt( $squared_deltaX[$i] + $squared_deltaY[$i] + $squared_deltaZ[$i] );
}

### The Hunt
my $limit  = 5;
my $nlipid = scalar @lipidx;
my @DistU50;
my @resid_tagger;
for my $i ( 0 .. $#distance ) {
    if ( $distance[$i] < $limit ) {
        # pair $i belongs to protein atom int($i / $nlipid)
        push @resid_tagger, $prot_resid[ int( $i / $nlipid ) ];    # stores the matching residue id
        push @DistU50,      $distance[$i];                         # stores the distances within $limit
    }
}
# this loop walks @distance and pushes every distance under the limit, plus the
# residue id of the protein atom it belongs to, into the final arrays

### Le'Finali
print "@resid_tagger = resid \n";
print "5 > @DistU50 \n";
One of my lab friends said that I could store some of the data in files so that it takes up less memory. I think that is a fine idea, but I am not sure where the most efficient place to do that would be, or how many times I would have to do it. I used arrays because that is the best way I knew how to do this.
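For the array-to-file part of that idea: the core Storable module can serialize a whole Perl array to disk and read it back later, so a chunk of intermediate results can be spooled out and its RAM freed. A minimal sketch, where the chunk contents and file name are made up for illustration:

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);

# Build one chunk of results, write it to disk, and free the memory.
my @deltaX_chunk = ( 1.5, -2.25, 0.75 );           # stand-in for one protein atom's differences
nstore( \@deltaX_chunk, 'deltaX_chunk_0001.bin' )  # serialize the array (by reference) to a file
    or die "nstore failed: $!";
@deltaX_chunk = ();                                # release the in-memory copy

# Later, read the chunk back only when it is needed.
my $restored = retrieve('deltaX_chunk_0001.bin');  # returns an array reference
print "first value: $restored->[0]\n";             # prints "first value: 1.5"
```

You would do this once per chunk (for example, once per protein atom's block of 35,000 differences), so at any moment only one chunk lives in memory. It trades RAM for disk I/O, which is slow if you re-read chunks many times.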
If anyone could show me an example of how to write an array out to a file and then use the data in that file again, that would be really helpful. Otherwise, if anyone has ideas I can look up, things to try, or just suggestions, that would at least get me started somewhere.
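That said, one thing worth looking up for this particular calculation is "streaming" or online processing: the 2083 × 35,000 difference, square, and distance arrays are each read exactly once, so every distance can be tested against the 5 Å limit the moment it is computed and then thrown away. Peak memory then grows with the number of atoms, not the number of pairs, and nothing needs to be spooled to disk at all. A sketch of that idea, with tiny made-up coordinate arrays standing in for the parsed @protx/@lipidx data (and a hypothetical @prot_resid array holding the residue id of each protein atom):

```perl
use strict;
use warnings;

# Hypothetical parsed data: parallel coordinate arrays, plus the residue id
# of each protein atom (stand-ins for the arrays built while reading the PDB).
my @protx      = ( 0.0,  10.0 );
my @proty      = ( 0.0,  10.0 );
my @protz      = ( 0.0,  10.0 );
my @prot_resid = ( 'RES1', 'RES2' );
my @lipidx     = ( 1.0,  50.0 );
my @lipidy     = ( 2.0,  50.0 );
my @lipidz     = ( 2.0,  50.0 );

my $limit = 5;
my ( @resid_tagger, @DistU50 );

for my $p ( 0 .. $#protx ) {
    for my $l ( 0 .. $#lipidx ) {
        my $dx   = $protx[$p] - $lipidx[$l];
        my $dy   = $proty[$p] - $lipidy[$l];
        my $dz   = $protz[$p] - $lipidz[$l];
        my $dist = sqrt( $dx * $dx + $dy * $dy + $dz * $dz );
        if ( $dist < $limit ) {              # keep only the hits; everything else is discarded
            push @resid_tagger, $prot_resid[$p];
            push @DistU50,      $dist;
        }
    }
}

print "@resid_tagger = resid \n";    # prints "RES1 = resid"
print "5 > @DistU50 \n";             # prints "5 > 3"
```

Only the hits under the limit are ever stored, so the 73-million-element intermediate arrays never exist. The same loop structure also works while reading the PDB file line by line instead of slurping it into an array first, which should scale comfortably to all 2000 files.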