3

I have a big data file dump.all.lammpstrj which I need to split/categorize into a series of files, such as Z_1_filename, Z_2_filename, Z_3_filename etc. based on the coordinates in each record.

The coordinates are saved in a disordered way, so my program reads each line and determines which file this record should be sent to.

I use a variable, $filehandle = "Z_${i}_DUMP"

and I hope to open all of the possible files like this

for ( my $i = 1; $i <= 100; $i++ ) {
  $filehandle = "Z_${i}_DUMP";
  open $filehandle,'>', "Z_${i}_dump.all.lammpstrj.dat";
  ...
}

But when running my program, I get a message

Can't use string ("Z_90_DUMP") as a symbol ref while "strict refs" in use at ...

I don't want to scan all the data for each output file, because dump.all.lammpstrj is so big that a scan would take a long time.

Is there any way to use a defined variable as a file handle?

3
  • How big is this enormous file? Commented Jun 2, 2018 at 21:30
  • Will there always be exactly 100 output files? Can you give an example of the input file data, showing the coordinates and how they are converted into an output file number? Commented Jun 2, 2018 at 21:39
  • Corrected the tag "filehandler" (what is that?) to "filehandle" Commented Jun 3, 2018 at 1:52

3 Answers 3

5

To give you an idea of how this might be done: put the file handles in a hash (or in an array, if they are indexed by numbers).

use strict;
use warnings;
my %fh;                                                        #file handles
open $fh{$_}, '>', "Z_${_}_dump.all.lammpstrj.dat" for 1..100; #open 100 files
for(1..10000){                                    #write 10000 lines in 100 files
    my $random=int(1+rand(100));                  #pick random file handle
    print {$fh{$random}} "something $_\n";
}
close $fh{$_} for 1..100;
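The same idea could be applied to the single-pass split described in the question. The following is only a sketch under stated assumptions: the question does not show the record layout, so it assumes that every non-blank line is a record whose last whitespace-separated field is a non-negative z coordinate, and that records are binned by a fixed width. The names bin_for and split_by_z, and the $outdir parameter, are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical mapping from a z coordinate to a 1-based output file number.
# A fixed bin width is an assumption; adapt it to the real data.
sub bin_for {
    my ($z, $width) = @_;
    return 1 + int($z / $width);
}

# Read $input once and write each record to the file for its bin.
# Assumes z is the last whitespace-separated field of every record.
sub split_by_z {
    my ($input, $outdir, $width) = @_;
    my @fh;    # $fh[$n] holds the handle for bin $n, opened on first use
    open my $in, '<', $input or die "Cannot open $input: $!";
    while (my $line = <$in>) {
        my @fields = split ' ', $line;
        next unless @fields;                    # skip blank lines
        my $n = bin_for($fields[-1], $width);
        if (!defined $fh[$n]) {
            my $name = "$outdir/Z_${n}_dump.all.lammpstrj.dat";
            open $fh[$n], '>', $name or die "Cannot open $name: $!";
        }
        print {$fh[$n]} $line;
    }
    close $in;
    close $_ for grep { defined } @fh;          # close only the bins that were used
    return;
}
```

A call such as split_by_z('dump.all.lammpstrj', '.', 1.0) would then scan the input once. Opening each file on first use means that bins which never receive a record leave no empty file behind.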

2

Either don't assign anything to $filehandle, or set it to undef, before you call open(). You get this error because you have assigned a string to $filehandle (which serves no purpose anyway).

Also see "open" in perldoc:

If FILEHANDLE is an undefined scalar variable (or array or hash element), a new filehandle is autovivified, meaning that the variable is assigned a reference to a newly allocated anonymous filehandle. Otherwise if FILEHANDLE is an expression, its value is the real filehandle. (This is considered a symbolic reference, so use strict "refs" should not be in effect.)
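A minimal sketch of what that paragraph describes, using a scratch file from File::Temp purely so the example has something safe to open (the key 90 is an arbitrary illustration):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A scratch file to open; tempfile() is used here only to get a safe path.
my (undef, $path) = tempfile(UNLINK => 1);

my $fh;                                  # undefined scalar: open() fills it in
open $fh, '>', $path or die "Cannot open $path: $!";
my $kind = ref $fh;                      # 'GLOB': a reference to an anonymous glob
print "handle is a $kind reference\n";
close $fh;

# The same works for an undefined hash element.
my %fh;
open $fh{90}, '>', $path or die "Cannot open $path: $!";
print {$fh{90}} "record\n";
close $fh{90};
```

Because the variable is undefined when open() sees it, no symbolic reference is involved and the code runs under use strict.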

To keep several file handles open at a time and to map them conveniently to their file names, consider storing them in a hash with the file name (or whatever identifier suits you) as the key. You can check whether the key exists (see "exists") and whether the value is defined (see "defined") to avoid reopening a file unnecessarily.
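One way this could look, as a sketch rather than a finished implementation. The helper names handle_for and close_all are hypothetical, and the files are opened with '>' as in the question:

```perl
use strict;
use warnings;

my %fh;    # file name => open handle

# Return the handle for $name, opening the file only the first time it is needed.
sub handle_for {
    my ($name) = @_;
    unless (exists $fh{$name} && defined $fh{$name}) {
        open $fh{$name}, '>', $name or die "Cannot open $name: $!";
    }
    return $fh{$name};
}

# Close everything that was opened and forget the handles.
sub close_all {
    for my $name (keys %fh) {
        close $fh{$name} or warn "Cannot close $name: $!";
    }
    %fh = ();
    return;
}
```

A record could then be written with print { handle_for("Z_90_dump.all.lammpstrj.dat") } $record; followed by a single close_all() once the input has been read.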


0

Many thanks to Kjetil S. and sticky bit. I tested their suggestions and they work well. I also noticed that there is another way to write data to different files without changing the file handle: I reuse the same file handle and change only the file name.

....
for ( my $i = 0; $i <= $max_number; $i++ ) {
   my $file = "$i\_foo.dat";    # the backslash ends the variable name $i
   open DAT, '>>', $file or die "Cannot open $file: $!";
   ......
}

3 Comments

open is slow, so if you have to open the same file more than once, keeping many file handles open is faster. But of course, in many situations it doesn't matter whether a program takes 1.2 seconds or 1.3.
Btw, your code "$i_foo.dat" will look for a variable named $i_foo. You probably meant "${i}_foo.dat".
Thanks for pointing out the error. It should be "$i_foo.dat". The file name includes an index. Your correction is a reference, correct?
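On the last question: as far as Perl's string interpolation goes, the braces in "${i}_foo.dat" only mark where the variable name ends; no reference is created. A small sketch of three equivalent spellings, where the value 7 is an arbitrary example:

```perl
use strict;
use warnings;

my $i = 7;

# Braces only delimit the variable name; no reference is involved.
my $braced  = "${i}_foo.dat";
# A backslash before the underscore also ends the name.
my $escaped = "$i\_foo.dat";
# Plain concatenation avoids the ambiguity entirely.
my $joined  = $i . '_foo.dat';

print "$braced $escaped $joined\n";
```

Under use strict, the unbraced "$i_foo.dat" would not even compile unless a variable named $i_foo had been declared.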
