I'm a beginner in Perl, and I'm trying to process a large array. The contents of the array are the rows returned by executing this SQL query:

select col1, col2, col3, col4 from my_table where rownum < 50000

while (my $h = $sth->fetchrow_hashref) {
 push (@data, $h);
}

for my $row (@data) {
 #process $row
}

Processing one element requires ~1 second, so processing all 50k would take approximately 15 hours. That's why I thought it would be a better idea to store the rows first.

  1. Is fetchrow_hashref good? Someone told me to use fetchrow_arrayref (faster) or fetchrow_array (fastest).

  2. I have to process each $row only once. Shall I process the items as

    for my $row (@data) {
     #process $row
     shift @data;
    }
    
  3. Since the array is quite large, I don't think accessing it in list context is a good idea. Is the following better:

    while (@data) {
     #process $row
     shift @data;
    }
    
2 Comments
  • Don't build the @data array; use the first while loop to do the wanted processing on $h directly. Options 2 and 3 are not correct or don't make sense. Commented Dec 17, 2013 at 7:31
  • Processing one element requires ~1 second, and processing 50k approximately 15 hours. So, I thought it's a better idea if I store it first. Commented Dec 18, 2013 at 10:02

3 Answers


If you have to process each row only once, and do not need to refer to previous rows while processing a given row, then do not load the entire array into memory. Process the row in the same loop that reads the rows. There's no need to take up the memory to store data you no longer need.
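A minimal sketch of that stream-as-you-fetch approach. It uses an in-memory SQLite database (this assumes DBD::SQLite is installed) purely so the snippet is self-contained; with Oracle or MySQL only the connect string would change, and the table/column names here are just stand-ins for the ones in the question.

```perl
use strict;
use warnings;
use DBI;

# Self-contained demo: in-memory SQLite stands in for the real database.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do('CREATE TABLE my_table (col1, col2, col3, col4)');
$dbh->do("INSERT INTO my_table VALUES ($_, $_, $_, $_)") for 1 .. 3;

my $sth = $dbh->prepare('SELECT col1, col2, col3, col4 FROM my_table');
$sth->execute;

# Process each row as it is fetched -- no @data array is ever built,
# so memory use stays flat no matter how many rows the query returns.
my $count = 0;
while (my $h = $sth->fetchrow_hashref) {
    $count++;                          # stand-in for the real per-row work
    print "row $count: col1=$h->{col1}\n";
}
```

The shape is the same as the question's first loop, minus the push: the per-row work simply moves inside the fetch loop.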


3 Comments

Processing one element requires ~1 second, and processing 50k approximately 15 hours. So, I thought it's a better idea if I store it first.
What are you doing that takes 1 second per row? Is that processing CPU-bound or does it do other I/O or database access?
A few DB accesses + occasional cURL/LWP.

If you are using this code:

while (my $h = $sth->fetchrow_hashref) {
 push (@data, $h);
}

for my $row (@data) {
 #process $row
}

Just have DBI pull all the data for you with selectall_arrayref; it does the same thing as the code you wrote yourself.

This code:

my %attrib = ('Slice' => {});
my $refArray = $refDB->selectall_arrayref($SQL, \%attrib, @binders);

Will return $refArray with all your data.

So a complete code will be:

my %attrib = ('Slice' => {});
my $refArray = $refDB->selectall_arrayref($SQL, \%attrib, @binders);
my @Array;
if (defined $refArray) {
 @Array = @{$refArray};
}

foreach my $ptrItem (@Array) {
 my %Item = %{$ptrItem};
 ...
}

I don't think this is heavier than fetching each row with fetchrow_hashref into an array yourself. It will use more memory than calling fetchrow_hashref and handling each row on its own, since fetchrow uses a MySQL cursor to keep track of where it is rather than pulling all the data into the Perl code at once.

2 Comments

Why make a copy of the 50k rows and put it into @Array? You can save the if (defined... completely by just going foreach my $ptrItem (@{ $refArray }) and my $refArray = $refDB->foo // [].
I like code to be clear; just because Perl has shortcuts doesn't mean they are clear to everyone.

Let's see.

Is fetchrow_hashref good? Someone told me to use fetchrow_arrayref (faster) or fetchrow_array (fastest).

Yes, because you can see the column names in your code. It will save you time later when doing maintenance. We are only talking about 50k rows; that is not so many, assuming this is not running twice every minute. Please take a look at this question: When to use $sth->fetchrow_hashref, $sth->fetchrow_arrayref and $sth->fetchrow_array?
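For reference, the three fetch styles side by side. This is a hedged sketch using an in-memory SQLite database (assumes DBD::SQLite) only so it runs on its own; the calls are identical for any driver:

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do('CREATE TABLE t (col1, col2)');
$dbh->do("INSERT INTO t VALUES ('a', 'b')");

my $sth = $dbh->prepare('SELECT col1, col2 FROM t');

$sth->execute;
my $h = $sth->fetchrow_hashref;    # most readable: access columns by name
print "hashref:  $h->{col1}\n";

$sth->execute;
my $ar = $sth->fetchrow_arrayref;  # faster: positional access via one reused ref
print "arrayref: $ar->[0]\n";

$sth->execute;
my @row = $sth->fetchrow_array;    # returns a plain list copy of one row
print "array:    $row[0]\n";
```

The named access in the hashref version is what makes the code self-documenting, which is the maintenance argument above.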

I have to process each $row only once. Shall I process the items as

for my $row (@data) {
  #process $row
  shift @data;
}

As the others have already said, it never makes sense to save the data first and work with it later. I would do it as follows. That saves you the overhead of creating a copy of the data.

while (my $res = $sth->fetchrow_hashref) {
  # process $res
  process($res);
}

Since the array is quite large, I don't think accessing it in list context is a good idea. Is the following better:

while (@data) {
  #process $row
  shift @data;
}

I'm not sure what you are talking about here. Perl is meant to deal with large arrays. In fact, 50k rows is not a lot (unless you keep BLOB fields with serialized 5MB jpegs in your DB). There is always more than one way to do it in Perl, and you need to pick the one that works best for you. That will save you time later.

Unless you are feeling it is slow. In that case, you should benchmark. There are multiple ways to do that. The easiest is to use the Benchmark module. It comes with good documentation. If that is not enough, take a look at Devel::NYTProf. You might also come across DBI profiling, but right now that is not what you want.
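A quick, hypothetical micro-benchmark with the Benchmark module mentioned above, comparing hash-style and array-style access on mocked row data (no database involved; the row contents are made up for illustration). cmpthese prints a rate-comparison table:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Mocked rows standing in for what fetchrow_hashref / fetchrow_arrayref
# would return for one record.
my $hash_row  = { col1 => 1, col2 => 2, col3 => 3, col4 => 4 };
my $array_row = [ 1, 2, 3, 4 ];

# Run each sub for ~1 CPU second and print a comparison table.
cmpthese(-1, {
    hashref  => sub { my $x = $hash_row->{col1} + $hash_row->{col4} },
    arrayref => sub { my $x = $array_row->[0]  + $array_row->[3] },
});
```

If the per-row work really takes ~1 second, though, the fetch style will not be the bottleneck; a profiler like Devel::NYTProf will show where the time actually goes.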

Also take a look at these really old slides by Tim Bunce, who built the DBI: http://cpansearch.perl.org/src/TIMB/DBI_AdvancedTalk_2004/sld017.htm

1 Comment

Processing one element requires ~1 second, and processing 50k approximately 15 hours. So, I thought it's a better idea if I store it first.
