I'm a beginner in Perl, and I'm trying to process a large array. The contents of the array are the rows returned by executing this SQL query:

select col1, col2, col3, col4 from my_table where rownum < 50000

while (my $h = $sth->fetchrow_hashref) {
 push (@data, $h);
}

for my $row (@data) {
 #process $row
}

Processing one element requires ~1 second, so processing all 50k would take approximately 15 hours. That's why I thought it would be a better idea to store the rows first.

  1. Is fetchrow_hashref good? Someone told me to use fetchrow_arrayref (faster) or fetchrow_array (fastest).

  2. I have to process each $row only once. Shall I process the items as

    for my $row (@data) {
     #process $row
     shift @data;
    }
    
  3. Since the array is quite large, I don't think accessing it in list context is a good idea. Is the following better:

    while (@data) {
     #process $row
     shift @data;
    }
    
2 Comments
  • Don't build the @data array; use the first while loop to do the wanted processing on $h directly. Options 2 and 3 are not correct or don't make sense. Commented Dec 17, 2013 at 7:31
  • Processing one element requires ~1 second, and processing 50k approximately 15 hours. So, I thought it's a better idea if I store it first. Commented Dec 18, 2013 at 10:02

3 Answers


If you have to process each row only once, and do not need to refer to previous rows while processing a given row, then do not load the entire array into memory. Process the row in the same loop that reads the rows. There's no need to take up the memory to store data you no longer need.
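A minimal sketch of that stream-as-you-fetch approach. It uses an in-memory SQLite database (this assumes DBD::SQLite is installed) purely so the snippet is self-contained; with Oracle or MySQL only the connect string would change, and the table/column names here are just stand-ins for the ones in the question.

```perl
use strict;
use warnings;
use DBI;

# Self-contained demo: in-memory SQLite stands in for the real database.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do('CREATE TABLE my_table (col1, col2, col3, col4)');
$dbh->do("INSERT INTO my_table VALUES ($_, $_, $_, $_)") for 1 .. 3;

my $sth = $dbh->prepare('SELECT col1, col2, col3, col4 FROM my_table');
$sth->execute;

# Process each row as it is fetched -- no @data array is ever built,
# so memory use stays flat no matter how many rows the query returns.
my $count = 0;
while (my $h = $sth->fetchrow_hashref) {
    $count++;                          # stand-in for the real per-row work
    print "row $count: col1=$h->{col1}\n";
}
```

The shape is the same as the question's first loop, minus the push: the per-row work simply moves inside the fetch loop.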


3 Comments

Processing one element requires ~1 second, and processing 50k approximately 15 hours. So, I thought it's a better idea if I store it first.
What are you doing that takes 1 second per row? Is that processing CPU-bound or does it do other I/O or database access?
A few DB accesses + occasional cURL/LWP.

If you are using this code:

while (my $h = $sth->fetchrow_hashref) {
 push (@data, $h);
}

for my $row (@data) {
 #process $row
}

Just have DBI pull all the data for you with selectall_arrayref; it does the same thing as the code you wrote yourself.

This code:

my %attrib = ('Slice' => {});
my $refArray = $refDB->selectall_arrayref($SQL, \%attrib, @binders);

Will return $refArray with all your data.

So a complete code will be:

my %attrib = ('Slice' => {});
my $refArray = $refDB->selectall_arrayref($SQL, \%attrib, @binders);
my @Array;
if (defined $refArray) {
 @Array = @{$refArray};
}

foreach my $ptrItem (@Array) {
 my %Item = %{$ptrItem};
 ...
}

I don't think this is heavier than fetching each row with fetchrow_hashref into an array yourself. It will use more memory than calling fetchrow_hashref and handling each row on its own, since fetchrow uses a MySQL cursor to keep track of where it is rather than pulling all the data into the Perl code at once.

2 Comments

Why make a copy of the 50k rows and put it into @Array? You can save the if (defined... completely by just going foreach my $ptrItem (@{ $refArray }) and my $refArray = $refDB->foo // [].
I like code to be clear; just because Perl has shortcuts doesn't mean they are clear to everyone.

Let's see.

Is fetchrow_hashref good? Someone told me to use fetchrow_arrayref (faster) or fetchrow_array (fastest).

Yes, because you can see the column names in your code. It will save you time later when doing maintenance. We are only talking about 50k rows; that is not so many, assuming this is not running twice every minute. Please take a look at this question: When to use $sth->fetchrow_hashref, $sth->fetchrow_arrayref and $sth->fetchrow_array?
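For reference, the three fetch styles side by side. This is a hedged sketch using an in-memory SQLite database (assumes DBD::SQLite) only so it runs on its own; the calls are identical for any driver:

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do('CREATE TABLE t (col1, col2)');
$dbh->do("INSERT INTO t VALUES ('a', 'b')");

my $sth = $dbh->prepare('SELECT col1, col2 FROM t');

$sth->execute;
my $h = $sth->fetchrow_hashref;    # most readable: access columns by name
print "hashref:  $h->{col1}\n";

$sth->execute;
my $ar = $sth->fetchrow_arrayref;  # faster: positional access via one reused ref
print "arrayref: $ar->[0]\n";

$sth->execute;
my @row = $sth->fetchrow_array;    # returns a plain list copy of one row
print "array:    $row[0]\n";
```

The named access in the hashref version is what makes the code self-documenting, which is the maintenance argument above.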

I have to process each $row only once. Shall I process the items as

for my $row (@data) {
  #process $row
  shift @data;
}

As the others have already said, it never makes sense to save the data first and work with it later. I would do it as follows. That saves you the overhead of creating a copy of the data.

while (my $res = $sth->fetchrow_hashref) {
  # process $res
  process($res);
}

Since the array is quite large, I don't think accessing it in list context is a good idea. Is the following better:

while (@data) {
  #process $row
  shift @data;
}

I'm not sure what you are talking about here. Perl is meant to deal with large arrays. In fact, 50k rows is not a lot (unless you keep BLOB fields with serialized 5MB jpegs in your DB). There is always more than one way to do it in Perl, and you need to pick the one that works best for you. That will save you time later.

Unless you are feeling it is slow. In that case, you should benchmark. There are multiple ways to do that. The easiest is to use the Benchmark module. It comes with good documentation. If that is not enough, take a look at Devel::NYTProf. You might also come across DBI profiling, but right now that is not what you want.
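A quick, hypothetical micro-benchmark with the Benchmark module mentioned above, comparing hash-style and array-style access on mocked row data (no database involved; the row contents are made up for illustration). cmpthese prints a rate-comparison table:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Mocked rows standing in for what fetchrow_hashref / fetchrow_arrayref
# would return for one record.
my $hash_row  = { col1 => 1, col2 => 2, col3 => 3, col4 => 4 };
my $array_row = [ 1, 2, 3, 4 ];

# Run each sub for ~1 CPU second and print a comparison table.
cmpthese(-1, {
    hashref  => sub { my $x = $hash_row->{col1} + $hash_row->{col4} },
    arrayref => sub { my $x = $array_row->[0]  + $array_row->[3] },
});
```

If the per-row work really takes ~1 second, though, the fetch style will not be the bottleneck; a profiler like Devel::NYTProf will show where the time actually goes.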

Also take a look at these really old slides by Tim Bunce, who built the DBI: http://cpansearch.perl.org/src/TIMB/DBI_AdvancedTalk_2004/sld017.htm

1 Comment

Processing one element requires ~1 second, and processing 50k approximately 15 hours. So, I thought it's a better idea if I store it first.
