
I am reading an ordered file for which I must count occurrences by hour, by minute, or by second. If requested, I must print times with 0 occurrences (normalized output) or skip them (non-normalized output). The output must obviously be ordered.

I first thought of using an array. When the output is non-normalized, I do roughly the equivalent of:

$array[10] = 100;
$array[10000] = 10000;

And to print the result:

foreach (@array) {
  print "$_\n" if defined;   # still visits every slot, defined or not
}

Is there a way to reduce the iterations to only the elements defined in the array? In the previous example, that would mean doing only two iterations instead of the 10,001 that looping from 0 to $#array implies. I would also need a way to know the current array index inside the loop. Does such a thing exist?
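For reference, a minimal sketch (not part of the original question; the sample values are invented) of two ways to see the current index while looping. Note that both still visit every slot; they only avoid printing the undefined ones, which is why the hash idea below sidesteps the problem entirely:

use strict;
use warnings;

my @array;
$array[10]    = 100;      # sparse sample data, as in the question
$array[10000] = 10000;

# Index loop: the loop variable is the current index.
for my $i (0 .. $#array) {
    next unless defined $array[$i];
    print "$i: $array[$i]\n";
}

# Perl 5.12+: each() also works on arrays and returns (index, value) pairs.
while (my ($i, $val) = each @array) {
    print "$i: $val\n" if defined $val;
}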

I am thinking more and more of using a hash instead. A hash solves my problem and also eliminates the need to convert hh:mm:ss times to an index and back.
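For illustration, a minimal sketch of that hash idea, assuming hh:mm:ss timestamps can be pulled from each input line (the input format used here is hypothetical):

use strict;
use warnings;

my %count;
while (my $line = <STDIN>) {
    # Hypothetical format: extract an hh:mm:ss timestamp from the line.
    my ($time) = $line =~ /(\d\d:\d\d:\d\d)/ or next;
    $count{$time}++;    # or count by its hh:mm / hh prefix for coarser bins
}

# Non-normalized output: only the times that occurred, in chronological
# order (zero-padded hh:mm:ss strings sort correctly as plain strings).
print "$_ $count{$_}\n" for sort keys %count;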

Or do you have a better solution to suggest for this simple problem?

  • A hash is indeed what you need in this case. Commented Oct 30, 2012 at 13:55
  • When the "key" or "index" range is relatively large compared to the number of meaningful elements (i.e., a sparse structure), a hash is better suited. If the number of meaningful elements is high relative to the range of indices (a dense structure), and the cost of computing indices is low, an array can be more time-efficient since it avoids the overhead of the hashing algorithm. Commented Oct 30, 2012 at 16:34
  • The question is, I think, why you use an array in the first place? Are the indexes part of your data? If not, why bother with them? Commented Oct 30, 2012 at 17:27
  • @TLP: the index is time. It is part of the data. Commented Oct 30, 2012 at 18:03
  • Use a hash, or a two-dimensional array, is my advice, e.g. push @array, [ 10, 100 ]. No sense keeping empty array elements. Commented Oct 30, 2012 at 18:34

2 Answers


Yes, use a hash. You can then iterate over a sorted list of the hash's keys, provided your keys sort correctly.
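As a brief illustration (the per-minute keys and counts below are invented), both output styles fall out naturally:

use strict;
use warnings;

# Hypothetical per-minute counts for hour 13.
my %count = ( '13:05' => 3, '13:07' => 1, '13:59' => 12 );

# Non-normalized: only the minutes that occurred, in order.
print "$_ $count{$_}\n" for sort keys %count;

# Normalized: every minute of the hour, printing 0 for the gaps
# (// is the defined-or operator, Perl 5.10+).
for my $m (0 .. 59) {
    my $key = sprintf '13:%02d', $m;
    print "$key ", $count{$key} // 0, "\n";
}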


2 Comments

I settled on a solution using an array. The advantage of the hash is not obvious because my data is generally well distributed over the day, especially when producing per-hour stats. With a hash I wouldn't need to sort the keys, because they represent time values: I simply have to iterate over time and print the corresponding hash values. A hash would save some memory depending on the situation. Maybe it would also be faster (for this problem), but I'd have to test.
Is your data regularly spaced or irregular? If it is irregular, you will end up with gaps which waste space; the more granularity you need in the ability to say '0 events in this period', the more gaps you'll have. Perhaps if you give a couple of real examples it will be clearer.

You can also remember just the pairs of numbers in an array:

#!/usr/bin/perl
use warnings;
use strict;

my @ar = ( [  10, 100 ],
           [ 100,  99 ],
           [  12,   1 ],
           [  13,   2 ],
           [  15,   1 ],
         );

# Non-normalized output: just the recorded pairs, sorted by index.
sub non_normalized {
    my @pairs = sort { $a->[0] <=> $b->[0] } @_;
    return map "@$_", @pairs;
}

# Normalized output: fill the gaps between recorded indices with zero counts.
sub normalized {
    my @pairs = sort { $a->[0] <=> $b->[0] } @_;
    unshift @pairs, [0, 0] unless $pairs[0][0] == 0;
    my @return;
    for my $i (0 .. $#pairs) {
        push @return, "@{ $pairs[$i] }";
        last if $i == $#pairs;    # no gap to fill after the last recorded index
        push @return, $_ . $" . 0 for 1 + $pairs[$i][0] .. $pairs[$i + 1][0] - 1;
    }
    return @return;
}

print join "\n", non_normalized(@ar), q();
print "\n";
print join "\n", normalized(@ar), q();
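With the sample data above, non_normalized() yields just the five sorted pairs (10 100, 12 1, 13 2, 15 1, 100 99), while normalized() yields one line for every index from 0 through 100, filling the gaps with 0.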

