5

In terms of memory consumption/ execution time, what's a more expensive way of adding items to a group

Redim Preserve myArray(1 To Ubound(myArray) + 1)
myArray(Ubound(myArray)) = myVal

Or

myCollection.Add myVal

Does the fastest one depend on myVal, or vary with the size of the groups? Is there a faster method still?

I have the array/collection declared Privately in the declarations portion of a class if that makes a difference, but I'd like to know what is happening behind the scenes and which approach is generally faster (not in terms of readability or maintainability, just execution time)

Tests

OK, having run some tests adding many instances of 1 to groups and collections, my results are:

  • collection method 10 x faster than a variant array
  • collection method 5 x faster than a long array
  • collection method 1.5 x faster than a byte array

Results were all for approx 5 seconds of looping with this code:

Sub testtime()
Dim sttime As Double
Dim endtime As Double
Dim i As Long
Dim u As Long
i = 0
ReDim a(1 To 1) 'or Set c = New Collection
sttime = Timer
endtime = Timer + 5
Do Until Timer > endtime
    u = UBound(a) + 1
    ReDim Preserve a(1 To u)
    a(u) = 1
    'or c.Add 1
    i = i + 1
Loop
endtime = Timer
Debug.Print (endtime - sttime) / i; "s", i; "iterations", Round(endtime - sttime, 3); "(ish) s"
End Sub

So it looks like for adding that item, with relatively large groups; adding to a collection is faster, but I'd like to know why?

4
  • What do your tests/benchmarks show? Did you test/benchmark it? Commented Aug 17, 2017 at 16:55
  • @Mat'sMug I appreciate what you're getting at, but I'm more interested in theory than a single test. I could do some benchmarks for a few random values, but even then I can't really be certain what effect different datatypes might have, whether there is some reason an array may be slower/faster than collection depending on its size, etc. I mean what goes on behind the scenes when I ask for the ubound of an array, or the final item of a collection - and is there a fundamental reason why one may be faster than the other under certain conditions (if say, a different sorting algorithm is used)? Commented Aug 17, 2017 at 17:04
  • @Mat'sMug I've added some tests, but as I say, I'd like to know what the reasoning might be, so I can extrapolate to other test cases Commented Aug 17, 2017 at 17:35
  • Kudos for adding the tests - are you running ReDim Preserve a(1 To u) regardless of whether you're working with a collection or an array? That would wreck your results... Commented Aug 17, 2017 at 17:49

1 Answer 1

15

ReDim Preserve is skewing it all.

ReDim Preserve myArray(1 To UBound(myArray) + 1)

That's inherently inefficient, and an unfair comparison. You're copying the entire array internally, every time you add an item. I would hope a Collection is much more efficient than that. If not, use .NET's ArrayList, which is deprecated in .NET since v2.0 introduced generics and List<T>, but usable - and useful - in VBA (.NET generics can't be used in VBA).

An ArrayList doesn't resize its internal _items array every single time an item is added! Notice the comments:

// Adds the given object to the end of this list. The size of the list is
// increased by one. If required, the capacity of the list is doubled
// before adding the new element.
//
public virtual int Add(Object value) {
    Contract.Ensures(Contract.Result<int>() >= 0);
    if (_size == _items.Length) EnsureCapacity(_size + 1);
    _items[_size] = value;
    _version++;
    return _size++;
}

...

// Ensures that the capacity of this list is at least the given minimum
// value. If the currect capacity of the list is less than min, the
// capacity is increased to twice the current capacity or to min,
// whichever is larger.
private void EnsureCapacity(int min) {
    if (_items.Length < min) {
        int newCapacity = _items.Length == 0? _defaultCapacity: _items.Length * 2;
        // Allow the list to grow to maximum possible capacity (~2G elements) before encountering overflow.
        // Note that this check works even when _items.Length overflowed thanks to the (uint) cast
        if ((uint)newCapacity > Array.MaxArrayLength) newCapacity = Array.MaxArrayLength;
        if (newCapacity < min) newCapacity = min;
        Capacity = newCapacity;
    }
}

https://referencesource.microsoft.com/#mscorlib/system/collections/arraylist.cs

I don't know about the internals of a VBA.Collection, but if I had to guess, I would say it likely has a similar mechanism that avoids re-dimensioning the internal array every single time an item is added. But that's all moot, since nobody except Microsoft knows how VBA.Collection is implemented.

What we can do though, is run benchmarks and compare - let's add one million values to an array, a collection, and heck, an ArrayList:

Public Sub TestReDimArray()
    Dim sut() As Variant
    ReDim sut(0 To 0)
    Dim i As Long
    Dim t As Single
    t = Timer
    Do While UBound(sut) < 1000000
        ReDim Preserve sut(0 To i)
        sut(i) = i
        i = i + 1
    Loop
    Debug.Print "ReDimArray added 1M items in " & Timer - t & " seconds."
End Sub

Public Sub TestCollection()
    Dim sut As VBA.Collection
    Set sut = New VBA.Collection
    Dim i As Long
    Dim t As Single
    t = Timer
    Do While sut.Count < 1000000
        sut.Add i
        i = i + 1
    Loop
    Debug.Print "Collection added 1M items in " & Timer - t & " seconds."
End Sub

Public Sub TestArrayList()
    Dim sut As Object
    Set sut = CreateObject("System.Collections.ArrayList")
    Dim i As Long
    Dim t As Single
    t = Timer
    Do While sut.Count < 1000000
        sut.Add i
        i = i + 1
    Loop
    Debug.Print "ArrayList added 1M items in " & Timer - t & " seconds."
End Sub

Here's the output:

ReDimArray added 1M items in 14.90234 seconds.
Collection added 1M items in 0.1875 seconds.
ArrayList added 1M items in 15.64453 seconds.

Note that referencing the 32-bit mscorlib.tlb and early-binding the ArrayList doesn't make much of a difference. Plus there's managed/COM interop overhead, and VBA doesn't support constructors, so the ArrayList initializes with a capacity of 4 that doubles every time capacity is reached, i.e. when we insert the millionth item we've resized the internal array 19 times and end up with an internal capacity of 1,048,576 items.

So how is Collection winning by that much anyway?

Because the array is being abused: resizing isn't what arrays do best, and resizing before every insert can't possibly go well.

When to use arrays?

Use arrays when you know the number of elements in advance:

Public Sub TestPopulateArray()
    Dim sut(0 To 999999) As Variant
    Dim i As Long
    Dim t As Single
    t = Timer
    Do While i < 1000000
        sut(i) = i
        i = i + 1
    Loop
    Debug.Print "PopulateArray added 1M items in " & Timer - t & " seconds."
End Sub

Output:

PopulateArray added 1M items in 0.0234375 seconds.

That's roughly 10 times faster than adding the same number of items to a VBA.Collection - well-used arrays are blazingly fast.


TL;DR

Keep array resizing to a minimum - avoid it as much as possible. If you don't know the number of items you're going to end up with, use a Collection. If you do, use an explicitly sized Array.

Sign up to request clarification or add additional context in comments.

3 Comments

Or, if you must use ReDim Preserve don't increment the size by 1. Add 10 or 50 or 100 or just double it each time you've reached max capacity.
Why bother with ArrayList at all when Collection is 15 times faster... I don't get it...
@71GA I mentioned ArrayList because contrary to VBA.Collection, its source code is readily available so we can know exactly how it works, showing ArrayList builds on top of arrays and making a reasonable extrapolation that VBA.Collection works similarly. The question was specifically about arrays vs collections, with arrays being severely misused and asking why collections were faster; this answer shows that well-used arrays are roughly 10x faster than collections. What's not covered is iterating the items: for vs foreach will make a huge difference here.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.