1
\$\begingroup\$

I made a user-defined function union in VBA, such that:

  1. it could take variable parameters
  2. each parameter is a one-column range like A1, A2:A10; we don't need to consider passing constant values to parameters
  3. we could consider, within one input range, there are no duplicates; but it is very possible to have duplicates among input ranges.
  4. union combines the input ranges, and keeps the order of the elements. For instance, =union(A1:A5, C1:C2, E1:E3) has the following expected output in Column I:

enter image description here

I wrote the following code which works. However, it is slow. A union over a list of 4000 rows and a list of 20 rows takes several seconds.

First, I don't know whether the way I coded arrays could be improved. Second, the algorithm just consists in comparing each new element against the accumulating result list; there is no sort, no other techniques. Third, I don't know if there are any existing functions we could use in other objects of VBA (eg, VBA FILTER function, Collection, ArrayLists, Scripting.Dictionary).

Could anyone propose a more efficient code?

Function getDimension(var As Variant) As Long
    On Error GoTo Err
    Dim i As Long
    Dim tmp As Long
    i = 0
    Do While True
        i = i + 1
        tmp = UBound(var, i)
    Loop
Err:
    getDimension = i - 1
End Function

Function exists(v As Variant, arr As Variant, resCount As Long) As Boolean
    If resCount = 0 Then
        exists = False
    Else
        exists = False
        i = LBound(arr, 1)
        Do While (i <= resCount) And (Not exists)
            If arr(i) = v Then
                exists = True
            End If
            i = i + 1
        Loop
    End If
End Function

' assumption: every input is a range (eg, A1, A1:A2)
' assumption: each input range has only one column
Function union(ParamArray arr() As Variant) As Variant
    Dim res As Variant
    ReDim res(1 To 100000)
    Dim resCount As Long
    resCount = 0
    
    For k = LBound(arr) To UBound(arr)
        Dim arrk As Variant
        Dim v
        arrk = arr(k).Value2
        If getDimension(arrk) = 0 Then 'case of A1, B1
            v = arrk
            If Not exists(v, res, resCount) Then
                resCount = resCount + 1
                res(resCount) = v
            End If
        ElseIf getDimension(arrk) = 2 Then 'case of A1:A10, B1:B10
            For i = LBound(arrk, 1) To UBound(arrk, 1)
                v = arrk(i, 1)
                If Not exists(v, res, resCount) Then
                    resCount = resCount + 1
                    res(resCount) = v
                End If
            Next i
        End If
    Next k
    
    ReDim Preserve res(1 To resCount)
    union = Application.WorksheetFunction.Transpose(res)
End Function
\$\endgroup\$
11
  • \$\begingroup\$ Can you post your test workbook on GitHub or something? \$\endgroup\$ Commented Dec 7, 2021 at 4:04
  • 1
    \$\begingroup\$ A1:B2 is not a one-column range. If you are worried about performance, you probably also want to be more specific about your values in actual use cases. Are they numbers, arbitrary-length strings, fixed-length strings, strings of a maximum given length, etc.? \$\endgroup\$ Commented Dec 7, 2021 at 4:52
  • 2
    \$\begingroup\$ For an idea how to handle many ranges containing a huge number of values see k-way merge. \$\endgroup\$ Commented Dec 7, 2021 at 5:38
  • 1
    \$\begingroup\$ You can codereview.stackexchange.com/q/268446/146810 but it's a bit messy (although only has to be done once). If you can enforce some structure like consecutive columns and pass a single 2D array then you could use the new BYCOL function along with FILTER. Maybe I'll have a look... \$\endgroup\$ Commented Dec 7, 2021 at 8:41
  • 1
    \$\begingroup\$ This is getting a bit off topic - but limited to 253 in theory (limitation of LAMBDAS), and the Excel builtin TEXTJOIN is 252 args max. This could be done by giving all the arguments single character names where possible to not hit the max character limit for a formula. However the number you actually need is based on context, I'd say for PRINTF, a function call with more than 10 arguments is already kinda crazy, so the limit of ~100 args I don't consider constraint. More clever options here. Maybe add to your question the real life context - the why? \$\endgroup\$ Commented Dec 7, 2021 at 9:34

1 Answer 1

2
\$\begingroup\$

Option Explicit

Adding Option Explicit to the first line of your modules will force you to declare your variables. Always declare your variables!

getDimension()

Use arrk.CountLarge instead of this function.

    If arrk.CountLarge = 1 Then
       
    Else
        
    End If

union

Avoid naming User Defined Functions after built in functions.

Dim res As Variant
ReDim res(1 To 100000)

res could initialized when it was declared because it is never resized.

Dim res(1 To 100000) As Variant

Refactored Code

Rem Using a Collection
Function Union1(ParamArray Args() As Variant) As Variant
    Dim Map As New Collection
    Dim Item As Variant
    Dim r As Long
    On Error Resume Next
    For Each Item In Args
        If Item.CountLarge > 1 Then
            For r = 1 To Item.Rows.Count
                Map.Add Item(r, 1).Value, Item(r, 1).Text
            Next
        Else
            Map.Add Item.Value, Item.Value
        End If
    Next
    On Error GoTo 0
    
    If Map.Count = 0 Then Exit Function
    
    Dim Results() As Variant
    ReDim Results(1 To Map.Count, 1 To 1)
    

    For r = 1 To Map.Count
        Results(r, 1) = Map.Item(r)
    Next
    
    Union1 = WorksheetFunction.TextJoin(",", True, Results)
        
End Function

Rem Using an ArrayList
Function Union2(ParamArray Args() As Variant) As Variant
    Dim List As Object
    Set List = CreateObject("System.Collections.ArrayList")
    
    Dim r As Long
    For Each Item In Args
        If IsArray(Item) Then
            For r = 1 To Item.Rows.Count
                If Not List.Contains(Item(r, 1).Value) Then List.Add Item(r, 1).Value
            Next
        Else
            If Not List.Contains(Item.Value) Then List.Add Item.Value
        End If
    Next
  
    If List.Count = 0 Then Exit Function

    Union2 = WorksheetFunction.Transpose(List.ToArray)
        
End Function

Rem Using a Scripting.Dictionary
Function Union3(ParamArray Args() As Variant) As Variant
    Dim List As Object
    Set List = CreateObject("Scripting.Dictionary")
    
    Dim r As Long
    For Each Item In Args
        If IsArray(Item) Then
            For r = 1 To Item.Rows.Count
                If Not List.Exists(Item(r, 1).Value) Then List.Add Item(r, 1).Text, Item(r, 1).Value
            Next
        Else
            If Not List.Exists(Item.Value) Then List.Add Item.Text, Item.Value
        End If
    Next
  
    If List.Count = 0 Then Exit Function
    
    Dim Res
    Res = WorksheetFunction.Transpose(List.Items)
    
    Union3 = WorksheetFunction.Transpose(List.Items)
        
End Function
\$\endgroup\$
2
  • \$\begingroup\$ It works, and is much faster. Thank you. \$\endgroup\$ Commented Dec 8, 2021 at 7:18
  • \$\begingroup\$ Thanks for accepting my answer @SoftTimur \$\endgroup\$ Commented Dec 8, 2021 at 22:53

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.