42

Is there a faster way to do this?

Set data = ws.UsedRange

Set unique = CreateObject("Scripting.Dictionary")

On Error Resume Next
For x = 1 To data.Rows.Count
    unique.Add data(x, some_column_number).Value, 1
Next x
On Error GoTo 0

At this point unique.keys gets what I need, but the loop itself seems to be very slow for files that have tens of thousands of records (whereas this wouldn't be a problem at all in a language like Python or C++ especially).

0

5 Answers 5

42

Use Excel's AdvancedFilter function to do this.

Using Excels inbuilt C++ is the fastest way with smaller datasets, using the dictionary is faster for larger datasets. For example:

Copy values in Column A and insert the unique values in column B:

Range("A1:A6").AdvancedFilter Action:=xlFilterCopy, CopyToRange:=Range("B1"), Unique:=True

It works with multiple columns too:

Range("A1:B4").AdvancedFilter Action:=xlFilterCopy, CopyToRange:=Range("D1:E1"), Unique:=True

Be careful with multiple columns as it doesn't always work as expected. In those cases I resort to removing duplicates which works by choosing a selection of columns to base uniqueness. Ref: MSDN - Find and remove duplicates

enter image description here

Here I remove duplicate columns based on the third column:

Range("A1:C4").RemoveDuplicates Columns:=3, Header:=xlNo

Here I remove duplicate columns based on the second and third column:

Range("A1:C4").RemoveDuplicates Columns:=Array(2, 3), Header:=xlNo
Sign up to request clarification or add additional context in comments.

8 Comments

2 issues. #1 - this is pasting data into sheet, as opposed to saving it in a VBA variable. #2 - It is looking at formulas instead of actual values (for me, instead of pasting unique values in the column, it only pasted one common formula into one cell).
@ZygD 1. The Range is a variable, you use it in VBA. 2. You're doing it wrong, use Paste As Value instead of doing it on formula's.
1. Even though technically speaking Range is a variable... but the thing is, you cannot use AdvancedFilter method to put data to only a "VBA-visible" variable like array or dictionary (i.e. without "physical" presence in the worksheet). 2. I cannot find how to do it, does this AdvancedFilter method really provide the option to paste as values?
Issue #3 - this method removes the original filter in the sheet, if it existed.
AdvancedFilter is not the fastest way. On large data sets, using a dictionary will outperform and by far AdvancedFilter (~500ms vs ~60sec for 100k cells).
|
35

Loading the values in an array would be much faster:

Dim data(), dict As Object, r As Long
Set dict = CreateObject("Scripting.Dictionary")

data = ActiveSheet.UsedRange.Columns(1).Value

For r = 1 To UBound(data)
    dict(data(r, some_column_number)) = Empty
Next

data = WorksheetFunction.Transpose(dict.keys())

You should also consider early binding for the Scripting.Dictionary:

Dim dict As New Scripting.Dictionary  ' requires `Microsoft Scripting Runtime` '

Note that using a dictionary is way faster than Range.AdvancedFilter on large data sets.

As a bonus, here's a procedure similare to Range.RemoveDuplicates to remove duplicates from a 2D array:

Public Sub RemoveDuplicates(data, ParamArray columns())
    Dim ret(), indexes(), ids(), r As Long, c As Long
    Dim dict As New Scripting.Dictionary  ' requires `Microsoft Scripting Runtime` '

    If VarType(data) And vbArray Then Else Err.Raise 5, , "Argument data is not an array"

    ReDim ids(LBound(columns) To UBound(columns))

    For r = LBound(data) To UBound(data)         ' each row '
        For c = LBound(columns) To UBound(columns)   ' each column '
            ids(c) = data(r, columns(c))                ' build id for the row
        Next
        dict(Join$(ids, ChrW(-1))) = r  ' associate the row index to the id '
    Next

    indexes = dict.Items()
    ReDim ret(LBound(data) To LBound(data) + dict.Count - 1, LBound(data, 2) To UBound(data, 2))

    For c = LBound(ret, 2) To UBound(ret, 2)  ' each column '
        For r = LBound(ret) To UBound(ret)      ' each row / unique id '
            ret(r, c) = data(indexes(r - 1), c)   ' copy the value at index '
        Next
    Next

    data = ret
End Sub

17 Comments

You need to add the reference "Microsoft Scripting Runtime"
I have it added already. It can't seem to find ".Dictionary" of "Scripting"
Doesn't seem to matter though, it runs in a blink of an eye even with late binding. Why is that code so much faster than what I have?
Reading cell by cell is slow with excel. It's faster to load the data in an array and write them back if necessary.
@MGae2M, use .Keys() on the dictionary to get the unique values in an array.
|
12

PowerShell is a very powerful and efficient tool. This is cheating a little, but shelling PowerShell via VBA opens up lots of options

The bulk of the code below is simply to save the current sheet as a csv file. The output is another csv file with just the unique values

Sub AnotherWay()
Dim strPath As String
Dim strPath2 As String

Application.DisplayAlerts = False
strPath = "C:\Temp\test.csv"
strPath2 = "C:\Temp\testout.csv"
ActiveWorkbook.SaveAs strPath, xlCSV
x = Shell("powershell.exe $csv = import-csv -Path """ & strPath & """ -Header A | Select-Object -Unique A | Export-Csv """ & strPath2 & """ -NoTypeInformation", 0)
Application.DisplayAlerts = True

End Sub

1 Comment

This is brilliant!
12

it's funny because i've had to read these instructions over and over again, but it think i worked out a much faster way to do this:

Set data = ws.UsedRange
dim unique as variant
unique = WorksheetFunction.Unique(data)

And then you can do whatever you want with the unique array such as iterating it:

For i = LBound(unique) To UBound(unique)
    Range("Q" & i) = indexes(i, 1)
Next

6 Comments

This function is only available in Office 365 I think
Thank you, With this solution you can return an array of unique values in a column with only one line of code: unique = WorksheetFunction.unique(Columns(1))
that's one powerful code: wish I'd known this earlier :)
@PatrickHonorez It may become available after 2019 versions of Office, but as I need my code to be backwards compatible, this is a no go, to bad, looks super powerful!
What is indexes? Should that be unique(i, 1) ??
|
1

Try this

Option Explicit

Sub UniqueValues()
Dim ws As Worksheet
Dim uniqueRng As Range
Dim myCol As Long

myCol = 5 '<== set it as per your needs
Set ws = ThisWorkbook.Worksheets("unique") '<== set it as per your needs

Set uniqueRng = GetUniqueValues(ws, myCol)

End Sub


Function GetUniqueValues(ws As Worksheet, col As Long) As Range
Dim firstRow As Long

With ws
    .Columns(col).RemoveDuplicates Columns:=Array(1), header:=xlNo

    firstRow = 1
    If IsEmpty(.Cells(1, col)) Then firstRow = .Cells(1, col).End(xlDown).row

    Set GetUniqueValues = Range(.Cells(firstRow, col), .Cells(.Rows.Count, col).End(xlUp))
End With

End Function

it should be quite fast and without the drawback NeepNeepNeep told about

1 Comment

Good method. But one should be careful that this modifies the original source column.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.