Delete duplicate arrays based on column values across several blocks of data

Question

I have been working on automating different parts of the process of formatting a very large data set. I am stuck on trying to automate the "remove duplicates" command across all blocks of my data:

I have blocks of data (9 columns wide, x rows long) as on the image attached. In the column called "#Point ID" are values 0-n. Some values appear once, some values appear more than once. Different blocks have different "#Point ID" columns

I would like to delete all rows in the block where the value in the "#Point ID" column has already occurred (starting from the top, moving down the rows). I would like the deleted rows removed from the blocks, so only the rows (which are blue on the image) with unique values in "#Point ID" column (green on the image) remain.

I have found VBA modules that work on a single block, but I don't know how to make it function across all my blocks. Delete rows in Excel based on duplicates in Column

I have also tried combinations of functions (inc. UNIQUE and SORTBY) without any success.

What's a function or a VBA module that works?

Likely it would be easier to remove the duplicates before/while splitting your data into multiple blocks, instead of after the split. — Tim Williams
– Tim Williams, Commented Feb 10, 2023 at 23:14

wrbp · Accepted Answer · 2023-02-11 01:39:47Z

2

Use this


Public Sub cleanBlock(rng As Range)
    Dim vals As Object
    Set vals = CreateObject("Scripting.Dictionary")
    Dim R As Range
    
    Dim adds As Range
    For Each R In rng.Rows
        If (vals.exists(R.Cells(1, 2).Value)) Then
            If adds Is Nothing Then
                Set adds = R
            Else
                Set adds = Union(adds, R)
            End If
            
        Else
            vals(R.Cells(1, 2).Value) = True
        End If
    Next R
    Debug.Print (adds.Address)
    If Not adds Is Nothing Then adds.Delete shift:=xlUp
    Set vals = Nothing

End Sub



Public Sub test()

    cleanBlock Range("b3:j20")
    cleanBlock Range("l3:t20")
    cleanBlock Range("y3:ad20")

End Sub

answered Feb 11, 2023 at 1:39

wrbp

8901 gold badge5 silver badges10 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

Anna Over a year ago

Hi @wrbp, thank you so much for your code! This is a very dumb question, but I cannot figure out how to make this macro show up when I try to run it. I added your sub under the correct Workbook. Other Subs show up when I want to run them, but they are not declared Private/Public.

wrbp Over a year ago

Anna, does the test macro does not appear?

Anna Over a year ago

Hi, thank you for your message. If I add a new macro and paste your code in as it is, it does not appear when I try to call it. When Instead your first line "Public Sub cleanBlock(rng As Range)" I write "Sub cleanBlock ()" and declare "rng As Range", cleanBlock appears, but when I run it I get an error on the line "For Each R In rng.Rows". I am a beginner at this, so I have probably made a very dumb mistake somewhere.

wrbp Over a year ago

See my code has 2 procedures, cleanBlock which has 1 parameter and a second procedure "test" that calls "cleanBlock" 3 times, one time for each block to be cleaned, did you paste both procedures? if you are not calling the cleanBlock from another procedure , then call the "test" procedure instead of the cleanBlock procedure. Modifiy the data ranges on test procedure as you require

Anna Over a year ago

hi, thank you for the explanation! got it! "test" macro appears and runs. I thought that "cleanBlock" macro itself should appear in the macros selection as well.

VBasic2008 · Accepted Answer · 2023-02-11 16:23:11Z

Remove Duplicates in Areas of a Range

Sub RemoveDupesByAreas()

    Dim wb As Workbook: Set wb = ThisWorkbook ' workbook containing this code
    Dim ws As Worksheet: Set ws = wb.Sheets("Sheet1") ' adjust!
    Dim rg As Range: Set rg = ws.UsedRange.SpecialCells(xlCellTypeConstants)
    
    Dim aCount As Long: aCount = rg.Areas.Count
    
    Dim arg As Range, a As Long
    
    For a = aCount To 1 Step -1
        Set arg = rg.Areas(a)
        Debug.Print a, arg.Address(0, 0)
        ' Before running the code with the next line, in the Immediate
        ' window ('Ctrl+G'), carefully check if the range addresses
        ' match the areas of your data. If they match, uncomment
        ' the following line to apply remove duplicates.
        'arg.RemoveDuplicates 2, xlYes
    Next a
    
    MsgBox "Duplicates removed.", vbInformation

End Sub

Find and FindNext feat. CurrentRegion

Sub RemoveDupesByFind()

    Const SEARCH_STRING As String = "Source.Name"

    Dim wb As Workbook: Set wb = ThisWorkbook ' workbook containing this code
    Dim ws As Worksheet: Set ws = wb.Sheets("Sheet1") ' adjust!
    Dim rg As Range: Set rg = ws.UsedRange
    
    Dim fCell As Range: Set fCell = rg.Find( _
        SEARCH_STRING, , xlFormulas, xlWhole, xlByRows, xlPrevious)
    
    If fCell Is Nothing Then
        MsgBox """" & SEARCH_STRING & """ not found.", vbCritical
        Exit Sub
    End If
    
    Dim FirstAddress As String: FirstAddress = fCell.Address
    
    Do
        fCell.CurrentRegion.RemoveDuplicates 2, xlYes
        Set fCell = rg.FindNext(fCell)
    Loop Until fCell.Address = FirstAddress
    
    MsgBox "Duplicates removed.", vbInformation

End Sub

karma · Accepted Answer · 2023-02-11 16:26:32Z

1

Another way, maybe something like this :

Sub test()
Dim rgData As Range
Dim rg As Range: Dim cell As Range
Dim rgR As Range: Dim rgDel As Range

Set rgData = Sheets("Sheet1").UsedRange 'change as needed
Set rgData = rgData.Resize(rgData.Rows.Count - 1, rgData.Columns.Count).Offset(1, 0)

For Each rg In rgData.SpecialCells(xlConstants).Areas
    For Each cell In rg.Columns(2).Cells
        Set rgR = cell.Offset(0, -1).Resize(1, rg.Columns.Count)
        If cell.Value = 0 And cell.Offset(1, 0).Value <> 0 And cell.Offset(0, 1).Value = 0 And cell.Address = rg.Columns(2).Cells(1, 1).Address Then
        Else
        If Application.CountIf(rg.Columns(2), cell.Value) > 1 And cell.Offset(0, 1).Value = 0 Then
            If rgDel Is Nothing Then Set rgDel = rgR Else Set rgDel = Union(rgDel, rgR)
        End If
        End If
    Next cell
Next rg

rgDel.Delete Shift:=xlUp

End Sub

The code assumed that there'll be no blank cell within each block and there will be full blank column (no value at all) between each block. So it sets the usedrange as rgData variable, and loop to each area/block in rgData as rg variable.

Within rg, it loop to each cell in rg column 2, and check if the count of the looped cell value is > 1 and the value of the looped cell.offset(0,1) is zero, then it collect the range as rgDel variable.

Then finally it delete the rgDel.

If you want to step run the code, try to add something like this rg.select ... rgR.select .... after the variable is set. For example, add rgDel.select right before next area, so you can see what's going on.

The code assume that :

the first value right under "#Point" in each block will be always zero. It will never happen that the value is other than zero.
the next value (after that zero value) is maybe zero again or maybe one.
if there are duplicates (two same value) in column #Point then in column X, it's not fix that the first one will always have value and the second one will always zero value.

If the data is always fix that the first one will always have value and the second one will always zero value (if there are duplicate), I suggest you to use Mr. VBasic2008 or Mr. wrbp answer. Thank you.

edited Feb 11, 2023 at 16:26

answered Feb 11, 2023 at 4:55

karma

2,0091 gold badge12 silver badges17 bronze badges

5 Comments

Anna Over a year ago

Hi @karma! Thank you so much your code! It works. The only issue occurred when the block starts with a row that has zeros in each cell (zero also in the the second column "#Point ID") and the second row didn't have a zero in the second column ("#Point ID"). Meaning it should have kept the first row of all zeros, but it deleted it. I can see why it happens from the code. There is a workaround if I change the zero values of the Point IDs or would not include the rows that have zero values in the Point IDs.

karma Over a year ago

@Anna, please check the revised code in my answer. I add a condition special only to check the first cell of column2 in each block. So, it's something like this : if the looped cell address = the first cell of column2 address AND if the looped cell value is zero AND if the looped cell.offset(0,1) value is zero AND if the looped cell.offset(1,0) value is not zero ... then (do nothing)... else ... the same of the previous code. So, if all the four criteria meet then it won't keep it as rgDel. What do you think with this condition ?

Anna Over a year ago

hi, thank you so much for the update. unfortunately, the first row with all zeros still disappears. I have added screenshots to the original post as reference.

karma Over a year ago

@Anna, my mistake ... I test with my dummy data but without header. Sorry. Please see the revised code. The first setting for rgData is included the header. The second setting for rgData, it remove the header. I hope this time is correct. Please also read my last paragraph.

Anna Over a year ago

Thank you so so much!! This works. Yes, your third point (if duplicate then also X=0) is correct in my data.

Collectives™ on Stack Overflow

Delete duplicate arrays based on column values across several blocks of data

3 Answers 3

5 Comments

Remove Duplicates in Areas of a Range

Comments

5 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

5 Comments

Remove Duplicates in Areas of a Range

Comments

5 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related