6

With data in A1 - B5:

A 1  //remove
A 2
B 3  //remove
B 2
C 1

How do you remove duplicates in column A, keeping the last set of values in other columns? Results should look like this:

A 2
B 2
C 1

I've tried combinations of Filter, Unique, and xlookup but haven't found an approach that works yet.

1
  • Why not use XLOOKUP() with UNIQUE() --> =LET(α, A1:A5, δ, UNIQUE(α), HSTACK(δ, XLOOKUP(δ,α,B1:B5,,,-1)))? Will it be slow as well? or why not use LOOKUP() --> =LET(α, A1:A5, δ, UNIQUE(α), HSTACK(δ, LOOKUP(δ,α,B1:B5))) Commented Apr 26, 2024 at 3:48

5 Answers 5

6

There are many possibilities of doing this, I assume these are two more methods which one can apply, although I have not made any speed test yet.

enter image description here


Method One:

=LET(α, A1:A5, δ, UNIQUE(α), HSTACK(δ, LOOKUP(δ,α,B1:B5)))

Method Two:

=LET(α, A1:A5, δ, UNIQUE(α), HSTACK(δ, XLOOKUP(δ,α,B1:B5,,,-1)))

Method Three: (using a reverse binary xlookup for speed)

=LET(d,SORT(A1:B5,,-1),  a,CHOOSECOLS(d,1),  b,CHOOSECOLS(d,2),  u,SORT(UNIQUE(a)),
    HSTACK(u,XLOOKUP(u,a,b,"",0,-2))  ) 

Sign up to request clarification or add additional context in comments.

5 Comments

Very fast. Used method two since it uses xlookup, but changed to use a reverse binary lookup by sorting the data in reverse to begin with, but kept the unique variable sort ascending so the results come out ascending. This method is simple, fast, and intuitive. Thank yoU!!
For posterity... Binary modification to the answer: =LET(d,SORT(A1:B5),,-1),a,CHOOSECOLS(d,1),b,CHOOSECOLS(d,2),u,SORT(UNIQUE(a)),HSTACK(u,XLOOKUP(u,a,b,"",0,-2)))
@ciso if you wish you can edit the answer to put up your idea there. Also what I see you have wrong syntax for sort function the bracket after b5 shouldn't be there, but I got your formula might be a typo. Also I will prefer index instead of choosecols
Why do you prefer index instead of choosecols?
@ciso it is shorter in comparison to the other.
5

There are likely many options. Here's one using XMATCH to search last-to-first.

=CHOOSEROWS(A1:B5,UNIQUE(XMATCH(A1:A5,A1:A5,0,-1)))

enter image description here

EDIT:

Similar option: calculate the UNIQUE items first (thanks to @ScottCraner):

=CHOOSEROWS(A1:B5,XMATCH(UNIQUE(A1:A5),A1:A5,0,-1))

8 Comments

This works, but running this on 50,000 rows, it takes a long time to process. Is there are more efficient way?
Possibly. I suspect this thread will turn into a bit of Code Golf.
=CHOOSEROWS(A1:B5,TAKE(GROUPBY(A1:A5,ROW(B1:B5),MAX,0,0),,-1)) NOt sure if quicker or not.
Still very slow when there are only a few (less than 1%) duplicates... but it does work though. Would be great if it could also work on 500,000 rows.
I doubt any formula is going to work quickly on that many rows with that number of duplicates. It may be better to use VBA and a dictionary. But even then, using arrays, one would still be looping. @ciso
|
3

Another play on CHOOSEROWS is to use GROUPBY(at time of writing only available to insiders):

=CHOOSEROWS(A1:B5,TAKE(GROUPBY(A1:A5,SEQUENCE(ROWS(B1:B5)),MAX,0,0),,-1))

enter image description here

4 Comments

Can't get this to work. What is "MAX"? the formula listed probably doesn't match what's in your spreadsheet (can't see fully).
What do you get in return? I assume you are getting the #Name error. I thought they had released it to everyone, but I may be wrong and it is only available to insiders at present.
Yes, #Name error. Not an insider.
Ah, Well I will put that caveat in the answer, sorry.
2

Here's an interesting method:

=LET(
    a, GROUPBY(A1:A5, B1:B5, ARRAYTOTEXT, , 0),
    HSTACK(CHOOSECOLS(a, 1), --RIGHT(CHOOSECOLS(a, 2)))
)

enter image description here

1 Comment

Sir this one, using RIGHT() will not return if the values in Column B is double or triple digits. =LET(a, GROUPBY(A1:A5,B1:B5,ARRAYTOTEXT,,0), HSTACK(TAKE(a,,1), --TEXTAFTER(", "&TAKE(a,,-1),", ",-1)))
2

Bottom Unique

  • I'm not sure what you mean by BigBen's formulas being slow.
  • Here is a more dynamic version of the second one which runs in under a second for 500k rows on my old Windows 64-bit and Office 64-bit configuration. You cannot seriously expect it to run faster, I mean, =SEQUENCE(500000) takes nearly half a second.
=LET(data,A2:F500001,unique_col,1,
    du,INDEX(data,,unique_col),
    u,UNIQUE(du),
    CHOOSEROWS(data,XMATCH(u,du,,-1)))
  • The data was =RANDARRAY(500000,,1000,9999,1) in A2 and =SEQUENCE(500000) in B2, then copy/pasted values and copied all of it to C2:F2. This resulted in the expected 9000 unique rows (1.8%).
  • An interesting observation was that if INDEX(data,,unique_col) is replaced with CHOOSECOLS(data,unique_col) it takes 2 seconds (TAKE(data,,unique_col) performs the same).
  • Maybe the latter or some other formula in your workbook is the reason behind a bad performance. Share some feedback.

1 Comment

For 50,000 rows, BigBen's took about a minute to come back. Mayukh's about one second.