Using Office365 excel array formulas, how to remove duplicates, keeping the last value?

Question

With data in A1:B5:

A 1  //remove
A 2
B 3  //remove
B 2
C 1

How do you remove duplicates in column A, keeping the last set of values in other columns? Results should look like this:

A 2
B 2
C 1

I've tried combinations of Filter, Unique, and xlookup but haven't found an approach that works yet.
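To pin down the requested behavior, here is a small Python model of it. Python is used only as a neutral reference, not as an Excel solution, and the name `keep_last` is hypothetical. A row is kept when no later row has the same column-A value. This keeps rows in the order of each key's last occurrence, which matches the expected output here. Formulas built on UNIQUE list keys by first occurrence instead, and the two orders can differ when keys are interleaved.

```python
# Reference model (not Excel): keep a row only if no LATER row has the same key.
rows = [("A", 1), ("A", 2), ("B", 3), ("B", 2), ("C", 1)]  # stands in for A1:B5

def keep_last(rows):
    """Return the rows whose key does not appear again further down."""
    kept = []
    for i, (key, value) in enumerate(rows):
        # Scan every row below row i; quadratic, but fine as a specification.
        if all(other != key for other, _ in rows[i + 1:]):
            kept.append((key, value))
    return kept

print(keep_last(rows))  # [('A', 2), ('B', 2), ('C', 1)]
```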

Why not use XLOOKUP() with UNIQUE(): =LET(α, A1:A5, δ, UNIQUE(α), HSTACK(δ, XLOOKUP(δ,α,B1:B5,,,-1)))? Will it be slow as well? Or why not use LOOKUP(): =LET(α, A1:A5, δ, UNIQUE(α), HSTACK(δ, LOOKUP(δ,α,B1:B5)))?
– Mayukh Bhattacharya, commented Apr 26, 2024 at 3:48

Mayukh Bhattacharya · Accepted Answer · 2024-04-27 14:02:38Z

6

There are many ways to do this. Here are some more methods one can apply, although I have not run any speed tests yet.

Method One:

=LET(α, A1:A5, δ, UNIQUE(α), HSTACK(δ, LOOKUP(δ,α,B1:B5)))
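Method One relies on LOOKUP's sorted-data behavior. Microsoft documents that LOOKUP's lookup_vector must be in ascending order and that it matches the largest value less than or equal to the lookup value. The Python sketch below models that with `bisect_right`, assuming column A is already sorted as in the sample. With duplicate keys, the last position satisfying that rule is the last duplicate. It models the idea, not Excel's exact internals, and the helper name `lookup` is hypothetical.

```python
from bisect import bisect_right

keys = ["A", "A", "B", "B", "C"]  # A1:A5, ASSUMED sorted ascending
vals = [1, 2, 3, 2, 1]            # B1:B5

def lookup(key, keys, vals):
    """Model of LOOKUP on sorted data: use the last position whose key is <= key.
    For an exact key that has duplicates, that is its last occurrence."""
    i = bisect_right(keys, key) - 1
    if i < 0:
        raise LookupError("#N/A")  # key is smaller than every entry
    return vals[i]

uniq = list(dict.fromkeys(keys))  # UNIQUE(α): first-appearance order
result = [(k, lookup(k, keys, vals)) for k in uniq]
print(result)  # [('A', 2), ('B', 2), ('C', 1)]
```

If column A were not sorted, the model (and, per the documentation, LOOKUP itself) could return the wrong row, so Method One is only safe on sorted data.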

Method Two:

=LET(α, A1:A5, δ, UNIQUE(α), HSTACK(δ, XLOOKUP(δ,α,B1:B5,,,-1)))
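Method Two does not need sorted data: search_mode -1 makes XLOOKUP search from the last row upward. Here is a Python sketch of what happens for each unique key, modeled as a plain last-to-first linear scan. The helper name `xlookup_last` is hypothetical.

```python
keys = ["A", "A", "B", "B", "C"]  # A1:A5 (order does not matter here)
vals = [1, 2, 3, 2, 1]            # B1:B5

def xlookup_last(key, keys, vals, if_not_found=None):
    """Model of XLOOKUP(key, keys, vals,,, -1): exact match, last row first."""
    for i in range(len(keys) - 1, -1, -1):
        if keys[i] == key:
            return vals[i]
    return if_not_found

uniq = list(dict.fromkeys(keys))  # UNIQUE(α)
result = [(k, xlookup_last(k, keys, vals)) for k in uniq]
print(result)  # [('A', 2), ('B', 2), ('C', 1)]
```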

Method Three (using a reverse binary XLOOKUP for speed):

=LET(d,SORT(A1:B5,,-1),  a,CHOOSECOLS(d,1),  b,CHOOSECOLS(d,2),  u,SORT(UNIQUE(a)),
    HSTACK(u,XLOOKUP(u,a,b,"",0,-2))  )

edited Apr 27, 2024 at 14:02

user2847853

answered Apr 26, 2024 at 4:03

Mayukh Bhattacharya



5 Comments

user2847853 Over a year ago

Very fast. I used method two since it uses XLOOKUP, but changed it to a reverse binary lookup by first sorting the data in reverse, while keeping the unique values sorted ascending so the results come out ascending. This method is simple, fast, and intuitive. Thank you!

user2847853 Over a year ago

For posterity... Binary modification to the answer: =LET(d,SORT(A1:B5),,-1),a,CHOOSECOLS(d,1),b,CHOOSECOLS(d,2),u,SORT(UNIQUE(a)),HSTACK(u,XLOOKUP(u,a,b,"",0,-2)))

Mayukh Bhattacharya Over a year ago

@ciso If you wish, you can edit the answer to add your idea there. Also, I see the SORT function has the wrong syntax: the closing bracket after B5 shouldn't be there, though I assume it is a typo. Also, I would prefer INDEX instead of CHOOSECOLS.

user2847853 Over a year ago

Why do you prefer index instead of choosecols?

Mayukh Bhattacharya Over a year ago

@ciso It is shorter than the other.

BigBen · Accepted Answer · 2024-04-26 17:30:17Z

5

There are likely many options. Here's one using XMATCH to search last-to-first.

=CHOOSEROWS(A1:B5,UNIQUE(XMATCH(A1:A5,A1:A5,0,-1)))
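To see how this formula picks rows, here is a Python model of it. The helper name `xmatch_last` is hypothetical. XMATCH(...,0,-1) gives, for every row, the position of the last row with the same key. UNIQUE collapses those positions, and CHOOSEROWS returns them.

```python
data = [("A", 1), ("A", 2), ("B", 3), ("B", 2), ("C", 1)]  # A1:B5
keys = [k for k, _ in data]                                 # A1:A5

def xmatch_last(key, keys):
    """Model of XMATCH(key, keys, 0, -1): 1-based position of the last exact match."""
    for i in range(len(keys) - 1, -1, -1):
        if keys[i] == key:
            return i + 1
    raise LookupError("#N/A")

positions = [xmatch_last(k, keys) for k in keys]   # one lookup per row: [2, 2, 4, 4, 5]
unique_positions = list(dict.fromkeys(positions))  # UNIQUE(...): [2, 4, 5]
result = [data[p - 1] for p in unique_positions]   # CHOOSEROWS(A1:B5, ...)
print(result)  # [('A', 2), ('B', 2), ('C', 1)]
```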

EDIT:

Similar option: calculate the UNIQUE items first (thanks to @ScottCraner):

=CHOOSEROWS(A1:B5,XMATCH(UNIQUE(A1:A5),A1:A5,0,-1))
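The difference between the two formulas is how many lookups they run: one per row versus one per distinct key. The Python sketch below counts row comparisons under a naive assumption that every XMATCH is a plain last-to-first scan. The function names are hypothetical, and the model says nothing definitive about Excel's internal optimizations.

```python
# Naive cost model (assumption: every XMATCH is a plain last-to-first scan).
data = [("A", 1), ("A", 2), ("B", 3), ("B", 2), ("C", 1)]

def xmatch_last(key, keys):
    """Return (1-based position of the last exact match, rows compared)."""
    steps = 0
    for i in range(len(keys) - 1, -1, -1):
        steps += 1
        if keys[i] == key:
            return i + 1, steps
    raise LookupError("#N/A")

def rows_first(data):
    """=CHOOSEROWS(A1:B5,UNIQUE(XMATCH(A1:A5,A1:A5,0,-1))): one lookup per row."""
    keys = [k for k, _ in data]
    hits = [xmatch_last(k, keys) for k in keys]
    positions = list(dict.fromkeys(p for p, _ in hits))
    return [data[p - 1] for p in positions], sum(s for _, s in hits)

def unique_first(data):
    """=CHOOSEROWS(A1:B5,XMATCH(UNIQUE(A1:A5),A1:A5,0,-1)): one lookup per key."""
    keys = [k for k, _ in data]
    hits = [xmatch_last(k, keys) for k in dict.fromkeys(keys)]
    return [data[p - 1] for p, _ in hits], sum(s for _, s in hits)

print(rows_first(data))    # ([('A', 2), ('B', 2), ('C', 1)], 13)
print(unique_first(data))  # ([('A', 2), ('B', 2), ('C', 1)], 7)
```

When duplicates are rare (under 1%, as described in the comments), the number of keys is close to the number of rows. In this model both variants then cost about the same and grow roughly with rows × rows.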

edited Apr 26, 2024 at 17:30

answered Apr 25, 2024 at 20:28

BigBen


8 Comments

user2847853 Over a year ago

This works, but running it on 50,000 rows takes a long time to process. Is there a more efficient way?

BigBen Over a year ago

Possibly. I suspect this thread will turn into a bit of Code Golf.

Scott Craner Over a year ago

=CHOOSEROWS(A1:B5,TAKE(GROUPBY(A1:A5,ROW(B1:B5),MAX,0,0),,-1)) Not sure if it is quicker or not.

user2847853 Over a year ago

Still very slow when there are only a few (fewer than 1%) duplicates... but it does work. It would be great if it could also work on 500,000 rows.

Scott Craner Over a year ago

I doubt any formula is going to work quickly on that many rows with that number of duplicates. It may be better to use VBA and a dictionary. But even then, using arrays, one would still be looping. @ciso
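The dictionary idea needs only a single loop over the rows rather than one lookup per key. Here is a minimal sketch with a Python dict standing in for a VBA Scripting.Dictionary. The language swap is deliberate, and the function name `keep_last_rows` is hypothetical. It assumes the range has already been read into an array of rows.

```python
# Python stand-in for the "VBA and a dictionary" idea: one pass over the rows.
def keep_last_rows(data, key_col=0):
    """Return one row per key (the last one), keys in first-appearance order.

    Python dicts preserve insertion order, and reassigning an existing key does
    not move it, so the output order matches what UNIQUE would produce."""
    last_index = {}
    for i, row in enumerate(data):
        last_index[row[key_col]] = i  # later rows overwrite earlier indices
    return [data[i] for i in last_index.values()]

data = [("A", 1), ("A", 2), ("B", 3), ("B", 2), ("C", 1)]
print(keep_last_rows(data))  # [('A', 2), ('B', 2), ('C', 1)]
```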


Scott Craner · Accepted Answer · 2024-04-25 21:01:25Z

3

Another play on CHOOSEROWS is to use GROUPBY (at the time of writing, only available to Insiders):

=CHOOSEROWS(A1:B5,TAKE(GROUPBY(A1:A5,SEQUENCE(ROWS(B1:B5)),MAX,0,0),,-1))
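The GROUPBY version numbers the rows, keeps the MAX row number per key, and passes those numbers to CHOOSEROWS. Here is a Python model of that pipeline. The function name is hypothetical, and it assumes GROUPBY lists groups sorted ascending by key, which is its default.

```python
# Model of =CHOOSEROWS(A1:B5,TAKE(GROUPBY(A1:A5,SEQUENCE(ROWS(B1:B5)),MAX,0,0),,-1))
data = [("A", 1), ("A", 2), ("B", 3), ("B", 2), ("C", 1)]

def groupby_max_rows(data):
    # SEQUENCE(ROWS(B1:B5)) -> 1, 2, ..., n; GROUPBY(..., MAX, 0, 0) -> max per key
    max_row = {}
    for n, (key, _) in enumerate(data, start=1):
        max_row[key] = max(max_row.get(key, 0), n)
    # Assumption: GROUPBY returns groups sorted ascending by key.
    # TAKE(..., , -1) keeps only the last column, i.e. the MAX row numbers.
    row_numbers = [max_row[k] for k in sorted(max_row)]
    return [data[n - 1] for n in row_numbers]  # CHOOSEROWS(A1:B5, ...)

print(groupby_max_rows(data))  # [('A', 2), ('B', 2), ('C', 1)]
```

Unlike the XMATCH variants, the output here comes back sorted by key rather than in first-appearance order.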

edited Apr 25, 2024 at 21:01

answered Apr 25, 2024 at 20:49

Scott Craner


4 Comments

user2847853 Over a year ago

Can't get this to work. What is "MAX"? The formula listed probably doesn't match what's in your spreadsheet (can't see it fully).

Scott Craner Over a year ago

What do you get in return? I assume you are getting the #Name error. I thought they had released it to everyone, but I may be wrong and it is only available to insiders at present.

user2847853 Over a year ago

Yes, #Name error. Not an insider.

Scott Craner Over a year ago

Ah, well, I will put that caveat in the answer. Sorry.

MisterCary · Accepted Answer · 2024-04-26 04:52:26Z

2

Here's an interesting method:

=LET(
    a, GROUPBY(A1:A5, B1:B5, ARRAYTOTEXT, , 0),
    HSTACK(CHOOSECOLS(a, 1), --RIGHT(CHOOSECOLS(a, 2)))
)

answered Apr 26, 2024 at 4:52

MisterCary


1 Comment

Mayukh Bhattacharya Over a year ago

Sir, this one, using RIGHT(), will not return the correct value if the values in column B have two or three digits. =LET(a, GROUPBY(A1:A5,B1:B5,ARRAYTOTEXT,,0), HSTACK(TAKE(a,,1), --TEXTAFTER(", "&TAKE(a,,-1),", ",-1)))
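The issue is easy to see once the text that GROUPBY builds is written out. Here is a Python model comparing RIGHT with TEXTAFTER. It assumes ARRAYTOTEXT's concise format joins a group's values with ", " in row order. The values in groups B and C are hypothetical, chosen to have more than one digit.

```python
# Model of RIGHT(...) vs TEXTAFTER(...) on GROUPBY/ARRAYTOTEXT output.
def arraytotext(values):
    """ARRAYTOTEXT(values, 0) (concise format): items joined with ', '."""
    return ", ".join(str(v) for v in values)

def right_one_char(text):
    """RIGHT(text) with num_chars omitted returns only the last character."""
    return text[-1:]

def text_after_last(text, delim=", "):
    """TEXTAFTER(", "&text, ", ", -1): everything after the last delimiter.
    Prefixing the delimiter means a single-item group still has one to split on."""
    return (delim + text).rsplit(delim, 1)[1]

groups = {"A": [1, 2], "B": [3, 12], "C": [105]}  # key -> values in row order
for key, values in groups.items():
    joined = arraytotext(values)
    print(key, repr(joined), int(right_one_char(joined)), int(text_after_last(joined)))
# A '1, 2' 2 2
# B '3, 12' 2 12
# C '105' 5 105
```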

VBasic2008 · Accepted Answer · 2024-04-26 06:28:23Z

2

Bottom Unique

I'm not sure what you mean by BigBen's formulas being slow.
Here is a more dynamic version of the second one, which runs in under a second for 500k rows on my old 64-bit Windows and 64-bit Office configuration. You cannot seriously expect it to run faster: =SEQUENCE(500000) alone takes nearly half a second.

=LET(data,A2:F500001,unique_col,1,
    du,INDEX(data,,unique_col),
    u,UNIQUE(du),
    CHOOSEROWS(data,XMATCH(u,du,,-1)))
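This version generalizes BigBen's second formula to any number of columns and any key column. Here is a Python model of the same steps. The name `bottom_unique` is hypothetical, and unique_col is kept 1-based as in the formula.

```python
def bottom_unique(data, unique_col=1):
    """Model of the LET formula above; unique_col is 1-based, as in Excel."""
    du = [row[unique_col - 1] for row in data]  # INDEX(data,,unique_col)
    u = list(dict.fromkeys(du))                 # UNIQUE(du): first-appearance order
    # XMATCH(u,du,,-1): for each unique value, the 1-based position of its LAST match
    positions = [max(i for i, v in enumerate(du, start=1) if v == value) for value in u]
    return [data[p - 1] for p in positions]     # CHOOSEROWS(data, ...)

data = [(1, "x", 10), (2, "y", 20), (3, "x", 30), (4, "z", 40), (5, "y", 50)]
print(bottom_unique(data, unique_col=2))  # [(3, 'x', 30), (5, 'y', 50), (4, 'z', 40)]
```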

The data was =RANDARRAY(500000,,1000,9999,1) in A2 and =SEQUENCE(500000) in B2, then copy/pasted values and copied all of it to C2:F2. This resulted in the expected 9000 unique rows (1.8%).
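The 9000 figure follows from the ranges. RANDARRAY(500000,,1000,9999,1) draws whole numbers from 1000 to 9999, which is 9000 possible values. With 500,000 draws, each value is expected about 55.6 times, so it is overwhelmingly likely that every one appears. Here is a Python sketch of comparable test data. The seed and variable names are arbitrary assumptions, and Python's generator is not Excel's.

```python
import random

rng = random.Random(2024)  # arbitrary seed, for repeatability only
n_rows = 500_000
col_a = [rng.randint(1000, 9999) for _ in range(n_rows)]  # like RANDARRAY(500000,,1000,9999,1)
col_b = list(range(1, n_rows + 1))                        # like SEQUENCE(500000)

distinct = len(set(col_a))
share = distinct / n_rows
print(distinct, share)  # almost surely 9000 and 0.018
```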
An interesting observation was that if INDEX(data,,unique_col) is replaced with CHOOSECOLS(data,unique_col), it takes 2 seconds (TAKE(data,,unique_col) performs the same).
Maybe the latter, or some other formula in your workbook, is the reason for the bad performance. Please share some feedback.

answered Apr 26, 2024 at 6:28

VBasic2008


1 Comment

user2847853 Over a year ago

For 50,000 rows, BigBen's took about a minute to come back; Mayukh's took about one second.
