
I have a data frame (dfCust) like so:

| cust_key | first_name | last_name | address         |
|----------|------------|-----------|-----------------|
| 12345    | John       | Doe       | 123 Some street |
| 12345    | John       | Doe       | 123 Some st     |
| 67890    | Jane       | Doe       | 456 Some street |

and I would like to remove duplicate records so that the cust_key field is unique. I don't care which record is dropped: by the point this runs, the addresses have already been deduplicated, so the only duplicates that trickle through are spelling variants. I would like the following resulting dataframe:

| cust_key | first_name | last_name | address         |
|----------|------------|-----------|-----------------|
| 12345    | John       | Doe       | 123 Some street |
| 67890    | Jane       | Doe       | 456 Some street |

In R this would be done like this:

dfCust <- unique(setDT(dfCust), by = "cust_key")

but I need a way to do this in pandas.

  • df.drop_duplicates('cust_key') for dropping duplicates based on a single column: cust_key Commented Jan 8, 2020 at 16:51
  • perfect, thank you. I knew it was something small I was missing. If you put this into an answer I'll upvote and accept! Commented Jan 8, 2020 at 16:52
  • That's okay, it's a dupe: check this: stackoverflow.com/questions/50885093/… Commented Jan 8, 2020 at 16:54

1 Answer

df.drop_duplicates(subset='cust_key')
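A minimal runnable sketch using the sample data from the question. By default drop_duplicates keeps the first occurrence of each cust_key; pass keep='last' if you'd rather keep the other one:

```python
import pandas as pd

# Sample data matching the question's dfCust
dfCust = pd.DataFrame({
    'cust_key':   [12345, 12345, 67890],
    'first_name': ['John', 'John', 'Jane'],
    'last_name':  ['Doe',  'Doe',  'Doe'],
    'address':    ['123 Some street', '123 Some st', '456 Some street'],
})

# Keep one row per cust_key; which duplicate survives doesn't matter here
dfCust = dfCust.drop_duplicates(subset='cust_key')
```

This is the pandas analogue of `unique(setDT(dfCust), by = "cust_key")` in data.table.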
Sign up to request clarification or add additional context in comments.

1 Comment

If the DataFrames are separate, they need to be concatenated first.
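A sketch of the concatenate-then-dedupe pattern the comment describes, assuming two hypothetical frames `dfA` and `dfB` with the same columns:

```python
import pandas as pd

dfA = pd.DataFrame({'cust_key': [12345],
                    'address':  ['123 Some street']})
dfB = pd.DataFrame({'cust_key': [12345, 67890],
                    'address':  ['123 Some st', '456 Some street']})

# Stack the frames, then keep one row per cust_key
combined = (pd.concat([dfA, dfB], ignore_index=True)
              .drop_duplicates(subset='cust_key'))
```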
