Pandas: remove duplicate values in each column

Question

I have the following dataset:

   **Fruit  Animal  Color    City** 
    Apple   Dog     Yellow   Paris
    Apple   Dog     Blue     Paris
    Orange  Dog     Green    Paris
    Grape   Dog     Pink     Paris
    Orange  Dog     Grey     NY
    Peach   Dog     Purple   Rome

I would like to use pandas to remove the duplicate data in each column (not the entire row).

Example of output:

   **Fruit  Animal  Color    City**
    Apple   Dog     Yellow   Paris
    Grape           Blue     NY
    Orange          Green    Rome
    Peach           Pink
                    Grey
                    Purple

Regards,

BENY · Accepted Answer · 2020-06-16 20:12:43Z

1

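We can call unique on each column and rebuild a DataFrame from the resulting arrays:

# Transpose so each original column becomes a row, then collect its unique values
s = df.T.apply(pd.Series.unique, axis=1)
# One row per original column; transpose back so shorter columns are padded with None
newdf = pd.DataFrame(s.tolist(), index=s.index).T
newdf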
Out[57]: 
  **Fruit Animal   Color City**
0   Apple    Dog  Yellow  Paris
1  Orange   None    Blue     NY
2   Grape   None   Green   Rome
3   Peach   None    Pink   None
4    None   None    Grey   None
5    None   None  Purple   None
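
The same idea can be written without the double transpose. This is a minimal sketch, assuming the question's data is rebuilt by hand as shown; the dict comprehension is a swapped-in variant, not the code from the answer above. Note that the padding comes out as NaN rather than None here.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Orange", "Grape", "Orange", "Peach"],
    "Animal": ["Dog"] * 6,
    "Color": ["Yellow", "Blue", "Green", "Pink", "Grey", "Purple"],
    "City": ["Paris", "Paris", "Paris", "Paris", "NY", "Rome"],
})

# Build one Series of unique values per column. The DataFrame constructor
# aligns the Series on their index and pads the shorter ones with NaN.
newdf = pd.DataFrame({col: pd.Series(df[col].unique()) for col in df.columns})
print(newdf)
```

Values stay in order of first appearance, as in the output above.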

Terry · Accepted Answer · 2020-06-16 20:20:49Z

0

You can also go column by column using drop_duplicates:

# Overwrites df in place; the assignment aligns on the reset index and pads with NaN
for x in df.columns:
    df[x] = df[x].drop_duplicates().reset_index(drop=True)
# Output:
    Fruit   Animal  Color   City
0   Apple   Dog     Yellow  Paris
1   Orange  NaN     Blue    NY
2   Grape   NaN     Green   Rome
3   Peach   NaN     Pink    NaN
4   NaN     NaN     Grey    NaN
5   NaN     NaN     Purple  NaN
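
If the blank cells from the question's example are wanted instead of NaN, the padding can be replaced afterwards. A minimal sketch, assuming the question's data is rebuilt by hand; the names deduped and blank are hypothetical, and pd.concat is used here so the original df is left untouched rather than overwritten as in the loop above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Orange", "Grape", "Orange", "Peach"],
    "Animal": ["Dog"] * 6,
    "Color": ["Yellow", "Blue", "Green", "Pink", "Grey", "Purple"],
    "City": ["Paris", "Paris", "Paris", "Paris", "NY", "Rome"],
})

# De-duplicate each column separately and renumber it from 0, then let
# pd.concat(axis=1) align the columns on that index (shorter ones get NaN).
deduped = pd.concat(
    [df[col].drop_duplicates().reset_index(drop=True) for col in df.columns],
    axis=1,
)

# Swap the NaN padding for empty strings to mimic the blank cells in the question.
blank = deduped.fillna("")
print(blank)
```

Filling with empty strings is only a display choice; keeping NaN is usually more convenient if the columns are processed further.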
