
I have a dataframe like:

   Name  A   B   C
0  Sen   1   0   NaN
1  Kes   0   1   0
2  Pas   0   0   1
3  Sen   0   0   NaN
4  Pas   0   0   2

I would like to drop duplicates in each column individually, using this rule:

The Name column is the key.

For example, Sen is duplicated, but its value changes only in A; in B and C the value is the same in both rows. So for A I would like to apply an OR operation, keep the value 1 in one Sen row, and put NaN in the other Sen row.

Basically, I don't want to drop the entire row on duplication, but rather modify the values inside each column, for all columns.

Expected output:

   Name  A     B   C
0  Sen   1     0   NaN
1  Kes   0     1   0
2  Pas   0     0   NaN
3  Sen   NaN   0   NaN
4  Pas   0     0   2
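For reproducibility, the sample input and the expected output can be built as below. This is only a sketch of the tables shown here; the variable names `df` and `expected` are my own choice, and the numeric columns are assumed to hold plain numbers with `np.nan` for the missing cells.

```python
import numpy as np
import pandas as pd

# Sample input: "Name" is the key, A/B/C are the value columns
df = pd.DataFrame({
    "Name": ["Sen", "Kes", "Pas", "Sen", "Pas"],
    "A": [1, 0, 0, 0, 0],
    "B": [0, 1, 0, 0, 0],
    "C": [np.nan, 0, 1, np.nan, 2],
})

# Expected output: within each Name, a value that differs from the one
# being kept is replaced by NaN (row 3 in A, row 2 in C)
expected = pd.DataFrame({
    "Name": ["Sen", "Kes", "Pas", "Sen", "Pas"],
    "A": [1, 0, 0, np.nan, 0],
    "B": [0, 1, 0, 0, 0],
    "C": [np.nan, 0, np.nan, np.nan, 2],
})
```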

1 Answer

We can use groupby + max together with where: compute each group's per-column maximum, then keep only the values equal to it.

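# Per-group column maxima, repeated so they line up row by row with df
s = df.groupby('Name').max().reindex(df.Name).values
# Keep a value only where it equals its group's maximum; other cells become NaN
df.drop(columns='Name').where(df.drop(columns='Name') == s)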
     A  B    C
0  1.0  0  NaN
1  0.0  1  0.0
2  0.0  0  NaN
3  NaN  0  NaN
4  0.0  0  2.0
# To keep the Name column, assign the result back in place:
#df.loc[:, 'A':] = df.drop(columns='Name').where(df.drop(columns='Name') == s)
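The same idea can also be sketched with `transform('max')`, which keeps the Name column in the result. This is a variant for illustration, not the exact code above; `value_cols`, `group_max`, and `result` are names chosen here, and the sample frame is rebuilt so that the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    "Name": ["Sen", "Kes", "Pas", "Sen", "Pas"],
    "A": [1, 0, 0, 0, 0],
    "B": [0, 1, 0, 0, 0],
    "C": [np.nan, 0, 1, np.nan, 2],
})

# Every column except the key
value_cols = [c for c in df.columns if c != "Name"]

# Per-group maximum, broadcast back to the original rows;
# the result has the same index and columns as df[value_cols]
group_max = df.groupby("Name")[value_cols].transform("max")

# Keep values equal to their group maximum, set the rest to NaN
result = df.copy()
result[value_cols] = df[value_cols].where(df[value_cols].eq(group_max))
print(result)
```

Because `transform` returns a frame with the same index and columns as its input, the comparison is aligned by label instead of relying on two raw arrays having the same shape.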

7 Comments

But what if I want to retain the 'Name' column?
@hakuna_code Did you notice my commented-out line? df.loc[:, 'A':] = df.drop(columns='Name').where(df.drop(columns='Name') == s)
df.drop('Name',1).where(df.drop('Name',1)==s) fails with the error 'cannot broadcast shape [(8007, 10)] with block values [(8007, 11)]'. I have columns from A to K.
@hakuna_code What error, and did you try it with your sample?
Yeah, but it fails at the higher dimension. Ideally it should work the same way, right?