How to change value in dataframe cell if there are two different values with the same key

Question

I am writing a script in Python and am looking for an efficient solution to the following problem:

I have a large pandas DataFrame (at least 100k rows). If there are rows with the same value in col2 but different values in col3, I want to set col3 to A for all rows with that col2 value.

For example:

----------------------
| col1 | col2 | col3 |
----------------------
|   a  |   1  |   A  |
----------------------
|   b  |   2  |   A  |
----------------------
|   c  |   2  |   B  |
----------------------
|   d  |   2  |   B  |
----------------------
|   e  |   3  |   B  |
----------------------
|   f  |   3  |   B  |
----------------------

should look like this:

----------------------
| col1 | col2 | col3 |
----------------------
|   a  |   1  |   A  |
----------------------
|   b  |   2  |   A  |
----------------------
|   c  |   2  |   A  |
----------------------
|   d  |   2  |   A  |
----------------------
|   e  |   3  |   B  |
----------------------
|   f  |   3  |   B  |
----------------------

I solved the problem by sorting the DataFrame by col2 and iterating over the rows: whenever a block of rows with the same col2 value contains different col3 values, I change the col3 values in that block. This algorithm takes around 60 s for 100k rows, so I am looking for a more efficient approach.

Will the value always be updated to 'A', or is there a rule for which value to use?
– tawab_shakeel, Commented Aug 8, 2019 at 7:59

jezrael · Accepted Answer · 2019-08-08 08:21:09Z

Use GroupBy.transform with DataFrameGroupBy.nunique to count the unique values per group, then set the new value for the matching rows with DataFrame.loc:

df.loc[df.groupby('col2')['col3'].transform('nunique') != 1, 'col3'] = 'A'
print(df)
  col1  col2 col3
0    a     1    A
1    b     2    A
2    c     2    A
3    d     2    A
4    e     3    B
5    f     3    B
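
For a self-contained run, the sample frame from the question can be built first. This is a minimal sketch: the variable names `df` and `mixed` and the way the frame is constructed are assumptions based on the table in the question.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "col1": list("abcdef"),
    "col2": [1, 2, 2, 2, 3, 3],
    "col3": ["A", "A", "B", "B", "B", "B"],
})

# True for every row whose col2 group holds more than one distinct col3 value.
mixed = df.groupby("col2")["col3"].transform("nunique") != 1

# Overwrite col3 only in those rows; the other rows are left untouched.
df.loc[mixed, "col3"] = "A"
print(df)
```

Naming the mask separately makes it easy to inspect which rows will be changed before the assignment is applied.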

Details:

First, count the unique values per group with transform, which returns a Series the same size as the original DataFrame:

print(df.groupby('col2')['col3'].transform('nunique'))
0    1
1    2
2    2
3    2
4    1
5    1
Name: col3, dtype: int64
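
To see why transform is used here rather than a plain aggregation, the two can be compared on the sample data. This is a sketch that assumes the frame from the question; the names `per_group` and `per_row` are only illustrative. The aggregation returns one value per col2 group, while transform repeats that value on every row of the group so that it aligns with the original index and can be used as a row mask.

```python
import pandas as pd

# Sample data copied from the table in the question (before any update).
df = pd.DataFrame({
    "col1": list("abcdef"),
    "col2": [1, 2, 2, 2, 3, 3],
    "col3": ["A", "A", "B", "B", "B", "B"],
})

# Aggregation: one row per distinct col2 value, indexed by col2.
per_group = df.groupby("col2")["col3"].nunique()

# Transform: one row per original row, indexed like df.
per_row = df.groupby("col2")["col3"].transform("nunique")

print(per_group)
print(per_row)

# Mapping the per-group counts back through col2 gives the same per-row values.
assert (df["col2"].map(per_group) == per_row).all()
```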

Then test for inequality with 1:

print(df.groupby('col2')['col3'].transform('nunique') != 1)
0    False
1     True
2     True
3     True
4    False
5    False
Name: col3, dtype: bool

Finally, overwrite col3 with A in the rows where the mask is True.
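
Since the question is about speed on roughly 100k rows, a sketch like the following could be used to time the approach on synthetic data. The helper name `unify_mixed_groups`, its parameters, the random data, and the number of groups are all assumptions made for illustration; no timing result is claimed here.

```python
import time

import numpy as np
import pandas as pd


def unify_mixed_groups(df, key="col2", target="col3", value="A"):
    """Return a copy in which groups with several distinct target values are set to `value`."""
    out = df.copy()
    mixed = out.groupby(key)[target].transform("nunique") != 1
    out.loc[mixed, target] = value
    return out


# Synthetic frame of the size mentioned in the question (hypothetical data).
rng = np.random.default_rng(0)
n = 100_000
big = pd.DataFrame({
    "col1": np.arange(n),
    "col2": rng.integers(0, 20_000, size=n),
    "col3": rng.choice(["A", "B"], size=n),
})

start = time.perf_counter()
result = unify_mixed_groups(big)
elapsed = time.perf_counter() - start
print(f"{n} rows processed in {elapsed:.3f} s")

# After the update, every col2 group holds exactly one distinct col3 value.
assert (result.groupby("col2")["col3"].nunique() == 1).all()
```

Working on a copy keeps the input frame unchanged, which makes it easier to compare the result against the original while testing.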

edited Aug 8, 2019 at 8:21

answered Aug 8, 2019 at 7:59


1 Comment

Umar.H Over a year ago

beat me to it! :D
