Replace value from a column based on condition of another column, Pandas

Question

Starting DataFrame

df = pd.DataFrame({'Column A' : ['red','green','yellow', 'orange', 'red', 'blue'],
                   'Column B' : [NaN, 'blue', 'purple', NaN, NaN, NaN],
                   'Column C' : [1, 2, 3, 2, 3, 7]})

Column A	Column B	Column C
'red'	NaN	1
'green'	'blue'	2
'yellow'	'purple'	3
'orange'	NaN	2
'red'	NaN	3
'blue'	NaN	7

Desired Result

Column A	Column B	Column C
'red'	NaN	1
'blue'	'blue'	2
'purple'	'purple'	3
'orange'	NaN	2
'red'	NaN	3
'blue'	NaN	7

I want to replace values in column A only if the value in Column B is not NaN, and to replace column A with the value in Column B

So that I can run the following code:

df[[Column_A, Column_C]].groupby(Column_A).sum()

Which would result in the following DataFrame:

Column A	Column C
'red'	4
'blue'	9
'purple'	3
'orange'	2

I am trying to replace categories before doing a groupby call.

Attempts:

The DataFrame I am working with has a sequential numerical based index going from 0 to N.
So I could hard code the following:
df.iloc[[index], column] = some_string
I do not want to do this as it is not dynamic and the DataFrame data could change.

I believe I could use .agg() or .apply() on either the df or the df.groupby() but this is where I have struggled.

Particularly with how to write a function to use with .agg() or .apply()

Say:

def my_func(x):
    print(x)

Then:
df.apply(my_func)
The result is the first column of df printed.
Or:
df.apply(my_func, axis = 1)

The result is the following format for each row:

Column A    red
Column B    Nan
Column C    1
Name: 0, dtype: object
Column A    green
Column B    blue
Column C    2
Name: 1, dtype: object

I am not sure how to access each column per row in my_func.

Edit:
I am trying to find a way to change the value in Column A if the value, for that row, in Column B is not NaN. The value to use for replacing is the value in Column B, the value to replace is the value in Column A if Column B is not NaN.

But I want to do this dynamically, meaning not hardcoded as I showed with:
df.iloc[[index], column] = some_string

Your question is not clear, do you want only the first step? df.loc[df['Column B'].notna(), 'Column A'] = df['Column B'] — mozway
– mozway, Commented Nov 18, 2022 at 18:41
@mozway I did an edit, apologies for not being clear. And thanks for your help. — James Adams
– James Adams, Commented Nov 18, 2022 at 19:03
Not sure what you mean by "dynamic", maybe df.assign(**{'Column A': lambda d: d['Column A'].mask(d['Column B'].notna(), d['Column B'])}) or df.assign(**{'Column A': lambda d: d['Column B'].fillna(d['Column A'])}) — mozway
– mozway, Commented Nov 18, 2022 at 19:34

Yolao_21 · Accepted Answer · 2022-11-19 13:52:17Z

1

As you mentioned, you could use pd.apply like this:

df['Column A'] = df.apply(lambda x: x['Column B'] if str(x['Column B']) not in ['nan', 'NaN'] else x['Column A'], axis=1)

  Column A Column B  Column C
0      red      NaN         1
1     blue     blue         2
2   purple   purple         3
3   orange      NaN         2
4      red      NaN         3
5     blue      NaN         7

Notice that apply is not fast at for very large dataset is not advisable. There are some good answers out there for alternative methods

answered Nov 19, 2022 at 13:52

Yolao_21

8951 gold badge6 silver badges10 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Replace value from a column based on condition of another column, Pandas

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related