1

I want to change the NA values in multiple columns when certain condition is met. Example data set given below.

    Pool Area   Pool Quality    Pool Type   Pool Condition  Pool Finish
0   800             Good           A              Good        Gunite
1   400             Good           C              Good        Vinyl
2   485             Good           B              Good        Fibreglass
3   360             Poor           C              Poor        Vinyl
4   0               NaN            NaN            NaN         NaN
5   600             Best           A              Best        Gunite
6   500             Best           B              Best        Fibreglass
7   0               NaN            NaN            NaN         NaN
8   750             Best           A              Best        Gunite
9   900             Best           A              Best        Gunite
10  0               NaN            NaN            NaN         NaN
11  900             Best           A              Best        Gunite
12  400             Poor           C              Poor        Fibreglass
13  0               NaN            NaN            NaN         NaN

In the above data, I want to replace the NaN values with 'No Pool' where the 'Pool Area' column has value '0'.

I know I can do it with np.where function and I tried the below code.

df[['Pool Quality', 'Pool Type', 'Pool Condition', 'Pool Finish']] = np.where(df['Pool Area']==0, 'No Pool', df[['Pool Quality', 'Pool Type', 'Pool Condition', 'Pool Finish']])

It isn't working.

I tried it separately, it works (Ref to code below).

df['Pool Quality'] = np.where(df['Pool Area']==0, 'No Pool', df['Pool Quality'])

But when I tried to do for multiple columns in one go, it is not working.

Below is the error I got.

ValueError: operands could not be broadcast together with shapes (2919,) () (2919,5)

NOTE: The above error message is taken from my actual data set where the dimension is 2919 rows and 81 columns.

I don't know what is wrong in my code. Please help me.

1 Answer 1

1

Use boolean indexing:

m = df['Pool Area'].eq(0)

df.loc[m] = df.loc[m].fillna('No Pool')
# or
# df[m] = df[m].fillna('No Pool')

# or to limit to given columns
# cols = ['Pool Quality', 'Pool Type', 'Pool Condition', 'Pool Finish']
# df.loc[m, cols] = df.loc[m, cols].fillna('No Pool')

updated df:

    Pool Area Pool Quality Pool Type Pool Condition Pool Finish
0         800         Good         A           Good      Gunite
1         400         Good         C           Good       Vinyl
2         485         Good         B           Good  Fibreglass
3         360         Poor         C           Poor       Vinyl
4           0      No Pool   No Pool        No Pool     No Pool
5         600         Best         A           Best      Gunite
6         500         Best         B           Best  Fibreglass
7           0      No Pool   No Pool        No Pool     No Pool
8         750         Best         A           Best      Gunite
9         900         Best         A           Best      Gunite
10          0      No Pool   No Pool        No Pool     No Pool
11        900         Best         A           Best      Gunite
12        400         Poor         C           Poor  Fibreglass
13          0      No Pool   No Pool        No Pool     No Pool
Sign up to request clarification or add additional context in comments.

2 Comments

Thank you @mozway. I want to understand why the first method was not working. Is there anything that you can share with me?
@Manikandan it would work for one column at a time. numpy.where expects the same shape in the conditional ((14,) in your case) and the replacement (14, 4). Try with np.where(np.tile(df[['Pool Area']]==0, (1, 4)), 'No Pool', df[['Pool Quality', 'Pool Type', 'Pool Condition', 'Pool Finish']])

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.