Replace na's in pandas dataframe with the mean value of two column if they don't contain na's

Question

I have a dataset like this one

data = {'DATE': ['2012-01-01', '2013-05-16', '2013-05-28',
               '2013-06-05', '2013-06-07', '2014-02-02'],
               'Avarage_Temp': [21.8, 21.1, 22.8, 23.3, 14.4, np.nan],
               'Minimun_Temp': [14.4, np.nan, np.nan, np.nan, 15.6, 18.3],
               'Maximum_Temp': [6.7, 14.4, 11.7, 16.1, np.nan, 10.0]}

When I execute the command

data['Avarage_Temp'] = data[['Minimun_Temp', 'Maximum_Temp']].mean(axis=1).round(1)

I get this output:

data = {'DATE': ['2012-01-01', '2013-05-16', '2013-05-28',
               '2013-06-05', '2013-06-07', '2014-02-02'],
               'Avarage_Temp': [10.6, 14.4, 11.7, 16.1, 15.6, 14.1],
               'Minimun_Temp': [14.4, np.nan, np.nan, np.nan, 15.6, 18.3],
               'Maximum_Temp': [6.7, 14.4, 11.7, 16.1, np.nan, 10.0]}

As you can see in rows 2-5 the result is wrong, because in these rows it return the value of the non missing value(if Minimun_Temp is na, gives the value of Maximum_Temp in the Avarage_Temp). So I want to have as output something like the 1st and 6th row. I want to change the value of Avarage_Temp only if the column Maximum_Temp and Minimun_Temp in this row aren't na's. If Maximum_Temp or Minimun_Temp is na, I want to keep the value of Avarage_Temp as it is.

jezrael · Accepted Answer · 2022-02-07 12:56:03Z

You can create NaNs if at least one value in Minimun_Temp, Maximum_Temp by parameter skipna=False in DataFrame.mean and then replace them by column Avarage_Temp:

df = pd.DataFrame(data)

df['Avarage_Temp'] = (df[['Minimun_Temp', 'Maximum_Temp']].mean(axis=1, skipna=False)
                                                          .round(1)
                                                          .fillna(df['Avarage_Temp']))
print (df)
         DATE  Avarage_Temp  Minimun_Temp  Maximum_Temp
0  2012-01-01          10.6          14.4           6.7
1  2013-05-16          21.1           NaN          14.4
2  2013-05-28          22.8           NaN          11.7
3  2013-06-05          23.3           NaN          16.1
4  2013-06-07          14.4          15.6           NaN
5  2014-02-02          14.2          18.3          10.0

Another idea with Series.add for NaNs if at least one NaN per rows and for mean divide by 2:

df['Avarage_Temp'] = (df['Minimun_Temp'].add(df['Maximum_Temp'])
                                        .div(2)
                                        .round(1)
                                        .fillna(df['Avarage_Temp']))
print (df)
         DATE  Avarage_Temp  Minimun_Temp  Maximum_Temp
0  2012-01-01          10.6          14.4           6.7
1  2013-05-16          21.1           NaN          14.4
2  2013-05-28          22.8           NaN          11.7
3  2013-06-05          23.3           NaN          16.1
4  2013-06-07          14.4          15.6           NaN
5  2014-02-02          14.2          18.3          10.0

Collectives™ on Stack Overflow

Replace na's in pandas dataframe with the mean value of two column if they don't contain na's

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related