I have a dataset like this one

data = {'DATE': ['2012-01-01', '2013-05-16', '2013-05-28',
               '2013-06-05', '2013-06-07', '2014-02-02'],
               'Avarage_Temp': [21.8, 21.1, 22.8, 23.3, 14.4, np.nan],
               'Minimun_Temp': [14.4, np.nan, np.nan, np.nan, 15.6, 18.3],
               'Maximum_Temp': [6.7, 14.4, 11.7, 16.1, np.nan, 10.0]}

When I execute the command

data['Avarage_Temp'] = data[['Minimun_Temp', 'Maximum_Temp']].mean(axis=1).round(1)

I get this output:

data = {'DATE': ['2012-01-01', '2013-05-16', '2013-05-28',
               '2013-06-05', '2013-06-07', '2014-02-02'],
               'Avarage_Temp': [10.6, 14.4, 11.7, 16.1, 15.6, 14.1],
               'Minimun_Temp': [14.4, np.nan, np.nan, np.nan, 15.6, 18.3],
               'Maximum_Temp': [6.7, 14.4, 11.7, 16.1, np.nan, 10.0]}

As you can see, the result in rows 2-5 is wrong: when one of the two columns is missing, the mean simply returns the non-missing value (for example, if Minimun_Temp is NaN, Avarage_Temp gets the value of Maximum_Temp). I want the output to behave as it does in the 1st and 6th rows. That is, I want to change Avarage_Temp only if neither Maximum_Temp nor Minimun_Temp in that row is NaN. If Maximum_Temp or Minimun_Temp is NaN, I want to keep the existing value of Avarage_Temp.

1 Answer

You can get NaN whenever at least one of Minimun_Temp and Maximum_Temp is missing by passing skipna=False to DataFrame.mean, and then replace those NaNs with the original values from the Avarage_Temp column:

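import numpy as np   # provides np.nan used in data
import pandas as pd

df = pd.DataFrame(data)

# skipna=False yields NaN for any row with a missing value;
# fillna then restores the original Avarage_Temp for those rows.
df['Avarage_Temp'] = (df[['Minimun_Temp', 'Maximum_Temp']].mean(axis=1, skipna=False)
                                                          .round(1)
                                                          .fillna(df['Avarage_Temp']))
print(df)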
         DATE  Avarage_Temp  Minimun_Temp  Maximum_Temp
0  2012-01-01          10.6          14.4           6.7
1  2013-05-16          21.1           NaN          14.4
2  2013-05-28          22.8           NaN          11.7
3  2013-06-05          23.3           NaN          16.1
4  2013-06-07          14.4          15.6           NaN
5  2014-02-02          14.2          18.3          10.0
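
To see why the original command misbehaved, here is a minimal sketch on a hypothetical two-row frame (the name demo and its values are only for illustration). By default DataFrame.mean uses skipna=True, so a row with one missing value is averaged over the single remaining value:

```python
import numpy as np
import pandas as pd

# Toy frame (illustrative only): row 0 is complete, row 1 lacks a minimum.
demo = pd.DataFrame({'Minimun_Temp': [14.4, np.nan],
                     'Maximum_Temp': [6.7, 14.4]})

# Default skipna=True: NaN is ignored, so row 1 is the "mean" of one value.
default_mean = demo.mean(axis=1)

# skipna=False: any NaN in a row makes that row's mean NaN.
strict_mean = demo.mean(axis=1, skipna=False)

print(default_mean.tolist())
print(strict_mean.tolist())
```

With skipna=False the incomplete row becomes NaN, which is what lets fillna put the old Avarage_Temp back.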

Another idea is to use Series.add, which returns NaN for a row if either value is NaN, and then divide the sum by 2 to get the mean:

# NaN + number is NaN, so incomplete rows stay NaN until fillna restores them.
df['Avarage_Temp'] = (df['Minimun_Temp'].add(df['Maximum_Temp'])
                                        .div(2)
                                        .round(1)
                                        .fillna(df['Avarage_Temp']))
print(df)
         DATE  Avarage_Temp  Minimun_Temp  Maximum_Temp
0  2012-01-01          10.6          14.4           6.7
1  2013-05-16          21.1           NaN          14.4
2  2013-05-28          22.8           NaN          11.7
3  2013-06-05          23.3           NaN          16.1
4  2013-06-07          14.4          15.6           NaN
5  2014-02-02          14.2          18.3          10.0
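
A third variant, not part of the approaches above but possibly easier to read, is to build a boolean mask of the complete rows and assign only to those rows with DataFrame.loc. This is only a sketch; the names cols and complete are my own, and it rebuilds df from the question's data so that it runs on its own:

```python
import numpy as np
import pandas as pd

data = {'DATE': ['2012-01-01', '2013-05-16', '2013-05-28',
                 '2013-06-05', '2013-06-07', '2014-02-02'],
        'Avarage_Temp': [21.8, 21.1, 22.8, 23.3, 14.4, np.nan],
        'Minimun_Temp': [14.4, np.nan, np.nan, np.nan, 15.6, 18.3],
        'Maximum_Temp': [6.7, 14.4, 11.7, 16.1, np.nan, 10.0]}
df = pd.DataFrame(data)

cols = ['Minimun_Temp', 'Maximum_Temp']

# True only for rows where both temperature columns are present.
complete = df[cols].notna().all(axis=1)

# Overwrite Avarage_Temp in the complete rows; other rows are left untouched.
df.loc[complete, 'Avarage_Temp'] = df.loc[complete, cols].mean(axis=1).round(1)
print(df)
```

Here no fillna is needed, because the rows with a missing minimum or maximum are never written to.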