
I have a pandas DataFrame with more than 4 columns. Some values in col1 are missing, and I want to fill those missing values using the following approach:

  1. Try to set it to the average of col1 over the records that have the same col2, col3, col4 values.
  2. If there is no such record, set it to the average of col1 over the records that have the same col2, col3 values.
  3. If there is still no such record, set it to the average of col1 over the records that have the same col2 value.
  4. If none of the above could be found, set it to the average of all other non-missing values in col1.

What's the best way to do this?

  • Without some data, it's not easy to help. Please take a moment to read about how to post pandas questions: stackoverflow.com/questions/20109391/… Commented Jul 29, 2020 at 20:34

2 Answers


Based on your logic, you can chain fillna calls as follows, where each fillna call corresponds to one step in your question, in the same order:

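# Each fillna only touches values that are still NaN after the previous step.
df['col1'] = (df['col1']
              .fillna(df.groupby(['col2', 'col3', 'col4'])['col1'].transform('mean'))  # step 1
              .fillna(df.groupby(['col2', 'col3'])['col1'].transform('mean'))          # step 2
              .fillna(df.groupby(['col2'])['col1'].transform('mean'))                  # step 3
              .fillna(df['col1'].mean())                                               # step 4: overall mean
             )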
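A minimal sketch, assuming the same column names as the question, that wraps the chain of fillna calls above in a loop over progressively shorter key lists, so the fallback hierarchy can be changed in one place. The helper name fill_hierarchical_mean is hypothetical, and the sample values are invented so that each of the four steps fills exactly one or more rows. As in the chained version, groups whose col1 values are all missing produce NaN from transform('mean') and fall through to the next, coarser step.

```python
import numpy as np
import pandas as pd


def fill_hierarchical_mean(df, target, key_levels):
    """Hypothetical helper: fill NaNs in df[target] with group means,
    trying the most specific key list first and falling back to coarser ones.

    All means are computed from the original (pre-fill) column values.
    """
    filled = df[target].copy()
    for keys in key_levels:
        # transform('mean') broadcasts each group's mean back to its rows;
        # groups whose target values are all NaN give NaN and fall through.
        filled = filled.fillna(df.groupby(keys)[target].transform('mean'))
    # Last resort: mean of all non-missing values in the original column.
    return filled.fillna(df[target].mean())


# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    'col1': [1.0, 3.0, np.nan, 5.0, np.nan, 7.0, np.nan, np.nan],
    'col2': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'c'],
    'col3': ['x', 'x', 'x', 'y', 'y', 'z', 'w', 'v'],
    'col4': [1, 1, 1, 2, 3, 4, 5, 6],
})

df['col1'] = fill_hierarchical_mean(
    df, 'col1', [['col2', 'col3', 'col4'], ['col2', 'col3'], ['col2']]
)
print(df['col1'].tolist())  # [1.0, 3.0, 2.0, 5.0, 5.0, 7.0, 7.0, 4.0]
```

In this sample, row 2 is filled by step 1 (mean of 1 and 3), row 4 by step 2 (only 5 shares col2='a', col3='y'), row 6 by step 3 (only 7 shares col2='b'), and row 7 by step 4 (mean of 1, 3, 5, 7).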

Filling null values with zero:

df_with_dummies.fillna(value=0, inplace=True)
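For comparison with the question's rules, a small sketch of what this answer does: fillna(value=0) replaces every missing value in every column with the constant 0, without looking at col2, col3 or col4. The frame below is hypothetical; df_with_dummies is assumed to be an ordinary DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for df_with_dummies.
df_with_dummies = pd.DataFrame({
    'col1': [1.0, np.nan, 3.0],
    'col3': [np.nan, 2.0, np.nan],
})

# Every NaN in every column becomes 0; no grouping is involved.
df_with_dummies.fillna(value=0, inplace=True)
print(df_with_dummies['col1'].tolist())  # [1.0, 0.0, 3.0]
```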
