This question is very similar to "Updating column in a dataframe based on multiple columns".

I have the following dataset:

    CustomerID  TypeofContact    Occupation      Gender  MaritalStatus
0   200000      Self Enquiry     Salaried        Female  Single
1   200001      Company Invited  Salaried        Male    Divorced
2   200002      Self Enquiry     Free Lancer     Male    Single
3   200003      Company Invited  Salaried        Female  Divorced
4   200004      Self Enquiry     Small Business  Male    Divorced
5   200005      Company Invited  Salaried        Male    Single
6   200006      Self Enquiry     Small Business  Female  Divorced
7   200007      Self Enquiry     Salaried        Male    Married
8   200008      NaN              Salaried        Male    Single
9   200009      Self Enquiry     Small Business  Male    Divorced
10  200010      Self Enquiry     Small Business  Male    Divorced
11  200011      NaN              Salaried        Female  Single
12  200012      Self Enquiry     Small Business  Male    Married
13  200013      NaN              Small Business  Male    Married
14  200014      Self Enquiry     Salaried        Male    Single

I want to fill each NaN in the TypeofContact column with the TypeofContact of the first record that has a non-null TypeofContact and the same combination of Occupation, Gender and MaritalStatus as the null record.

Example:

CustomerID 200014 can supply the value for CustomerID 200008, whose TypeofContact is NaN, because both records have the same Occupation, Gender and MaritalStatus.

The same applies to 200013 and 200012, so the output should look like this:

    CustomerID  TypeofContact    Occupation      Gender  MaritalStatus
0   200000      Self Enquiry     Salaried        Female  Single
1   200001      Company Invited  Salaried        Male    Divorced
2   200002      Self Enquiry     Free Lancer     Male    Single
3   200003      Company Invited  Salaried        Female  Divorced
4   200004      Self Enquiry     Small Business  Male    Divorced
5   200005      Company Invited  Salaried        Male    Single
6   200006      Self Enquiry     Small Business  Female  Divorced
7   200007      Self Enquiry     Salaried        Male    Married
8   200008      Self Enquiry     Salaried        Male    Single
9   200009      Self Enquiry     Small Business  Male    Divorced
10  200010      Self Enquiry     Small Business  Male    Divorced
11  200011      Self Enquiry     Salaried        Female  Single
12  200012      Self Enquiry     Small Business  Male    Married
13  200013      Self Enquiry     Small Business  Male    Married
14  200014      Self Enquiry     Salaried        Male    Single

I was able to do this by creating another dataframe containing only the non-null rows, looping through it, and updating the original dataframe using CustomerID as the identifier.

What would be an efficient way to accomplish this?

Thanks.

  • What would the output look like? (It's very helpful to have expected output to compare against.) Also, could you clarify what this condition means: "the first TypeofContact record that satisfies the condition - combination of not null Occupation, Gender & Marital Status"? Commented Jun 9, 2021 at 1:42
  • I updated the original question and added what the output should look like. Commented Jun 9, 2021 at 2:06
  • Is that just the previous not-null value in TypeofContact? Where does the "combination of not null Occupation, Gender & Marital Status" come into play? Commented Jun 9, 2021 at 2:08
  • Yes, you are right. Commented Jun 9, 2021 at 5:09
  • I tried the following, but it does not update the column: data['TypeofContact'] = data['TypeofContact'].fillna(data.groupby(['Occupation','Gender','MaritalStatus'])['TypeofContact'].head(1)) Commented Jun 9, 2021 at 5:51
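
One likely reason the attempt in the last comment leaves the column unchanged is index alignment: groupby(...).head(1) returns only the first row of each group under its original index label, so the NaN rows (labels 8, 11 and 13 in the sample) have nothing to align with inside fillna. Below is a minimal sketch of an alternative, assuming the frame is named data as in that comment and rebuilt here from the question's sample. It relies on transform("first"), which broadcasts each group's first non-null TypeofContact back to every row of that group. The names keys, first_rows and group_first are introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (NaN marks the missing contacts).
data = pd.DataFrame({
    "CustomerID": list(range(200000, 200015)),
    "TypeofContact": [
        "Self Enquiry", "Company Invited", "Self Enquiry", "Company Invited",
        "Self Enquiry", "Company Invited", "Self Enquiry", "Self Enquiry",
        np.nan, "Self Enquiry", "Self Enquiry", np.nan,
        "Self Enquiry", np.nan, "Self Enquiry",
    ],
    "Occupation": [
        "Salaried", "Salaried", "Free Lancer", "Salaried",
        "Small Business", "Salaried", "Small Business", "Salaried",
        "Salaried", "Small Business", "Small Business", "Salaried",
        "Small Business", "Small Business", "Salaried",
    ],
    "Gender": [
        "Female", "Male", "Male", "Female", "Male", "Male", "Female", "Male",
        "Male", "Male", "Male", "Female", "Male", "Male", "Male",
    ],
    "MaritalStatus": [
        "Single", "Divorced", "Single", "Divorced", "Divorced", "Single",
        "Divorced", "Married", "Single", "Divorced", "Divorced", "Single",
        "Married", "Married", "Single",
    ],
})

keys = ["Occupation", "Gender", "MaritalStatus"]

# head(1) keeps one row per group under its original index label, so the
# NaN rows (8, 11, 13) are absent from this Series and fillna cannot use it.
first_rows = data.groupby(keys)["TypeofContact"].head(1)

# transform("first") returns a Series aligned to the full index, holding
# the first non-null TypeofContact of each row's group.
group_first = data.groupby(keys)["TypeofContact"].transform("first")

# Fill only the missing entries; existing values are left untouched.
data["TypeofContact"] = data["TypeofContact"].fillna(group_first)
```

Note that with "first" taken literally, CustomerID 200008 receives Company Invited from 200005, the earliest Salaried / Male / Single record, whereas the expected table in the question shows Self Enquiry. If the intent is instead the previous non-null value, as the exchange in the comments suggests, data['TypeofContact'].ffill() applied to the original sample would reproduce that table. A group with no non-null value at all would stay NaN under either approach.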
