Updating NaN column (with string) based on multiple columns

This question is very similar to "Updating column in a dataframe based on multiple columns".

I have the following dataset:

    CustomerID  TypeofContact    Occupation      Gender  MaritalStatus
0   200000      Self Enquiry     Salaried        Female  Single
1   200001      Company Invited  Salaried        Male    Divorced
2   200002      Self Enquiry     Free Lancer     Male    Single
3   200003      Company Invited  Salaried        Female  Divorced
4   200004      Self Enquiry     Small Business  Male    Divorced
5   200005      Company Invited  Salaried        Male    Single
6   200006      Self Enquiry     Small Business  Female  Divorced
7   200007      Self Enquiry     Salaried        Male    Married
8   200008      NaN              Salaried        Male    Single
9   200009      Self Enquiry     Small Business  Male    Divorced
10  200010      Self Enquiry     Small Business  Male    Divorced
11  200011      NaN              Salaried        Female  Single
12  200012      Self Enquiry     Small Business  Male    Married
13  200013      NaN              Small Business  Male    Married
14  200014      Self Enquiry     Salaried        Male    Single

I want to replace each NaN in the TypeofContact column with the value from the first record that has a non-null TypeofContact and the same combination of Occupation, Gender and MaritalStatus as the NaN record.

Example:

CustomerID 200014 can supply the value for CustomerID 200008, whose TypeofContact is NaN, because both records have the same Occupation, Gender and MaritalStatus.

The same applies to 200013 and 200012.

Expected output:

    CustomerID  TypeofContact    Occupation      Gender  MaritalStatus
0   200000      Self Enquiry     Salaried        Female  Single
1   200001      Company Invited  Salaried        Male    Divorced
2   200002      Self Enquiry     Free Lancer     Male    Single
3   200003      Company Invited  Salaried        Female  Divorced
4   200004      Self Enquiry     Small Business  Male    Divorced
5   200005      Company Invited  Salaried        Male    Single
6   200006      Self Enquiry     Small Business  Female  Divorced
7   200007      Self Enquiry     Salaried        Male    Married
8   200008      Self Enquiry     Salaried        Male    Single
9   200009      Self Enquiry     Small Business  Male    Divorced
10  200010      Self Enquiry     Small Business  Male    Divorced
11  200011      Self Enquiry     Salaried        Female  Single
12  200012      Self Enquiry     Small Business  Male    Married
13  200013      Self Enquiry     Small Business  Male    Married
14  200014      Self Enquiry     Salaried        Male    Single

I was able to do this by creating a second dataframe containing only the non-null rows, looping through it, and updating the original dataframe using CustomerID as the identifier.

What would be a more efficient way to accomplish this?

Thanks.
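The loop described above could possibly be replaced by a lookup table and a left merge. The following is only a sketch under assumptions: the dataframe is rebuilt by hand from the sample in the question, and the names `keys`, `lookup` and `fill_value` are made up for illustration. It takes "first record" literally, so CustomerID 200008 receives Company Invited (from 200005, the first Salaried / Male / Single row), which differs from the expected output shown above.

```python
import numpy as np
import pandas as pd

# Sample data copied by hand from the question
data = pd.DataFrame({
    "CustomerID": range(200000, 200015),
    "TypeofContact": [
        "Self Enquiry", "Company Invited", "Self Enquiry", "Company Invited",
        "Self Enquiry", "Company Invited", "Self Enquiry", "Self Enquiry",
        np.nan, "Self Enquiry", "Self Enquiry", np.nan,
        "Self Enquiry", np.nan, "Self Enquiry",
    ],
    "Occupation": [
        "Salaried", "Salaried", "Free Lancer", "Salaried", "Small Business",
        "Salaried", "Small Business", "Salaried", "Salaried", "Small Business",
        "Small Business", "Salaried", "Small Business", "Small Business",
        "Salaried",
    ],
    "Gender": [
        "Female", "Male", "Male", "Female", "Male", "Male", "Female", "Male",
        "Male", "Male", "Male", "Female", "Male", "Male", "Male",
    ],
    "MaritalStatus": [
        "Single", "Divorced", "Single", "Divorced", "Divorced", "Single",
        "Divorced", "Married", "Single", "Divorced", "Divorced", "Single",
        "Married", "Married", "Single",
    ],
})

keys = ["Occupation", "Gender", "MaritalStatus"]  # hypothetical helper name

# One row per key combination: the first non-null TypeofContact seen for it
lookup = (
    data.dropna(subset=["TypeofContact"])
        .drop_duplicates(subset=keys)[keys + ["TypeofContact"]]
        .rename(columns={"TypeofContact": "fill_value"})
)

# A left merge keeps the row order of `data`; restore its index for alignment
merged = data[keys].merge(lookup, on=keys, how="left")
merged.index = data.index

# Only the NaN entries are replaced; existing values are left untouched
data["TypeofContact"] = data["TypeofContact"].fillna(merged["fill_value"])
```

Rows whose key combination never appears with a non-null TypeofContact would stay NaN with this approach.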

edited Jun 9, 2021 at 2:05

asked Jun 9, 2021 at 1:29

kkulkarn


What would the output look like? (It's very helpful to have expected output to compare against.) Also, could you clarify what this condition means: "the first TypeofContact record that satisfies the condition - combination of not null Occupation, Gender & Marital Status"?
– Henry Ecker ♦, Jun 9, 2021 at 1:42

I have updated the original question and added what the output should look like.
– kkulkarn, Jun 9, 2021 at 2:06

Is that just the previous non-null value in TypeofContact? Where does the "combination of not null Occupation, Gender & Marital Status" come into play?
– Henry Ecker ♦, Jun 9, 2021 at 2:08

Yes, you are right.
– kkulkarn, Jun 9, 2021 at 5:09

I tried the following, but it does not update the column:
data['TypeofContact'] = data['TypeofContact'].fillna(data.groupby(['Occupation','Gender','MaritalStatus'])['TypeofContact'].head(1))
– kkulkarn, Jun 9, 2021 at 5:51
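
A likely reason the attempt in the last comment changes nothing: `groupby(...).head(1)` returns only the first row of each group and keeps the original index, while `fillna` aligns on index, so the NaN rows find no matching label to take a value from. The sketch below, which assumes the sample data from the question rebuilt by hand and uses made-up variable names, compares two possible readings of the requirement: `transform("first")` broadcasts the first non-null value of each Occupation / Gender / MaritalStatus group, and `ffill()` takes the previous non-null value regardless of group, as the comments suggest.

```python
import numpy as np
import pandas as pd

# Sample data copied by hand from the question
data = pd.DataFrame({
    "CustomerID": range(200000, 200015),
    "TypeofContact": [
        "Self Enquiry", "Company Invited", "Self Enquiry", "Company Invited",
        "Self Enquiry", "Company Invited", "Self Enquiry", "Self Enquiry",
        np.nan, "Self Enquiry", "Self Enquiry", np.nan,
        "Self Enquiry", np.nan, "Self Enquiry",
    ],
    "Occupation": [
        "Salaried", "Salaried", "Free Lancer", "Salaried", "Small Business",
        "Salaried", "Small Business", "Salaried", "Salaried", "Small Business",
        "Small Business", "Salaried", "Small Business", "Small Business",
        "Salaried",
    ],
    "Gender": [
        "Female", "Male", "Male", "Female", "Male", "Male", "Female", "Male",
        "Male", "Male", "Male", "Female", "Male", "Male", "Male",
    ],
    "MaritalStatus": [
        "Single", "Divorced", "Single", "Divorced", "Divorced", "Single",
        "Divorced", "Married", "Single", "Divorced", "Divorced", "Single",
        "Married", "Married", "Single",
    ],
})

keys = ["Occupation", "Gender", "MaritalStatus"]  # hypothetical helper name

# The attempt from the comment: one row per group, original index labels kept.
# None of the NaN rows (labels 8, 11, 13) appear here, so fillna cannot use it.
head_one = data.groupby(keys)["TypeofContact"].head(1)

# Reading 1: first non-null value within the same key combination,
# broadcast back to every row of the group
first_in_group = data.groupby(keys)["TypeofContact"].transform("first")
by_group = data["TypeofContact"].fillna(first_in_group)

# Reading 2: previous non-null value in row order, ignoring the key columns
by_position = data["TypeofContact"].ffill()
```

On this sample the two readings disagree for CustomerID 200008: `by_group` gives Company Invited (from 200005), while `by_position` gives Self Enquiry and matches the expected output in the question. Which one is right depends on which rule is actually intended.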
