0

I want to merge two DataFrames, Lifetime_df and Input_DataFrame2. The final Lifetime_df should keep every row it already had, but its count should be replaced with the count from Input_DataFrame2 wherever the key columns ['Identifier_column', 'lifetime'] match.

Lifetime_df

    Identifier_column  lifetime count
0      138122               1     1
1      138122               2     1
2      138122               3   NaN
3      138122               4   NaN
4      138122               5     0
5      138122               6     1
6      138122               7   NaN
7      138122               8     0
8      138122               9     1

Input_DataFrame2

    Identifier_column  lifetime count
0      138122               1     1
1      138122               2     4
2      138122               6     1
3      138122               9     1

Desired Output:

Lifetime_df

    Identifier_column  lifetime count
0      138122               1     1
1      138122               2     4
2      138122               3   NaN
3      138122               4   NaN
4      138122               5     0
5      138122               6     1
6      138122               7   NaN
7      138122               8     0
8      138122               9     1

The following attempt doesn't satisfy the requirement, because the existing 0 counts come back as NaN:

Input_DataFrame3 = pd.merge(Lifetime_df, 
                                Input_DataFrame2, 
                                how='left', 
                                on=[Identifier_column, lifetime])

Lifetime_df['count'] = Input_DataFrame3['count_y']

Getting:

Lifetime_df

    Identifier_column  lifetime count
0      138122               1     1
1      138122               2     4
2      138122               3   NaN
3      138122               4   NaN
4      138122               5   NaN
5      138122               6     1
6      138122               7   NaN
7      138122               8   NaN
8      138122               9     1
1
  • How can your command work? I doubt that Identifier_column and lifetime in on=[Identifier_column, lifetime] are defined variables. Commented Jun 8, 2020 at 14:25

3 Answers

3

With the good old merge and fillna:

Input_DataFrame3  = Lifetime_df.merge(Input_DataFrame2, 
                                      on=['Identifier_column', 'lifetime'], 
                                      how='left', 
                                      suffixes=['_x', ''])

Input_DataFrame3['count'] = Input_DataFrame3['count'].fillna(Input_DataFrame3['count_x'])
Input_DataFrame3 = Input_DataFrame3.drop(columns='count_x')

   Identifier_column  lifetime  count
0             138122         1    1.0
1             138122         2    4.0
2             138122         3    NaN
3             138122         4    NaN
4             138122         5    0.0
5             138122         6    1.0
6             138122         7    NaN
7             138122         8    0.0
8             138122         9    1.0
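
To run this end to end, here is a minimal sketch that rebuilds the two frames from the question's sample data and applies the same merge and fillna steps. The name `merged` is just my choice for illustration. The attempt in the question lost the 0 counts because, after a left merge, count_y is NaN for every row that has no match in Input_DataFrame2, and assigning that column wholesale overwrites the values that were already there. Using fillna falls back to the original count for those rows instead.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
Lifetime_df = pd.DataFrame({
    'Identifier_column': [138122] * 9,
    'lifetime': list(range(1, 10)),
    'count': [1, 1, np.nan, np.nan, 0, 1, np.nan, 0, 1],
})
Input_DataFrame2 = pd.DataFrame({
    'Identifier_column': [138122] * 4,
    'lifetime': [1, 2, 6, 9],
    'count': [1, 4, 1, 1],
})

key_cols = ['Identifier_column', 'lifetime']

# Left merge keeps every row of Lifetime_df; the old count becomes
# 'count_x' and the count from Input_DataFrame2 keeps the name 'count'.
merged = Lifetime_df.merge(Input_DataFrame2, on=key_cols, how='left',
                           suffixes=('_x', ''))

# Where Input_DataFrame2 had no matching row, fall back to the old count.
merged['count'] = merged['count'].fillna(merged['count_x'])
merged = merged.drop(columns='count_x')

print(merged)
```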

Or inspired by YOBEN's answer, pd.concat and drop_duplicates:

key_cols = ['Identifier_column', 'lifetime']
pd.concat([Input_DataFrame2, Lifetime_df]).drop_duplicates(key_cols)\
   .sort_values(key_cols).reset_index(drop=True)

   Identifier_column  lifetime  count
0             138122         1    1.0
1             138122         2    4.0
2             138122         3    NaN
3             138122         4    NaN
4             138122         5    0.0
5             138122         6    1.0
6             138122         7    NaN
7             138122         8    0.0
8             138122         9    1.0
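
A self-contained sketch of the concat variant, again on the question's sample data (the name `result` is mine). It relies on drop_duplicates keeping the first occurrence by default, so Input_DataFrame2 has to come first in the concat for its counts to win. Both frames bring their own index labels into the concat, so the sketch resets the index at the end to number the rows 0 to 8.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
Lifetime_df = pd.DataFrame({
    'Identifier_column': [138122] * 9,
    'lifetime': list(range(1, 10)),
    'count': [1, 1, np.nan, np.nan, 0, 1, np.nan, 0, 1],
})
Input_DataFrame2 = pd.DataFrame({
    'Identifier_column': [138122] * 4,
    'lifetime': [1, 2, 6, 9],
    'count': [1, 4, 1, 1],
})

key_cols = ['Identifier_column', 'lifetime']

result = (
    pd.concat([Input_DataFrame2, Lifetime_df])  # preferred rows first
    .drop_duplicates(subset=key_cols)           # keep='first' is the default
    .sort_values(key_cols)
    .reset_index(drop=True)                     # discard the mixed index labels
)

print(result)
```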

3

We can try concat, then groupby with first:

df = pd.concat([Input_DataFrame2, Lifetime_df])\
       .groupby(['Identifier_column', 'lifetime'])['count'].first().reset_index()
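
Here is a runnable sketch of the same idea on the question's sample data. One thing worth knowing: by default GroupBy.first returns the first non-null value in each group, so if Input_DataFrame2 ever held a NaN count for some key, the value from Lifetime_df would be used for that key. With the data shown that makes no difference.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
Lifetime_df = pd.DataFrame({
    'Identifier_column': [138122] * 9,
    'lifetime': list(range(1, 10)),
    'count': [1, 1, np.nan, np.nan, 0, 1, np.nan, 0, 1],
})
Input_DataFrame2 = pd.DataFrame({
    'Identifier_column': [138122] * 4,
    'lifetime': [1, 2, 6, 9],
    'count': [1, 4, 1, 1],
})

key_cols = ['Identifier_column', 'lifetime']

# Input_DataFrame2 goes first so that its count is the first value seen
# in every group where it has a row.
df = (
    pd.concat([Input_DataFrame2, Lifetime_df])
    .groupby(key_cols)['count']
    .first()          # first non-null count per key; NaN if all are null
    .reset_index()
)

print(df)
```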

1

Let's use combine_first:

cols = ['Identifier_column', 'lifetime']
Input_DataFrame2.set_index(cols)\
   .combine_first(Lifetime_df.set_index(cols)).reset_index()

Output:

   Identifier_column  lifetime  count
0             138122         1    1.0
1             138122         2    4.0
2             138122         3    NaN
3             138122         4    NaN
4             138122         5    0.0
5             138122         6    1.0
6             138122         7    NaN
7             138122         8    0.0
8             138122         9    1.0
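
For completeness, a self-contained sketch on the question's sample data (the name `out` is mine). combine_first takes values from the calling frame and only fills its missing entries from the other one, which is why Input_DataFrame2 is the caller here. The result is indexed by the union of both key sets, and count ends up as float because of the NaN rows.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
Lifetime_df = pd.DataFrame({
    'Identifier_column': [138122] * 9,
    'lifetime': list(range(1, 10)),
    'count': [1, 1, np.nan, np.nan, 0, 1, np.nan, 0, 1],
})
Input_DataFrame2 = pd.DataFrame({
    'Identifier_column': [138122] * 4,
    'lifetime': [1, 2, 6, 9],
    'count': [1, 4, 1, 1],
})

cols = ['Identifier_column', 'lifetime']

out = (
    Input_DataFrame2.set_index(cols)              # values that take priority
    .combine_first(Lifetime_df.set_index(cols))   # fill the gaps from here
    .reset_index()
)

print(out)
```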
