Given data frames similar to the following:

df1 = pd.DataFrame({'Customer': ['Customer1', 'Customer2', 'Customer3'],
             'Status': [0, 1, 1]})

    Customer    Status
0   Customer1   0
1   Customer2   1
2   Customer3   1

df2 = pd.DataFrame({'Customer': ['Customer1', 'Customer1', 'Customer1', 'Customer2', 'Customer2', 'Customer3'],
             'Call': ['01-01', '01-02', '01-03', '02-01', '03-02', '06-01']})

    Customer    Call
0   Customer1   01-01
1   Customer1   01-02
2   Customer1   01-03
3   Customer2   02-01
4   Customer2   03-02
5   Customer3   06-01

What is the most efficient way to merge the two into a third data frame in which the rows from df2 become columns added to df1? In the new data frame, each row should be a unique customer, and the 'Call' values from df2 should be added as incrementing columns, padded with NaN where a customer has fewer calls than the others.

I'd like to end up with something like:

    Customer    Status  Call_1  Call_2  Call_3
0   Customer1   0       01-01   01-02   01-03
1   Customer2   1       02-01   03-02   NaN
2   Customer3   1       06-01   NaN     NaN

I assume some combination of stack() and merge() is required, but I can't figure it out.

Help appreciated

3 Answers

Use DataFrame.join with a new DataFrame reshaped by GroupBy.cumcount and Series.unstack:

df = df1.join(df2.set_index(['Customer', df2.groupby('Customer').cumcount().add(1)])['Call']
                 .unstack().add_prefix('Call_'), 'Customer')
print(df)
    Customer  Status Call_1 Call_2 Call_3
0  Customer1       0  01-01  01-02  01-03
1  Customer2       1  02-01  03-02    NaN
2  Customer3       1  06-01    NaN    NaN
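
For anyone who finds the one-liner hard to read, here is the same idea split into steps. This is only a sketch of the approach above; the intermediate names call_no, wide and result are my own and are not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'Customer': ['Customer1', 'Customer2', 'Customer3'],
                    'Status': [0, 1, 1]})
df2 = pd.DataFrame({'Customer': ['Customer1', 'Customer1', 'Customer1',
                                 'Customer2', 'Customer2', 'Customer3'],
                    'Call': ['01-01', '01-02', '01-03', '02-01', '03-02', '06-01']})

# Step 1: number each customer's calls 1, 2, 3, ... in order of appearance.
call_no = df2.groupby('Customer').cumcount().add(1)

# Step 2: index by (Customer, call number), then pivot the call number
# into columns; customers with fewer calls get NaN in the missing slots.
wide = (df2.set_index(['Customer', call_no])['Call']
           .unstack()
           .add_prefix('Call_'))

# Step 3: left-join onto df1, matching its 'Customer' column to wide's index.
result = df1.join(wide, on='Customer')
print(result)
```

The cumcount step is what makes the reshape possible: without a per-customer counter, the (Customer, Call) pairs have nothing to use as column labels.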

First pivot df2 with a cumcount de-duplication, then merge:

out = df1.merge(df2.assign(n=df2.groupby('Customer').cumcount().add(1))
                   .pivot(index='Customer', columns='n', values='Call')
                   .add_prefix('Call_'),
                left_on='Customer', right_index=True)

Output:

    Customer  Status Call_1 Call_2 Call_3
0  Customer1       0  01-01  01-02  01-03
1  Customer2       1  02-01  03-02    NaN
2  Customer3       1  06-01    NaN    NaN
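
One thing to watch for: merge defaults to how='inner', so a customer in df1 with no rows in df2 would be dropped from the output. The sketch below assumes a hypothetical 'Customer4' with no calls (not in the question's data) to show the difference; passing how='left' keeps that row with NaN in every Call_ column.

```python
import pandas as pd

# 'Customer4' is a made-up customer with no calls, added only for illustration.
df1 = pd.DataFrame({'Customer': ['Customer1', 'Customer2', 'Customer3', 'Customer4'],
                    'Status': [0, 1, 1, 0]})
df2 = pd.DataFrame({'Customer': ['Customer1', 'Customer1', 'Customer1',
                                 'Customer2', 'Customer2', 'Customer3'],
                    'Call': ['01-01', '01-02', '01-03', '02-01', '03-02', '06-01']})

# Same reshape as above: one row per customer, one column per call number.
wide = (df2.assign(n=df2.groupby('Customer').cumcount().add(1))
           .pivot(index='Customer', columns='n', values='Call')
           .add_prefix('Call_'))

# Default inner merge: customers without calls disappear.
inner = df1.merge(wide, left_on='Customer', right_index=True)

# Left merge: every row of df1 is kept, with NaN where there are no calls.
left = df1.merge(wide, left_on='Customer', right_index=True, how='left')

print(inner)
print(left)
```

DataFrame.join, by contrast, defaults to a left join, so it already keeps such customers.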

Another option pads each customer's group to the same length with combine_first, then unstacks:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'Customer':['Customer1','Customer2','Customer3'],
             'Status':[0,1,1]})

df2 = pd.DataFrame({'Customer':['Customer1','Customer1','Customer1','Customer2','Customer2','Customer3'],
                    'Call': ['01-01','01-02','01-03','02-01','03-02','06-01']
                    })

group_c = df2.groupby('Customer')
step = group_c.cumcount().max() + 1
empty_df = pd.DataFrame(np.nan, index=range(step), columns=df2.columns)

r = (group_c.apply(lambda g: empty_df.combine_first(g.reset_index(drop=True)).reset_index(drop=True))
            .unstack()
            .drop('Customer', axis=1)
)

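r.columns = r.columns.droplevel(0) + 1
r = r.add_prefix('Call_')

# r is indexed by Customer; join it onto df1 to bring back Customer and Status
r = df1.join(r, on='Customer')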

Result

>>> r
    Customer  Status Call_1 Call_2 Call_3
0  Customer1       0  01-01  01-02  01-03
1  Customer2       1  02-01  03-02    NaN
2  Customer3       1  06-01    NaN    NaN

Contents of empty_df:

>>> empty_df
   Customer  Call
0       NaN   NaN
1       NaN   NaN
2       NaN   NaN
