Pandas. merge/join/concat. Rows into columns

Question

Given data frames similar to the following:

df1 = pd.DataFrame({'Customer': ['Customer1', 'Customer2', 'Customer3'],
             'Status': [0, 1, 1]})

    Customer    Status
0   Customer1     0
1   Customer2     1
2   Customer3     1

df2 = pd.DataFrame({'Customer': ['Customer1', 'Customer1', 'Customer1', 'Customer2', 'Customer2', 'Customer3'],
             'Call': ['01-01', '01-02', '01-03', '02-01', '03-02', '06-01']})

    Customer    Call
0   Customer1   01-01
1   Customer1   01-02
2   Customer1   01-03
3   Customer2   02-01
4   Customer2   03-02
5   Customer3   06-01

What is the most efficient way to merge the two into a third data frame in which the rows from df2 become columns added to df1? In the new data frame, each row should be a unique customer, and the 'Call' values from df2 should be added as incrementing columns, padded with NaN where a customer has fewer calls.

I'd like to end up with something like:

    Customer    Status  Call_1  Call_2  Call_3
0   Customer1   0       01-01   01-02   01-03
1   Customer2   1       02-01   03-02   NaN
2   Customer3   1       06-01   NaN     NaN

I assume some combination of stack() and merge() is required, but I can't seem to figure it out.

Any help is appreciated.

jezrael · Accepted Answer · 2023-02-03 11:52:22Z

3

Use DataFrame.join with a new DataFrame reshaped by GroupBy.cumcount and Series.unstack:

df = df1.join(df2.set_index(['Customer', df2.groupby('Customer').cumcount().add(1)])['Call']
                 .unstack().add_prefix('Call_'), 'Customer')
print(df)
    Customer  Status Call_1 Call_2 Call_3
0  Customer1       0  01-01  01-02  01-03
1  Customer2       1  02-01  03-02    NaN
2  Customer3       1  06-01    NaN    NaN
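
The one-liner above packs several steps into a single expression. As a rough sketch, the same logic can be split into named intermediates to show what each call contributes. The variable names `call_no`, `indexed`, and `wide` are illustrative only and do not appear in the answer.

```python
import pandas as pd

df1 = pd.DataFrame({'Customer': ['Customer1', 'Customer2', 'Customer3'],
                    'Status': [0, 1, 1]})
df2 = pd.DataFrame({'Customer': ['Customer1', 'Customer1', 'Customer1',
                                 'Customer2', 'Customer2', 'Customer3'],
                    'Call': ['01-01', '01-02', '01-03', '02-01', '03-02', '06-01']})

# Step 1: number each call within its customer, starting at 1.
call_no = df2.groupby('Customer').cumcount().add(1)

# Step 2: index by (Customer, call number) and keep only the 'Call' values.
indexed = df2.set_index(['Customer', call_no])['Call']

# Step 3: move the call number from the index into the columns;
# customers with fewer calls get NaN in the missing positions.
wide = indexed.unstack().add_prefix('Call_')

# Step 4: attach the wide table to df1, matching df1['Customer']
# against the index of `wide`.
df = df1.join(wide, on='Customer')
print(df)
```

Because DataFrame.join defaults to a left join, every row of df1 is kept even if a customer has no rows in df2.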


mozway · Accepted Answer · 2023-02-03 11:53:09Z

3

First pivot df2 with a cumcount de-duplication, then merge:

out = df1.merge(df2.assign(n=df2.groupby('Customer').cumcount().add(1))
                   .pivot(index='Customer', columns='n', values='Call')
                   .add_prefix('Call_'),
                left_on='Customer', right_index=True)

Output:

    Customer  Status Call_1 Call_2 Call_3
0  Customer1       0  01-01  01-02  01-03
1  Customer2       1  02-01  03-02    NaN
2  Customer3       1  06-01    NaN    NaN
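
One point worth checking with this approach: DataFrame.merge defaults to how='inner', so a customer in df1 with no rows in df2 would be dropped from the result. The sketch below assumes a hypothetical 'Customer4' with no calls (not part of the question's data) and compares the default with how='left'.

```python
import pandas as pd

# 'Customer4' is a made-up customer with no calls, added for illustration.
df1 = pd.DataFrame({'Customer': ['Customer1', 'Customer2', 'Customer3', 'Customer4'],
                    'Status': [0, 1, 1, 0]})
df2 = pd.DataFrame({'Customer': ['Customer1', 'Customer1', 'Customer1',
                                 'Customer2', 'Customer2', 'Customer3'],
                    'Call': ['01-01', '01-02', '01-03', '02-01', '03-02', '06-01']})

# Same reshaping as in the answer: one row per customer, one column per call.
wide = (df2.assign(n=df2.groupby('Customer').cumcount().add(1))
           .pivot(index='Customer', columns='n', values='Call')
           .add_prefix('Call_'))

# Default inner merge: only customers present in both frames survive.
inner = df1.merge(wide, left_on='Customer', right_index=True)

# Left merge: all customers from df1 are kept, with NaN for missing calls.
left = df1.merge(wide, left_on='Customer', right_index=True, how='left')

print(inner)
print(left)
```

With the question's original data both variants give the same output, since every customer has at least one call.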


Laurent B. · Accepted Answer · 2023-02-04 09:29:43Z

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'Customer':['Customer1','Customer2','Customer3'],
             'Status':[0,1,1]})

df2 = pd.DataFrame({'Customer':['Customer1','Customer1','Customer1','Customer2','Customer2','Customer3'],
                    'Call': ['01-01','01-02','01-03','02-01','03-02','06-01']
                    })

group_c = df2.groupby('Customer')
step = group_c.cumcount().max() + 1
empty_df = pd.DataFrame(np.nan, index=range(step), columns=df2.columns)

r = (group_c.apply(lambda g: empty_df.combine_first(g.reset_index(drop=True)).reset_index(drop=True))
            .unstack()
            .drop('Customer', axis=1)
)

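# Drop the outer column level ('Call') and make the call numbers 1-based
r.columns = r.columns.droplevel(0)+1
r = r.add_prefix('Call_')

# Attach the reshaped calls to df1 (r is indexed by Customer)
out = df1.join(r, on='Customer')

Result

>>> out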
    Customer  Status Call_1 Call_2 Call_3
0  Customer1       0  01-01  01-02  01-03
1  Customer2       1  02-01  03-02    NaN
2  Customer3       1  06-01    NaN    NaN

Content of empty_df:

>>> empty_df
   Customer  Call
0       NaN   NaN
1       NaN   NaN
2       NaN   NaN
