Pandas - Merge rows of dataframe that have a shared value

Question

I have a dataframe with a list of items in the first row and then all the items that were bought with that item in subsequent columns:

df = pd.DataFrame({'1': ['Item 1', 'Item 1', 'Item 1', 'Item 2', 'Item 2', 'Item 2'],  
                    '2': ['Item 4', 'Item 5', 'Item 6', 'Item 7', 'Item 8', 'Item 9'],
                   '3': ['Item 10', 'Item 11', 'Item 12', 'Item 13', 'Item 14', 'Item 15']})

I want to merge all the items bought with each item into a single row as below:

new_df = pd.DataFrame({'1': ['Item 1', 'Item 2'],
                   '2': ['Item 4', 'Item 7'],
                   '3': ['Item 10', 'Item 13'],
                   '4': ['Item 5', 'Item 8'],
                   '5': ['Item 11', 'Item 14'],
                   '6': ['Item 6', 'Item 9'],
                   '7': ['Item 12', 'Item 15']})

So, all the items bought with Item 1 form the columns next to it. As you can see in my example I want to keep all items that were bought with each item, even if they are duplicated.

I have been trying to get it to work with a pandas dataframe, however if there was a list generated for each item that would also be fine. I have been trying some kind of groupby and lambda function but I can't get them to work.

EDIT: Changed numbers to make it more clear how the final df should be organized.

Thanks!

@HenryEcker apologies, pretty new to Stack, have done that now — user13593456
– user13593456, Commented Jun 2, 2021 at 23:54

Nk03 · Accepted Answer · 2021-06-03 04:33:32Z

1

TRY:

new_df = df.groupby('1', as_index=False).apply(
    lambda x: pd.Series(x.values.ravel()[1:]))

OUTPUT:

        1       0        1       2       3        4       5       6        7
0  Item 1  Item 4  Item 10  Item 1  Item 5  Item 11  Item 1  Item 6  Item 12
1  Item 2  Item 7  Item 13  Item 2  Item 8  Item 14  Item 2  Item 9  Item 15

answered Jun 3, 2021 at 4:33

Nk03

15k2 gold badges11 silver badges24 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Henry Ecker · Accepted Answer · 2021-06-03 10:27:13Z

0

A pivot_table option:

new_df = (
    df.pivot_table(index='1',
                   columns=df.groupby('1').cumcount(),
                   aggfunc='first')
        .sort_index(axis=1, level=1)
        .reset_index()
)
new_df.columns = np.arange(1, new_df.shape[1] + 1).astype(str)

new_df:

        1       2        3       4        5       6        7
0  Item 1  Item 4  Item 10  Item 5  Item 11  Item 6  Item 12
1  Item 2  Item 7  Item 13  Item 8  Item 14  Item 9  Item 15

answered Jun 3, 2021 at 10:27

Henry Ecker♦

35.9k19 gold badges48 silver badges67 bronze badges

Collectives™ on Stack Overflow

Pandas - Merge rows of dataframe that have a shared value

2 Answers 2

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related