Pandas combine and merge rows

Question

My original data stores some properties as title/value pairs in two columns, and I'm expanding each property into its own column. Here's how it starts:

Order ID    Items    Order Line item Properties 1 Title    Order Line item Properties 1 Value
--------    -----    ----------------------------------    ----------------------------------
1           x        Org ID                                1234
2           x        Org ID                                5678
2           x        Ship From                             DEN
2           x        Ship To                               CLE
2           y        Org ID                                5678
2           y        Ship From                             DEN
2           y        Ship To                               CLE

I have some code that creates columns for Org ID, Ship From, and Ship To. The resulting data looks like this:

Order ID    Items    Org ID    Ship From    Ship To
--------    -----    ------    ---------    --------
1           x        1234      None         None
2           x        5678      None         None
2           x        5678      DEN          None
2           x        5678      None         CLE
2           y        5678      None         None
2           y        5678      DEN          None
2           y        5678      None         CLE

I'm trying to get the data to look like this:

Order ID    Items    Org ID    Ship From    Ship To
--------    -----    ------    ---------    --------
1           x        1234      None         None
2           x, y     5678      DEN          CLE

I think I have a grasp on everything except concatenating the items to show up as x, y when the rest of the data is the same.

Here is some of the code that gets me almost all of the way there:

df.groupby('Order ID').apply(lambda x: x.ffill().bfill()).drop_duplicates()

I can get to the string I want with this: [str(x) for x in df['Items']], but I'm not sure how to get that into the items field for the resulting row(s).

What can I do to merge, concatenate, squash, join, or whatever the correct word is to end up with x, y for items on order 2?

Thanks!

Arun Augustine · Accepted Answer · 2020-01-14 02:33:17Z

1

Try something like this:

df.groupby(['Order ID','Org ID'])['Items'].apply(lambda x: ','.join(set(x.astype(str)))).reset_index()

Output

   Order ID  Org ID  Items
0         1    1234      x
1         2    5678    y,x
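
A note on ordering: a Python set has no guaranteed order, which is why the output here shows y,x rather than the x, y asked for in the question. If the order matters, one option is to de-duplicate with dict.fromkeys, which keeps first-seen order. A minimal sketch, assuming a frame retyped from the question's expanded table (not the asker's real data):

```python
import pandas as pd

# Retyped from the question's expanded table; only the columns used here.
df = pd.DataFrame({
    "Order ID": [1, 2, 2, 2, 2, 2, 2],
    "Org ID": [1234, 5678, 5678, 5678, 5678, 5678, 5678],
    "Items": ["x", "x", "x", "x", "y", "y", "y"],
})

joined = (
    df.groupby(["Order ID", "Org ID"])["Items"]
    # dict.fromkeys drops duplicates but keeps the order of first appearance
    .agg(lambda s: ", ".join(dict.fromkeys(s.astype(str))))
    .reset_index()
)
print(joined)
```

If sorted output is preferred over first-seen order, sorted(set(...)) in place of dict.fromkeys(...) would also give a stable result.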

edited Jan 14, 2020 at 2:33

answered Jan 14, 2020 at 2:13


1 Comment

hookedonwinter Over a year ago

This is definitely on the right track. How do I get this to also include the other columns? Ship From, Ship To, etc?

haru · Accepted Answer · 2020-01-14 03:04:06Z

1

To answer your comment question on Arun's answer, you can add more columns by adding the column titles in the groupby list:

df.groupby(['Order ID','Org ID', 'Ship To', 'Ship From'])['Items'].apply(lambda x: ','.join(set(x.astype(str)))).reset_index()
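
One caveat with putting Ship To and Ship From in the groupby list: by default groupby drops rows whose key is missing (dropna=True), and in the question's expanded table every row has a None in at least one of those two columns. An alternative sketch, not taken from either answer, is to group only by Order ID and aggregate the remaining columns with "first", which takes the first non-null value in each group. The join_unique helper is a hypothetical name introduced for illustration, and the frame is retyped from the question:

```python
import pandas as pd

# The expanded table from the question (None marks a missing property).
df = pd.DataFrame({
    "Order ID": [1, 2, 2, 2, 2, 2, 2],
    "Items": ["x", "x", "x", "x", "y", "y", "y"],
    "Org ID": [1234, 5678, 5678, 5678, 5678, 5678, 5678],
    "Ship From": [None, None, "DEN", None, None, "DEN", None],
    "Ship To": [None, None, None, "CLE", None, None, "CLE"],
})

def join_unique(values):
    """Hypothetical helper: join distinct values as strings, first-seen order."""
    return ", ".join(dict.fromkeys(values.astype(str)))

result = df.groupby("Order ID", as_index=False).agg({
    "Items": join_unique,   # x, y for order 2
    "Org ID": "first",      # first non-null value per order
    "Ship From": "first",
    "Ship To": "first",
})
print(result)
```

This also makes the ffill().bfill() step from the question unnecessary for this particular table, since "first" already skips the missing values within each order.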

answered Jan 14, 2020 at 3:04


2 Comments

hookedonwinter Over a year ago

Thanks! What if I have a lot of columns? Do I need to add them all? Or I guess I could do something like [col for col in df.columns]

haru Over a year ago

You only need to add the columns that you need. Or, you can do a pd.merge on the Order ID to join data sets if you need any other columns.
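
For the many-columns case, the pd.merge idea could look roughly like this: collapse Items on its own, collapse every other column without naming it, then merge the two results on Order ID. This is a sketch under the assumption that each remaining column has at most one distinct non-null value per order, as in the question's table; the variable names are illustrative only:

```python
import pandas as pd

# The expanded table from the question (None marks a missing property).
df = pd.DataFrame({
    "Order ID": [1, 2, 2, 2, 2, 2, 2],
    "Items": ["x", "x", "x", "x", "y", "y", "y"],
    "Org ID": [1234, 5678, 5678, 5678, 5678, 5678, 5678],
    "Ship From": [None, None, "DEN", None, None, "DEN", None],
    "Ship To": [None, None, None, "CLE", None, None, "CLE"],
})

# One row per order with the distinct items joined in first-seen order.
items = (
    df.groupby("Order ID")["Items"]
    .agg(lambda s: ", ".join(dict.fromkeys(s.astype(str))))
    .reset_index()
)

# Every other column, however many there are: first non-null value per order.
others = df.drop(columns="Items").groupby("Order ID", as_index=False).first()

# Join the two pieces back together and restore the original column order.
result = others.merge(items, on="Order ID")[list(df.columns)]
print(result)
```

Nothing here lists the property columns by name, so adding more of them to the frame should not require changing the code.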
