My original data stores line item properties as title/value pairs in two columns, and I'm expanding each property into its own column. Here's how it starts:

Order ID    Items    Order Line item Properties 1 Title    Order Line item Properties 1 Value
--------    -----    ----------------------------------    ----------------------------------
1           x        Org ID                                1234
2           x        Org ID                                5678
2           x        Ship From                             DEN
2           x        Ship To                               CLE
2           y        Org ID                                5678
2           y        Ship From                             DEN
2           y        Ship To                               CLE

I have some code that creates columns for Org ID, Ship From, and Ship To. The resulting data looks like this:

Order ID    Items    Org ID    Ship From    Ship To
--------    -----    ------    ---------    --------
1           x        1234      None         None
2           x        5678      None         None
2           x        5678      DEN          None
2           x        5678      None         CLE
2           y        5678      None         None
2           y        5678      DEN          None
2           y        5678      None         CLE

I'm trying to get the data to look like this:

Order ID    Items    Org ID    Ship From    Ship To
--------    -----    ------    ---------    --------
1           x        1234      None         None
2           x, y     5678      DEN          CLE

I think I have a grasp on everything except concatenating the items to show up as x, y when the rest of the data is the same.

Here is some of the code that gets me almost all of the way there:

df.groupby('Order ID').apply(lambda x: x.ffill().bfill()).drop_duplicates()
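To make the remaining gap concrete, here is a rough sketch of the fill-and-dedupe step on a sample frame that mirrors the intermediate table above. The sample data and the names `value_cols`, `filled`, and `deduped` are made up for illustration, and `transform` is swapped in for `apply` only so the frame keeps its original shape:

```python
import pandas as pd

# Sample frame mirroring the intermediate table (illustrative data).
df = pd.DataFrame({
    "Order ID":  [1, 2, 2, 2, 2, 2, 2],
    "Items":     ["x", "x", "x", "x", "y", "y", "y"],
    "Org ID":    [1234, 5678, 5678, 5678, 5678, 5678, 5678],
    "Ship From": [None, None, "DEN", None, None, "DEN", None],
    "Ship To":   [None, None, None, "CLE", None, None, "CLE"],
})

value_cols = ["Org ID", "Ship From", "Ship To"]
filled = df.copy()
# Fill gaps within each order: forward first, then backward.
filled[value_cols] = df.groupby("Order ID")[value_cols].transform(
    lambda s: s.ffill().bfill()
)
# Identical rows collapse, but rows that differ only in Items remain.
deduped = filled.drop_duplicates().reset_index(drop=True)
print(deduped)
```

Order 2 still ends up with one row per item, which is the part I can't get past.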

I can get the items as a list of strings with this: [str(x) for x in df['Items']], but I'm not sure how to join them and get the result into the Items field for the resulting row(s).

What can I do to merge, concatenate, squash, join, or whatever the correct word is to end up with x, y for items on order 2?

Thanks!

2 Answers

Try something like this:

df.groupby(['Order ID','Org ID'])['Items'].apply(lambda x: ','.join(set(x.astype(str)))).reset_index()

Output

      Order ID  Org ID  Items
0         1     1234      x
1         2     5678    y,x
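
One thing to watch: `set` does not keep order, so the joined string can come out as `y,x` or `x,y`. A minimal sketch of a deterministic variant, assuming the items should appear in first-seen order and be separated by `, ` as in the question. The sample frame and the helper name `join_unique` are made up for illustration:

```python
import pandas as pd

# Sample frame with just the columns used here (illustrative data).
df = pd.DataFrame({
    "Order ID": [1, 2, 2, 2, 2, 2, 2],
    "Items":    ["x", "x", "x", "x", "y", "y", "y"],
    "Org ID":   [1234, 5678, 5678, 5678, 5678, 5678, 5678],
})

def join_unique(items):
    """Join the distinct values of a Series in first-seen order."""
    # dict.fromkeys drops duplicates while preserving insertion order.
    return ", ".join(dict.fromkeys(items.astype(str)))

result = (
    df.groupby(["Order ID", "Org ID"])["Items"]
      .agg(join_unique)
      .reset_index()
)
print(result)
```

Using `sorted(set(...))` instead would give alphabetical order, if that is preferred over first-seen order.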

1 Comment

This is definitely on the right track. How do I get this to also include the other columns? Ship From, Ship To, etc?

To answer your comment on Arun's answer: you can include more columns by adding their names to the groupby list:

df.groupby(['Order ID','Org ID', 'Ship To', 'Ship From'])['Items'].apply(lambda x: ','.join(set(x.astype(str)))).reset_index()
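
A caveat with grouping on those columns: by default `groupby` drops rows whose key is missing (`dropna=True`), so an order with no Ship From or Ship To, like order 1, would disappear from the result. A possible alternative, sketched under the assumption that each order has at most one distinct non-missing value per property column, is to group on `Order ID` only and aggregate the rest with `"first"`, which skips missing values. The sample data is illustrative:

```python
import pandas as pd

# Sample frame mirroring the intermediate table in the question.
df = pd.DataFrame({
    "Order ID":  [1, 2, 2, 2, 2, 2, 2],
    "Items":     ["x", "x", "x", "x", "y", "y", "y"],
    "Org ID":    [1234, 5678, 5678, 5678, 5678, 5678, 5678],
    "Ship From": [None, None, "DEN", None, None, "DEN", None],
    "Ship To":   [None, None, None, "CLE", None, None, "CLE"],
})

result = (
    df.groupby("Order ID")
      .agg({
          # Distinct items in first-seen order, joined into one string.
          "Items": lambda s: ", ".join(dict.fromkeys(s.astype(str))),
          # "first" returns the first non-missing value in each group.
          "Org ID": "first",
          "Ship From": "first",
          "Ship To": "first",
      })
      .reset_index()
)
print(result)
```

This also removes the need for the separate `ffill`/`bfill` and `drop_duplicates` steps, since each order collapses to one row directly.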

2 Comments

Thanks! What if I have a lot of columns? Do I need to add them all? Or I guess I could do something like [col for col in df.columns]
You only need to list the columns you want in the result. Alternatively, you can aggregate first and then pd.merge on Order ID to bring in any other columns.
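
For a wide frame, one way to avoid typing every column name is to build the aggregation mapping from `df.columns`. This is only a sketch: it assumes every column other than `Order ID` and `Items` should collapse to its first non-missing value per order, and the names `key` and `agg_spec` are made up for illustration:

```python
import pandas as pd

# Sample frame mirroring the intermediate table in the question.
df = pd.DataFrame({
    "Order ID":  [1, 2, 2, 2, 2, 2, 2],
    "Items":     ["x", "x", "x", "x", "y", "y", "y"],
    "Org ID":    [1234, 5678, 5678, 5678, 5678, 5678, 5678],
    "Ship From": [None, None, "DEN", None, None, "DEN", None],
    "Ship To":   [None, None, None, "CLE", None, None, "CLE"],
})

key = "Order ID"
# Default every non-key column to "first" (first non-missing value).
agg_spec = {col: "first" for col in df.columns if col != key}
# Override Items so distinct values are joined instead.
agg_spec["Items"] = lambda s: ", ".join(dict.fromkeys(s.astype(str)))

result = df.groupby(key).agg(agg_spec).reset_index()
print(result)
```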
