Combining text values in a pandas dataframe column based on same value in another column

Question

I have data where I may have different people associated with the same entry.

I need to combine the two entries together and note that two people are on it.

For example, the data may look like:

Name Share_ID value1 value2 value3 etc.
Joe  0001     1      2      4
Ann  0002     2      5      2
Mel  0001     1      2      4

The output would need to be:

Name      Share_ID value1 value2 value3 etc.
Joe, Mel  0001     1      2      4
Ann       0002     2      5      2

I tried to use groupby

df1.groupby(['Share_ID'])['Name'].apply(', '.join).reset_index()

But my result from that was just:

Share_ID Name
0001     Joe, Mel
0002     Ann

The Name column combined correctly, but I lost the other columns. Note that I do not want anything applied to the other columns: Joe's and Mel's records are identical.

I think my approach is off, but I'm not sure what function to use.

Did you check with groupby?

– BENY, Commented Oct 8, 2019 at 21:21

it's-yer-boy-chet · Accepted Answer · 2019-10-08 21:25:30Z

1

Starting from where you left off, you can join your grouped result back to the original DataFrame:

# Find the merged name data set and rename the 'Name' column
names = df1.groupby(['Share_ID'])['Name'].apply(', '.join).reset_index().rename(columns={'Name':'Merged Name'})
# Join it to the original dataset
df1 = df1.merge(names, on='Share_ID')
# Drop the 'Name' column then drop duplicates.
df1 = df1.drop(columns=['Name']).drop_duplicates()
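As a minimal, self-contained sketch of the steps above, the following rebuilds the question's sample data and runs the same groupby, merge, and de-duplication. The sample frame is reconstructed from the question, and storing `Share_ID` as a string is an assumption made here so that the leading zeros survive.

```python
import pandas as pd

# Sample data reconstructed from the question; Share_ID is kept as a string
# (an assumption) so that the leading zeros are preserved.
df1 = pd.DataFrame({
    "Name": ["Joe", "Ann", "Mel"],
    "Share_ID": ["0001", "0002", "0001"],
    "value1": [1, 2, 1],
    "value2": [2, 5, 2],
    "value3": [4, 2, 4],
})

# One row per Share_ID, with the names joined into a single string.
names = (
    df1.groupby("Share_ID")["Name"]
    .apply(", ".join)
    .reset_index()
    .rename(columns={"Name": "Merged Name"})
)

# Attach the merged names to every original row, drop the per-person
# 'Name' column, then collapse the now-identical rows.
result = (
    df1.merge(names, on="Share_ID")
    .drop(columns=["Name"])
    .drop_duplicates()
    .reset_index(drop=True)
)
print(result)
```

Note that `drop_duplicates()` only collapses rows here because the value columns are identical within each `Share_ID`; if they differed, both rows would remain.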


1 Comment

RoccoMaxamas Over a year ago

Wonderful. I appreciate that it's broken down into multiple lines and that the column renames happen in order to prevent having a bunch of Name_x and Name_y columns.

Ami Tavory · Answer · 2019-10-08 21:26:52Z

1

You can take the outcome you got, merge it with the original dataframe, and drop duplicates:

pd.merge(df1.groupby(['Share_ID'])['Name'].apply(', '.join).reset_index(), df1, on='Share_ID').drop_duplicates(subset='Share_ID')
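Because both frames passed to `pd.merge` contain a `Name` column, the one-liner as written produces `Name_x` (the joined names) and `Name_y` (the original name). A possible variant, sketched here on sample data rebuilt from the question (with `Share_ID` assumed to be a string), drops `Name` from the original frame before merging so that only the joined column remains:

```python
import pandas as pd

# Sample data reconstructed from the question (Share_ID as string is an assumption).
df1 = pd.DataFrame({
    "Name": ["Joe", "Ann", "Mel"],
    "Share_ID": ["0001", "0002", "0001"],
    "value1": [1, 2, 1],
    "value2": [2, 5, 2],
    "value3": [4, 2, 4],
})

# One row per Share_ID with the names joined.
merged_names = df1.groupby("Share_ID")["Name"].apply(", ".join).reset_index()

# Dropping 'Name' from the original frame first means only one 'Name' column
# reaches the merge, so pandas has no reason to add _x/_y suffixes.
result = (
    pd.merge(merged_names, df1.drop(columns="Name"), on="Share_ID")
    .drop_duplicates(subset="Share_ID")
    .reset_index(drop=True)
)
print(result)
```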


2 Comments

RoccoMaxamas Over a year ago

This is a great solution; I marked the other as accepted simply because it's a little bit more readable for me as a new user (so when I review my code later, I'll be more likely to remember what was done).

Ami Tavory Over a year ago

@RoccoMaxamas Thanks!

mukulgarg94 · Answer · 2019-10-08 21:43:43Z

0

Is there a particular reason not to include the value columns in the groupby?

df1.groupby(['Share_ID','value1', 'value2', 'value3'])['Name'].apply(', '.join).reset_index()

This gives the required rows, with Name as the last column.
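Grouping on the value columns requires listing them by hand, and `groupby` drops rows whose key columns contain NaN by default. One alternative, which is a different technique from the answers in this thread, is to group on `Share_ID` alone and aggregate: join the names and take the first value of every other column. The sketch below assumes, as the question states, that rows sharing a `Share_ID` are otherwise identical; the sample frame and the helper names `other_cols` and `agg_spec` are illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question (Share_ID as string is an assumption).
df1 = pd.DataFrame({
    "Name": ["Joe", "Ann", "Mel"],
    "Share_ID": ["0001", "0002", "0001"],
    "value1": [1, 2, 1],
    "value2": [2, 5, 2],
    "value3": [4, 2, 4],
})

# Every column other than the grouping key and the one being joined.
other_cols = [c for c in df1.columns if c not in ("Share_ID", "Name")]

# Join the names; for the remaining columns keep the first value per group.
# This is only safe when rows sharing a Share_ID are otherwise identical.
agg_spec = {"Name": ", ".join, **{c: "first" for c in other_cols}}

result = df1.groupby("Share_ID", as_index=False).agg(agg_spec)
print(result)
```

This keeps the original column order (after `Share_ID`) and avoids the merge and de-duplication steps entirely.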
