1

I have data where I may have different people associated with the same entry.

I need to combine the two entries together and note that two people are on it.

For example, the data may look like:

Name Share_ID value1 value2 value3 etc.
Joe  0001     1      2      4
Ann  0002     2      5      2
Mel  0001     1      2      4

The output would need to be:

Name      Share_ID value1 value2 value3 etc.
Joe, Mel  0001     1      2      4
Ann       0002     2      5      2

I tried to use groupby

df1.groupby(['Share_ID'])['Name'].apply(', '.join).reset_index()

But my result from that was just:

Share_ID Name
0001     Joe, Mel
0002     Ann

The Name column combined correctly, but I lost the other columns. Note that I do not want the other columns to have anything applied to them--Joe and Ann's records are identical.

I think my approach is off, but I'm not sure what function to use.

1
  • did you check with groupby ? Commented Oct 8, 2019 at 21:21

3 Answers 3

1

Starting where you left off you could just join your resulting data set back to the initial DataFrame:

# Find the merged name data set and rename the 'Name' column
names = df1.groupby(['Share_ID'])['Name'].apply(', '.join).reset_index().rename(columns={'Name':'Merged Name'})
# Join it to the original dataset
df1 = df1.merge(names, on='Share_ID')
# Drop the 'Name' column then drop duplicates.
df1 = df1.drop(columns=['Name']).drop_duplicates()
Sign up to request clarification or add additional context in comments.

1 Comment

Wonderful. I appreciate that it's broken down into multiple lines and that the column renames happen in order to prevent having a bunch of Name_x and Name_y columns.
1

You can take the outcome you got, merge it with the original dataframe, and drop duplicates:

pd.merge(df1.groupby(['Share_ID'])['Name'].apply(', '.join).reset_index(), df1, on='Share_ID').drop_duplicates(subset='Share_ID')

2 Comments

This is a great solution; I marked the other as accepted simply because it's a little bit more readable for me as a new user (so when I review my code later, I'll be more likely to remember what was done).
@RoccoMaxamas Thanks!
0

Any particular reason for not using values fields in group by?

df1.groupby(['Share_ID','value1', 'value2', 'value3'])['Name'].apply(', '.join).reset_index()

This will give the required output.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.