How to join all unique strings from multiple columns of each group in pandas

Question

Consider the sample df below:

import pandas as pd

d = {'id': ["A123", "A123", "A123"],
     'text1': ["this is a sample", "this is a sample", "this is a sample"],
     'text2': ["sing with me", "one two three", "sing with me"]}
df = pd.DataFrame(data=d)

I'm trying to group by the id column and concatenate the unique values of the text columns, so that the sample df:

id    text1              text2
A123  this is a sample   sing with me
A123  this is a sample   one two three
A123  this is a sample   sing with me

Will look like this:

id    combined_text
A123  this is a sample | sing with me | one two three

I tried all sorts of combinations of " | ".join(x), agg, and more. I can take df.groupby('id')['text1'].unique() and df.groupby('id')['text2'].unique() and merge them later, but there must be a more efficient way.

Nick ODell · Accepted Answer · 2021-12-04 20:20:04Z


I'd suggest using stack() and unique() here. I also broke this up by group, using the id column.

import pandas as pd

d = {'id': ["A123", "A123", "A123"],
     'text1': ["this is a sample", "this is a sample", "this is a sample"],
     'text2': ["sing with me", "one two three", "sing with me"]}
df = pd.DataFrame(data=d)

# For each id, stack the text columns into one Series,
# keep the unique values, and join them with " | ".
df = pd.DataFrame(
    df.groupby('id')
      .apply(
          lambda group: ' | '.join(group[['text1', 'text2']].stack().unique())
      ),
    columns=['combined_text'])
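
To make the stack() and unique() steps easier to follow, here is a minimal sketch of the same idea with the lambda pulled out into a named function. The helper name join_unique and the variable combined are my own and are not part of the answer above; the sketch also assumes the text columns contain no missing values, since " | ".join() only accepts strings.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["A123", "A123", "A123"],
    "text1": ["this is a sample", "this is a sample", "this is a sample"],
    "text2": ["sing with me", "one two three", "sing with me"],
})

TEXT_COLS = ["text1", "text2"]


def join_unique(group: pd.DataFrame) -> str:
    """Hypothetical helper: join the distinct strings of one group."""
    # stack() flattens the text columns row by row into a single Series.
    values = group[TEXT_COLS].stack()
    # unique() drops repeats and keeps the order of first appearance.
    return " | ".join(values.unique())


combined = (
    df.groupby("id")[TEXT_COLS]   # pass only the text columns to apply()
      .apply(join_unique)         # one joined string per id
      .rename("combined_text")    # name the resulting Series
      .reset_index()              # turn id back into a regular column
)

print(combined)
```

Selecting the text columns before apply() keeps the grouping column out of the function, and reset_index() returns id as an ordinary column, matching the layout asked for in the question.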
