Consider the sample df below:

import pandas as pd

d = {'id': ["A123", "A123", "A123"],
     'text1': ["this is a sample", "this is a sample", "this is a sample"],
     'text2': ["sing with me", "one two three", "sing with me"]}
df = pd.DataFrame(data=d)

I'm trying to group by the id column and concatenate the unique values across all of the text columns into a single string, so that the sample df:

id    text1              text2
A123  this is a sample   sing with me
A123  this is a sample   one two three
A123  this is a sample   sing with me

will look like this:

id    combined_text
A123  this is a sample | sing with me | one two three

I tried all sorts of combinations of " | ".join(x) with agg and more. I can take df[['id', 'text1']].drop_duplicates() and df[['id', 'text2']].drop_duplicates() and merge them afterwards, but there must be a more efficient way.
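For comparison, here is a minimal sketch of the plain agg attempt on the sample df (assuming the second column is named text2, as in the sample). agg applies the join to each text column of each group separately, so duplicates survive and the result is one joined string per column rather than a single combined one:

```python
import pandas as pd

d = {'id': ["A123", "A123", "A123"],
     'text1': ["this is a sample", "this is a sample", "this is a sample"],
     'text2': ["sing with me", "one two three", "sing with me"]}
df = pd.DataFrame(data=d)

# agg runs ' | '.join on each column of each group independently:
# no de-duplication, and text1/text2 stay in separate columns.
per_column = df.groupby('id').agg(' | '.join)
print(per_column)
```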

Answer

I'd suggest using stack() and unique() here, grouping by the id column so that each id gets its own combined string.
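To see what those two calls contribute on their own, here is a small sketch on the sample df (column named text2 as in the question). stack() reads the selected columns row by row into one Series, and unique() keeps the first occurrence of each value in that order:

```python
import pandas as pd

d = {'id': ["A123", "A123", "A123"],
     'text1': ["this is a sample", "this is a sample", "this is a sample"],
     'text2': ["sing with me", "one two three", "sing with me"]}
df = pd.DataFrame(data=d)

# One Series indexed by (row label, column name):
# row 0 text1, row 0 text2, row 1 text1, row 1 text2, ...
stacked = df[['text1', 'text2']].stack()

# unique() drops repeats but preserves first-seen order.
values = list(stacked.unique())
print(values)
# ['this is a sample', 'sing with me', 'one two three']
print(' | '.join(values))
```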

import pandas as pd

d = {'id': ["A123", "A123", "A123"],
     'text1': ["this is a sample", "this is a sample", "this is a sample"],
     'text2': ["sing with me", "one two three", "sing with me"]}
df = pd.DataFrame(data=d)

# For each id: stack text1/text2 into one Series (row by row), keep the
# first occurrence of each value with unique(), and join with ' | '.
df = (
    df.groupby('id')[['text1', 'text2']]
      .apply(lambda g: ' | '.join(g.stack().unique()))
      .to_frame('combined_text')
)
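Since the question mentions agg, here is an alternative sketch that avoids groupby.apply by reshaping with melt, dropping duplicate (id, text) pairs, and then aggregating with ' | '.join. This is a swapped-in technique, not the approach above. One difference: melt lists all text1 values before any text2 values, so the pieces are ordered column by column rather than row by row (for the sample df both orders give the same string). The result also keeps id as a regular column instead of the index.

```python
import pandas as pd

d = {'id': ["A123", "A123", "A123"],
     'text1': ["this is a sample", "this is a sample", "this is a sample"],
     'text2': ["sing with me", "one two three", "sing with me"]}
df = pd.DataFrame(data=d)

# Long format: one row per (id, source column, value); text1 rows come first.
long = df.melt(id_vars='id', value_vars=['text1', 'text2'], value_name='text')

# Keep the first occurrence of each (id, text) pair, then join per id.
combined = (
    long.drop_duplicates(['id', 'text'])
        .groupby('id')['text']
        .agg(' | '.join)
        .reset_index(name='combined_text')
)
print(combined)
```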