
I have a dataframe that contains the full chat between the user and the customer agent. I would like to extract just the user's messages and create a new row for each of them, keeping the same ticket ID:

import re
import pandas as pd

ticket_id = pd.DataFrame(["1","2"]).rename(columns={0:"Ticket-ID"})
full_chat = pd.DataFrame([
    "User foo foo foo 12:12 PM, Agent bar bar bar 12:12 PM, User foo foo 12:13 "
    "PM, Agent bar bar 12:13 PM, User foo 12:14 PM, Agent bar 12:14 PM",

    "User bar bar bar 12:12 PM, Agent foo foo foo 12:12 PM, User bar bar 12:13 "
    "PM"
    ]).rename(columns={0:"Full-Chat"})


merge_chat = pd.merge(ticket_id, full_chat, left_index=True, right_index=True, how='outer')


def _split_row(text):
    cleaned_text = text.lower()

    # Capture the text between "user" and the next hh:mm timestamp
    lines = re.findall(r"\b\w*user\b\ (.*?)\ *\d\d:\d\d*", cleaned_text)

    # Only prints each message; the function returns None
    for line in lines:
        print(line.split())

print(merge_chat["Full-Chat"].apply(_split_row))

I would like the result to look like this:

Ticket-ID      Full-Chat
1              foo foo foo
1              foo foo
1              foo
2              bar bar bar
2              bar bar

2 Answers


IIUC, you can first apply the regex to every row so that each cell holds a list of the user's messages:

merge_chat['Full-Chat'] = merge_chat['Full-Chat'].apply(lambda i: re.findall(r"\b\w*user\b\ (.*?)\ *\d\d:\d\d*", i.lower()))
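To see what that regex actually captures, here is a minimal sketch that runs it on the first sample chat using only the standard re module (PATTERN is just an illustrative local name for the answer's regex). The lazy (.*?) group stops at the first hh:mm timestamp after each "user", so each match is one user message:

```python
import re

# The answer's pattern: "user", a space, then the shortest text before "hh:mm"
PATTERN = r"\b\w*user\b\ (.*?)\ *\d\d:\d\d*"

chat = ("User foo foo foo 12:12 PM, Agent bar bar bar 12:12 PM, "
        "User foo foo 12:13 PM, Agent bar bar 12:13 PM, "
        "User foo 12:14 PM, Agent bar 12:14 PM")

# Lowercase first because the pattern looks for a lowercase "user"
messages = re.findall(PATTERN, chat.lower())
print(messages)  # ['foo foo foo', 'foo foo', 'foo']
```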

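From pandas 0.25.0 onwards,

merge_chat.explode(column='Full-Chat')

gives you the result: one row per message, with the Ticket-ID repeated. explode keeps the original index labels (0, 0, 0, 1, 1), so chain .reset_index(drop=True) if you want a fresh 0..n-1 index.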
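For reference, a self-contained sketch of the whole explode route, assuming pandas 0.25.0 or newer. It builds merge_chat directly with a dict constructor instead of the pd.merge from the question, and PATTERN is only an illustrative name for the same regex:

```python
import re
import pandas as pd

PATTERN = r"\b\w*user\b\ (.*?)\ *\d\d:\d\d*"

# Same sample data as the question, built in one step
merge_chat = pd.DataFrame({
    "Ticket-ID": ["1", "2"],
    "Full-Chat": [
        "User foo foo foo 12:12 PM, Agent bar bar bar 12:12 PM, User foo foo 12:13 PM, "
        "Agent bar bar 12:13 PM, User foo 12:14 PM, Agent bar 12:14 PM",
        "User bar bar bar 12:12 PM, Agent foo foo foo 12:12 PM, User bar bar 12:13 PM",
    ],
})

# Step 1: each cell becomes a list of the user's messages
merge_chat["Full-Chat"] = merge_chat["Full-Chat"].apply(
    lambda chat: re.findall(PATTERN, chat.lower())
)

# Step 2: one row per list element; Ticket-ID is repeated for each message,
# and reset_index replaces the repeated labels 0, 0, 0, 1, 1 with 0..4
result = merge_chat.explode("Full-Chat").reset_index(drop=True)
print(result)
```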

In versions prior to 0.25.0,

df = pd.DataFrame(merge_chat['Full-Chat'].tolist(), index=merge_chat['Ticket-ID']).stack()
# Drop the positional level added by stack(), then turn Ticket-ID back into a column
df = df.reset_index(level=1, drop=True).reset_index(name='Full-Chat')
df
  Ticket-ID Full-Chat
0   1   foo foo foo
1   1   foo foo
2   1   foo
3   2   bar bar bar
4   2   bar bar

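I tested this and it works. It loops over the rows, builds a small DataFrame of user messages for each ticket, and combines them into one DataFrame: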

import re
import pandas as pd

ticket_id = pd.DataFrame(["1","2"]).rename(columns={0:"Ticket-ID"})
full_chat = pd.DataFrame(["User foo foo foo 12:12 PM, Agent bar bar bar 12:12 PM, User foo foo 12:13 PM, Agent bar bar 12:13 PM, User foo 12:14 PM, Agent bar 12:14 PM", "User bar bar bar 12:12 PM, Agent foo foo foo 12:12 PM, User bar bar 12:13 PM"]).rename(columns={0:"Full-Chat"})

merge_chat = pd.merge(ticket_id, full_chat, left_index=True, right_index=True, how='outer')

def split_row(text, ticket_id):
    # Extract every message that follows "user" and ends before an hh:mm timestamp
    cleaned_text = text.lower()
    lines = re.findall(r"\b\w*user\b\ (.*?)\ *\d\d:\d\d*", cleaned_text)
    # One row per user message, all sharing this ticket's ID
    return pd.DataFrame({'Ticket-ID': [ticket_id] * len(lines), 'Full-Chat': lines})

# DataFrame.append was removed in pandas 2.0, so collect the per-ticket
# frames in a list and concatenate them once
Output_df = pd.concat(
    [split_row(row['Full-Chat'], row['Ticket-ID']) for _, row in merge_chat.iterrows()],
    ignore_index=True,
)
Output_df.head()

Output:

 Ticket-ID Full-Chat
0 1 foo foo foo 
1 1 foo foo 
2 1 foo 
3 2 bar bar bar 
4 2 bar bar 
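If you want to avoid building one small DataFrame per row, a vectorized alternative (a different technique from the loop above) is Series.str.extractall, which returns one row per regex match, indexed by (original row, match number). A minimal sketch, assuming the same sample data and regex; PATTERN, matches, messages and result are illustrative names:

```python
import pandas as pd

PATTERN = r"\b\w*user\b\ (.*?)\ *\d\d:\d\d*"

merge_chat = pd.DataFrame({
    "Ticket-ID": ["1", "2"],
    "Full-Chat": [
        "User foo foo foo 12:12 PM, Agent bar bar bar 12:12 PM, User foo foo 12:13 PM, "
        "Agent bar bar 12:13 PM, User foo 12:14 PM, Agent bar 12:14 PM",
        "User bar bar bar 12:12 PM, Agent foo foo foo 12:12 PM, User bar bar 12:13 PM",
    ],
})

# One row per match; the single unnamed capture group becomes column 0
matches = merge_chat["Full-Chat"].str.lower().str.extractall(PATTERN)

# Drop the "match" level so each message is labelled with its source row
messages = matches[0].droplevel("match")

# Look up the ticket ID of each message's source row
result = pd.DataFrame({
    "Ticket-ID": merge_chat.loc[messages.index, "Ticket-ID"].to_numpy(),
    "Full-Chat": messages.to_numpy(),
})
print(result)
```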
