1

I have a timeseries dataset which looks a bit like

dt         userid var1 var2
2013-04-27 100001   a   50
2013-04-27 100002   a   45
2013-04-28 100001   b   10
2013-04-29 100001   c   20
2013-04-29 100002   b   15

I'd like to extract the event sequence for each user - into a new DataFrame similar to

userid activity_seq
100001    [a, b, c]
100002       [a, b]

the order of the var1 in the activity_seq column follows the dt column (i.e. in chronological order). Could someone please suggest how this can be achieved using pandas?

Thanks in advance.


Here's a snippet to recreate the original DataFrame,

import pandas as pd

l1 = ['2013-04-27', '2013-04-27', '2013-04-28', '2013-04-29', '2013-04-29']
l2 = [100001, 100002, 100001, 100001, 100002]
l3 = ['a','a','b','c','b']
l4 = [50, 45, 10, 20, 15]
df = pd.DataFrame({'dt':l1, 'userid':l2, 'var1':l3, 'var2':l4})

1 Answer 1

2

You can groupby on 'userid' and then on 'var1' col call apply and pass list to create a sequence, you can rename/reset if required.

In [58]:
df.groupby('userid')['var1'].apply(list)

Out[58]:
userid
100001    [a, b, c]
100002       [a, b]
Name: var1, dtype: object

To get desired df, call reset_index and rename:

In [62]:
df.groupby('userid')['var1'].apply(list).reset_index().rename(columns={'var1':'activity_seq'})

Out[62]:
   userid activity_seq
0  100001    [a, b, c]
1  100002       [a, b]
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.