Is there a better way to do what the code below does in a (slow!) loop?
Given an input DataFrame, I want to produce, for each user, the list of products that user has consumed. The real data will be up to millions of rows long, and filtering the whole DataFrame once per user seems quite inefficient (unless I use Cython). Any ideas how to make this more Pythonic? Thanks!
import pandas as pd

a = pd.DataFrame({'user_id':['a', 'a', 'b', 'c', 'c', 'c'], 'prod_id':['p1', 'p2', 'p1', 'p2', 'p3', 'p7']})
print "Input Dataframe:\n", a
print '\nDesired Output:'
# Build desired output: one line per user, user_id followed by that user's products
uniqIDs = a.user_id.unique()
for id in uniqIDs:
    # Boolean-mask the whole DataFrame for this user (this is the slow part)
    prod_list = list(a[a.user_id == id].prod_id.values)
    s = id + '\t'
    for x in prod_list:
        s += x + '\t'
    print s # This will get saved to a TAB DELIMITED file
This gives the following output (which is exactly what I want):
Input Dataframe:
  prod_id user_id
0      p1       a
1      p2       a
2      p1       b
3      p2       c
4      p3       c
5      p7       c
Desired Output:
a p1 p2
b p1
c p2 p3 p7
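
For comparison, here is a minimal sketch of one direction that avoids the per-user filtering, assuming a grouped aggregation is acceptable here. It is written for Python 3 and a recent pandas (unlike the Python 2 snippet above), the names `products` and `lines` are only illustrative, and I have not measured it on millions of rows:

```python
import pandas as pd

a = pd.DataFrame({
    'user_id': ['a', 'a', 'b', 'c', 'c', 'c'],
    'prod_id': ['p1', 'p2', 'p1', 'p2', 'p3', 'p7'],
})

# Join each user's products with tabs in a single grouped pass.
# sort=False keeps users in order of first appearance, like unique() does.
products = a.groupby('user_id', sort=False)['prod_id'].agg('\t'.join)

# One tab-delimited line per user: the user_id, then that user's products.
lines = [uid + '\t' + prods for uid, prods in products.items()]
print('\n'.join(lines))
```

Note that this version does not emit the trailing tab that the loop above appends to every line.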