I have a data set (below) that I want to group by user_id and then count each cluster_label per user_id. The goal is to find out how many times each user went to each cluster they visited.
Essentially, I am looking for a result that returns this information (it can be a list, a dict, or comma-separated values):
user_id, cluster 54, cluster 109, cluster 191, cluster 204, cluster 260, cluster 263, cluster 264, cluster 278, cluster 290
819000000000000000, 1, 1, 2, 1, 3, 1, 1, 1, 1
I've tried the following code:
data['user_id'] = data.index
result = data.groupby(['user_id','cluster_label']).count()
and
groupby = data.groupby('user_id').filter(lambda x: len(x['user_id'])>=2)
#sort user locations by time
groupsort = groupby.sort_values(by='timestamp')
f = lambda x: [list(x)]
trajs = groupsort.groupby('user_id')['cluster_label'].apply(f).reset_index()
The second code block gets me closer to what I'm looking for, but I haven't been able to figure out the counting portion:
790068 [[485, 256, 304, 311, 311, 311, 311, 417, 417]]
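For the counting step specifically, one option (not necessarily the only one) is `pd.crosstab`, which builds the one-row-per-user count table directly from the two columns. A minimal sketch on a hypothetical two-user excerpt of the data below; the real frame would be read from the full CSV the same way:

```python
import io

import pandas as pd

# Hypothetical two-user excerpt of the data below.
csv = """user_id,cluster_label
819000000000000000,191
819000000000000000,191
819000000000000000,260
820000000000000000,98
820000000000000000,98
"""
data = pd.read_csv(io.StringIO(csv))

# One row per user_id, one column per cluster_label,
# cell values = number of visits (0 where a user never went).
counts = pd.crosstab(data['user_id'], data['cluster_label'])
print(counts)

# The same table as per-user dicts, dropping the zero entries.
as_dicts = {uid: {c: n for c, n in row.items() if n > 0}
            for uid, row in counts.iterrows()}
print(as_dicts)
```

`data.groupby(['user_id', 'cluster_label']).size().unstack(fill_value=0)` should produce the same table as the crosstab, so either can serve as the counting portion.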
Data:
user_id,timestamp,latitude,longitude,cluster_label
822000000000000000,3/28/2017 22:31,38.7842,-77.164,634
822000000000000000,3/28/2017 22:44,38.7842,-77.164,634
822000000000000000,3/29/2017 8:02,38.8976805,-77.387238,413
822000000000000000,3/29/2017 8:21,38.8976805,-77.387238,413
822000000000000000,3/29/2017 19:58,38.8976805,-77.387238,413
822000000000000000,3/29/2017 22:12,38.8976805,-77.387238,413
822000000000000000,3/30/2017 9:07,38.8976805,-77.387238,413
822000000000000000,3/30/2017 10:27,38.8976805,-77.387238,413
822000000000000000,3/30/2017 17:17,38.8976805,-77.387238,413
822000000000000000,3/30/2017 17:19,38.8976805,-77.387238,413
822000000000000000,3/30/2017 17:19,38.8976805,-77.387238,413
822000000000000000,3/30/2017 17:20,38.8976805,-77.387238,413
822000000000000000,3/30/2017 17:22,38.8976805,-77.387238,413
822000000000000000,3/30/2017 18:16,38.8976805,-77.387238,413
822000000000000000,3/30/2017 18:17,38.8976805,-77.387238,413
822000000000000000,3/30/2017 21:43,38.8976805,-77.387238,413
822000000000000000,3/31/2017 7:04,38.8976805,-77.387238,413
821000000000000000,3/9/2017 19:06,39.1328,-76.694,35
821000000000000000,3/9/2017 19:07,39.3426644,-76.6874899,90
821000000000000000,3/9/2017 19:07,38.93730032,-77.8885944,207
821000000000000000,3/9/2017 19:07,38.9071923,-77.368707,327
821000000000000000,3/9/2017 19:06,38.8940974,-77.276216,438
821000000000000000,3/9/2017 19:07,38.882584,-77.1124701,521
821000000000000000,3/9/2017 19:08,38.8577901,-76.8538565,565
821000000000000000,3/27/2017 21:12,38.888108,-77.1978416,485
820000000000000000,3/9/2017 19:09,39.535541,-77.1347642,77
820000000000000000,3/9/2017 19:08,38.9847,-77.1131,143
820000000000000000,3/22/2017 14:26,38.8951,-77.367,432
820000000000000000,3/24/2017 19:13,39.227,-77.1864,98
820000000000000000,3/30/2017 7:39,39.227,-77.1864,98
819000000000000000,3/9/2017 19:09,39.942239,-76.85709,54
819000000000000000,3/9/2017 19:11,39.042,-77.19,109
819000000000000000,3/9/2017 19:16,38.95315,-77.447735,191
819000000000000000,3/9/2017 19:10,38.95278983,-77.44791904,191
819000000000000000,3/9/2017 19:12,38.94033497,-77.17591993,204
819000000000000000,3/9/2017 19:09,38.917866,-77.23722,260
819000000000000000,3/9/2017 19:09,38.917866,-77.23722,260
819000000000000000,3/9/2017 19:09,38.917866,-77.23722,260
819000000000000000,3/9/2017 19:15,38.91778,-76.9769,263
819000000000000000,3/9/2017 19:12,38.916489,-77.318051,264
819000000000000000,3/9/2017 19:12,38.915147,-77.217751,278
819000000000000000,3/9/2017 19:15,38.912068,-77.190228,290