I have the following data:
- songs
- play_event
In songs, the data is as follows:
song_id total_plays
1 2000
2 4532
3 9999
4 2343
And in play_event, the data is as follows:
user_id song_id
102 1
103 4
102 1
102 3
104 2
102 1
Each time a song is played there is a new entry, even if the same song is played again.
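To experiment without a CSV file, the play_event sample above can be loaded straight into a DataFrame. This is a minimal sketch: the names PLAY_EVENT_TEXT and play_events are illustrative only, and it assumes whitespace-separated text as pasted here (if the real play_events.csv is comma-separated, the default separator of pd.read_csv applies instead).

```python
import io

import pandas as pd

# The play_event sample from above, pasted as whitespace-separated text.
# Assumption: the real file may be comma-separated; adjust `sep` if so.
PLAY_EVENT_TEXT = """\
user_id song_id
102 1
103 4
102 1
102 3
104 2
102 1
"""

# sep=r"\s+" splits on any run of whitespace. No index column is set,
# so user_id and song_id both stay ordinary columns (one row per play).
play_events = pd.read_csv(io.StringIO(PLAY_EVENT_TEXT), sep=r"\s+")
print(play_events)
```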
With this data, I want to get the total number of times each user played each song. For example, as per the data above, user_id 102 played song_id 1 three times. I want the result grouped by user_id and song_id, with the total count for each pair, something like this:
user_id song_id count
102 1 3
102 3 1
103 4 1
104 2 1
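One way to produce that table with pandas is to group on both columns and count the rows in each group. This is a sketch built on the sample data above (the variable names play_events and counts are illustrative), not necessarily the only or best approach.

```python
import pandas as pd

# Sample play events copied from the question (one row per play).
play_events = pd.DataFrame(
    {
        "user_id": [102, 103, 102, 102, 104, 102],
        "song_id": [1, 4, 1, 3, 2, 1],
    }
)

# Group on both columns; size() counts the rows in each (user_id, song_id)
# group. reset_index(name="count") turns the resulting MultiIndex Series
# back into ordinary user_id, song_id and count columns.
counts = (
    play_events.groupby(["user_id", "song_id"])
    .size()
    .reset_index(name="count")
)
print(counts)
```

Because groupby sorts the group keys by default, the rows come out ordered by user_id and then song_id, the same order as the desired table.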
I am thinking of using pandas for this, but I want to know whether pandas is the right choice.
If not, what should I use instead?
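For comparison, counting (user_id, song_id) pairs does not strictly require pandas; the standard library's collections.Counter can do it in one pass. This is a minimal sketch that assumes the events are already available as (user_id, song_id) tuples, for example read from the CSV with the csv module; the variable names are illustrative.

```python
from collections import Counter

# Sample play events from the question as (user_id, song_id) tuples.
play_events = [(102, 1), (103, 4), (102, 1), (102, 3), (104, 2), (102, 1)]

# Counter tallies how many times each identical (user_id, song_id) pair occurs.
counts = Counter(play_events)

# Print in user_id, then song_id order, matching the desired output layout.
print("user_id song_id count")
for (user_id, song_id), count in sorted(counts.items()):
    print(user_id, song_id, count)
```

Which tool fits better mostly depends on what happens next with the result: pandas is convenient if further tabular analysis follows, while Counter avoids the extra dependency for a one-off count.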
If pandas is the right choice: the code below gives me the count grouped either by user_id or by song_id. How do I get the count grouped by both user_id and song_id?
Here is the sample code I tried:
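import pandas as pd
# Load data from the CSV file (pd.read_csv replaces DataFrame.from_csv,
# which was removed in pandas 1.0 and used the first column as the index)
data = pd.read_csv('play_events.csv')
# Number of entries (plays) per user
data['user_id'].value_counts()
# Number of entries (plays) per song
data['song_id'].value_counts()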
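The value_counts calls above extend to several columns: since pandas 1.1, DataFrame.value_counts accepts a subset of columns and counts each unique combination. This sketch replaces the CSV with an in-memory stand-in so it runs on its own; the names pair_counts and result are illustrative, and it assumes pandas 1.1 or later.

```python
import pandas as pd

# Stand-in for pd.read_csv("play_events.csv") so the example is self-contained.
data = pd.DataFrame(
    {
        "user_id": [102, 103, 102, 102, 104, 102],
        "song_id": [1, 4, 1, 3, 2, 1],
    }
)

# Count unique (user_id, song_id) rows. The result is a Series with a
# MultiIndex of (user_id, song_id), sorted by count in descending order.
pair_counts = data.value_counts(subset=["user_id", "song_id"])

# sort_index() reorders by user_id, then song_id; reset_index gives flat columns.
result = pair_counts.sort_index().reset_index(name="count")
print(result)
```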