I have a SQL database with two tables, users and user_activities (see below). I'm trying to build a DataFrame from a query that returns the user_id and the number of sessions each user makes after the second day of their registration. A session is identified by the value "session" in the activity column of user_activities.
For that, I need to combine the two tables. The first one, users, provides the user_id and the registration_date:
users table:
| user_id | registration_date |
|---|---|
| 1 | 2021-01-10 04:37:14 |
| 1 | 2021-01-10 10:37:24 |
| 2 | 2021-01-10 20:37:44 |
| 3 | 2021-01-10 20:10:14 |
| 2 | 2021-01-10 10:37:04 |
The other one, user_activities, tracks all the activities that each user makes:
user_activities table:
| user | activity | date |
|---|---|---|
| 1 | session | 2021-01-10 04:37:14 |
| 1 | mainPage | 2021-01-10 10:37:24 |
| 2 | session | 2021-01-10 20:37:44 |
| 3 | session | 2021-01-10 20:10:14 |
| 4 | session | 2021-01-11 00:02:04 |
| 2 | session | 2021-01-12 00:03:04 |
| 4 | session | 2021-01-13 00:31:04 |
| 5 | session | 2021-01-14 20:23:04 |
| 2 | session | 2021-01-15 10:36:52 |
| 2 | mainPage | 2021-01-15 10:37:04 |
What I am trying to get
I would like to get a DataFrame with the user_id and the number of sessions made after the second day of their registration. Only users with at least one such session should be included. It would look like this:
| user_id | n_sessions |
|---|---|
| 2 | 2 |
| 4 | 1 |
| 5 | 1 |
To get the total number of sessions per user, I previously wrote:
```python
import mysql.connector
import pandas as pd

mydb = mysql.connector.connect(host="localhost", user="root", password="", database="users")
mycursor = mydb.cursor()

# Sessions per user (column names as in the user_activities table above)
mycursor.execute(
    "SELECT `user` AS user_id, COUNT(*) AS n_sessions "
    "FROM user_activities WHERE activity = 'session' GROUP BY `user`;"
)
sessions_per_user = pd.DataFrame(mycursor.fetchall(), columns=['user_id', 'n_sessions'])
```
But I don't know how to join the two tables and apply the registration_date condition. Does anyone know how to do it?
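
One possible direction, as a rough sketch rather than a tested MySQL solution: join user_activities to one registration date per user and filter in the WHERE clause. The sketch below uses an in-memory SQLite database as a stand-in so it runs without a server. Everything in it is an assumption for illustration: the helper names `build_demo_db` and `count_late_sessions`, the toy rows, taking the earliest registration_date when a user appears more than once, and reading "after the second day" as "more than `min_days` days after registration" (default 1).

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER, registration_date TEXT);
        CREATE TABLE user_activities (user INTEGER, activity TEXT, date TEXT);
    """)
    conn.executemany("INSERT INTO users VALUES (?, ?)", [
        (1, "2021-01-10 04:37:14"),
        (2, "2021-01-10 10:37:04"),
        (2, "2021-01-10 20:37:44"),  # duplicate user: earliest date is used
    ])
    conn.executemany("INSERT INTO user_activities VALUES (?, ?, ?)", [
        (1, "session", "2021-01-10 04:37:14"),
        (1, "mainPage", "2021-01-12 10:00:00"),
        (2, "session", "2021-01-10 20:37:44"),
        (2, "session", "2021-01-12 00:03:04"),
        (2, "session", "2021-01-15 10:36:52"),
    ])
    return conn


def count_late_sessions(conn, min_days=1):
    """Count sessions dated more than min_days days after registration."""
    query = """
        SELECT a.user AS user_id, COUNT(*) AS n_sessions
        FROM user_activities AS a
        JOIN (
            -- collapse repeated users to a single registration date
            SELECT user_id, MIN(registration_date) AS registration_date
            FROM users
            GROUP BY user_id
        ) AS u ON u.user_id = a.user
        WHERE a.activity = 'session'
          -- SQLite syntax; in MySQL this would presumably be
          -- a.date > DATE_ADD(u.registration_date, INTERVAL 1 DAY)
          AND a.date > datetime(u.registration_date, ?)
        GROUP BY a.user
        ORDER BY a.user
    """
    return conn.execute(query, (f"+{min_days} days",)).fetchall()


print(count_late_sessions(build_demo_db()))  # [(2, 2)]
```

Because the join is an inner join and the filter sits in WHERE, users with no qualifying sessions (or no registration row) drop out on their own, so no HAVING clause is needed. The returned rows can be passed to `pd.DataFrame(rows, columns=['user_id', 'n_sessions'])` as in the snippet above.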