I have a file-sharing site where my users are interested in clicks on their files. Each click is stored as a new row in the clicks table.
Usually, they want to know how many clicks they got within a certain date range:
$statement = $db->prepare("SELECT COUNT(DISTINCT ip) FROM clicks WHERE user_id=? AND time BETWEEN ? AND ?");
$statement->execute(array($user_id, $from_date, $to_date));
They can also see the number of clicks for a particular file:
$statement = $db->prepare("SELECT COUNT(DISTINCT ip) FROM clicks WHERE file_id=? AND time BETWEEN ? AND ?");
$statement->execute(array($file_id, $from_date, $to_date));
The problem with these queries is that user_id and file_id are not keys for this table (they are not unique). Instead, a simple 'id' column is the primary key, but it never appears in any of these queries.
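From my reading so far, composite secondary indexes matching the two WHERE clauses seem like the usual fix; here is a sketch of what I have in mind, assuming MySQL/InnoDB (the index names are just my own placeholders):

```sql
-- Composite indexes matching the two query patterns.
-- Column order matters: the equality column (user_id / file_id) comes first,
-- the range column (time) second, and ip last so that
-- COUNT(DISTINCT ip) can be answered from the index alone.
ALTER TABLE clicks ADD INDEX idx_user_time_ip (user_id, time, ip);
ALTER TABLE clicks ADD INDEX idx_file_time_ip (file_id, time, ip);
```

Would indexes like these be enough on their own, or is that where a clustered key comes in?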
I have been researching clustered indexes but I cannot figure out how to implement it in this case.
As the clicks table is growing pretty large (5-6 million rows), these queries are taking longer, and I expect the table to get a lot bigger. I have read that partitioning might be what I need.
Do I need to make a clustered key, partition the table, or both?
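If partitioning is the answer, I assume it would be range partitioning on time, something like the sketch below (again assuming MySQL; the year boundaries are just examples). One wrinkle I noticed in the docs: every unique key, including the primary key, must include the partitioning column, so the primary key would have to become (id, time):

```sql
-- Make the primary key include the partitioning column, as MySQL requires.
ALTER TABLE clicks
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (id, time);

-- Range-partition by year of the click timestamp so that date-range
-- queries only scan the relevant partitions.
ALTER TABLE clicks
    PARTITION BY RANGE (YEAR(time)) (
        PARTITION p2011 VALUES LESS THAN (2012),
        PARTITION p2012 VALUES LESS THAN (2013),
        PARTITION pmax  VALUES LESS THAN MAXVALUE
    );
```

Is this the right direction, or is partitioning overkill at this table size?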
For reference, the clicks table structure:
id, time, user_id, ip, file_id