I have a file-sharing site where my users are interested in clicks on their files. Each click is stored as a new row in the clicks table.
Usually, they want to know how many clicks they got within a certain date range:
$statement = $db->prepare("SELECT COUNT(DISTINCT ip) FROM clicks WHERE user_id=? AND time BETWEEN ? AND ?");
$statement->execute(array($user_id, $from_date, $to_date));
They can also see the number of clicks for a particular file:
$statement = $db->prepare("SELECT COUNT(DISTINCT ip) FROM clicks WHERE file_id=? AND time BETWEEN ? AND ?");
$statement->execute(array($file_id, $from_date, $to_date));
The problem with these queries is that user_id and file_id are not keys for this table (they are not unique). Instead, a simple 'id' column is the primary key, but it never appears in any of these queries.
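From my reading so far, composite secondary indexes matching the two WHERE clauses seem like the usual fix; here is a sketch of what I have in mind, assuming MySQL/InnoDB (the index names are just my own placeholders):

```sql
-- Composite indexes matching the two query patterns.
-- Column order matters: the equality column (user_id / file_id) comes first,
-- the range column (time) second, and ip last so that
-- COUNT(DISTINCT ip) can be answered from the index alone.
ALTER TABLE clicks ADD INDEX idx_user_time_ip (user_id, time, ip);
ALTER TABLE clicks ADD INDEX idx_file_time_ip (file_id, time, ip);
```

Would indexes like these be enough on their own, or is that where a clustered key comes in?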
I have been researching clustered indexes but I cannot figure out how to implement it in this case.
As the clicks table is growing pretty large (5-6 million rows), these queries are taking longer, and I expect the table to get a lot bigger. I have read that partitioning might be what I need.
Do I need to make a clustered key, partition the table, or both?
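If partitioning is the answer, I assume it would be range partitioning on time, something like the sketch below (again assuming MySQL; the year boundaries are just examples). One wrinkle I noticed in the docs: every unique key, including the primary key, must include the partitioning column, so the primary key would have to become (id, time):

```sql
-- Make the primary key include the partitioning column, as MySQL requires.
ALTER TABLE clicks
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (id, time);

-- Range-partition by year of the click timestamp so that date-range
-- queries only scan the relevant partitions.
ALTER TABLE clicks
    PARTITION BY RANGE (YEAR(time)) (
        PARTITION p2011 VALUES LESS THAN (2012),
        PARTITION p2012 VALUES LESS THAN (2013),
        PARTITION pmax  VALUES LESS THAN MAXVALUE
    );
```

Is this the right direction, or is partitioning overkill at this table size?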
For reference, the clicks table structure:
id, time, user_id, ip, file_id