I have a couple of tables that look like this:
CREATE TABLE Entities (
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(45) NOT NULL,
client_id INT NOT NULL,
display_name VARCHAR(45),
PRIMARY KEY (id)
);
CREATE TABLE Statuses (
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(45) NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE EventTypes (
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(45) NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE Events (
id INT NOT NULL AUTO_INCREMENT,
entity_id INT NOT NULL,
date DATE NOT NULL,
event_type_id INT NOT NULL,
status_id INT NOT NULL,
PRIMARY KEY (id)
);
Events is large (> 100,000,000 rows).
Entities, Statuses and EventTypes are small (< 300 rows apiece).
I have several indexes on Events, but the ones that come into play are:
idx_events_date_ent_status_type (date, entity_id, status_id, event_type_id)
idx_events_date_ent_status_type (entity_id, status_id, event_type_id)
idx_events_date_ent_type (date, entity_id, event_type_id)
I have a large, complicated query, but I get the same slow results with a simpler one like the one below (note: the real queries don't use evt.*):
SELECT evt.*, ent.name AS ent_name, s.name AS stat_name, et.name AS type_name
FROM `Events` evt
JOIN `Entities` ent ON evt.entity_id = ent.id
JOIN `EventTypes` et ON evt.event_type_id = et.id
JOIN `Statuses` s ON evt.status_id = s.id
WHERE
evt.date BETWEEN @start_date AND @end_date AND
evt.entity_id IN ( 19 ) AND -- this in clause is built by code
evt.event_type_id = @type_id
For some reason, MySQL keeps choosing the index that doesn't include Events.date; the query takes 15 seconds or more and returns a couple thousand rows. If I change the query to:
SELECT evt.*, ent.name AS ent_name, s.name AS stat_name, et.name AS type_name
FROM `Events` evt FORCE INDEX (idx_events_date_ent_status_type)
JOIN `Entities` ent ON evt.entity_id = ent.id
JOIN `EventTypes` et ON evt.event_type_id = et.id
JOIN `Statuses` s ON evt.status_id = s.id
WHERE
evt.date BETWEEN @start_date AND @end_date AND
evt.entity_id IN ( 19 ) AND -- this in clause is built by code
evt.event_type_id = @type_id
The query takes 0.014 seconds.
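For reference, the two plans can be compared side by side with EXPLAIN. This is only a sketch: the values assigned to the variables are placeholders, not the real ones, and evt.id stands in for the real column list.

```sql
-- Placeholder values (assumptions); substitute the real range and type id.
SET @start_date = '2015-01-01';
SET @end_date   = '2015-01-31';
SET @type_id    = 1;

-- Plan the optimizer picks on its own.
EXPLAIN
SELECT evt.id, ent.name AS ent_name, s.name AS stat_name, et.name AS type_name
FROM `Events` evt
JOIN `Entities` ent ON evt.entity_id = ent.id
JOIN `EventTypes` et ON evt.event_type_id = et.id
JOIN `Statuses` s ON evt.status_id = s.id
WHERE evt.date BETWEEN @start_date AND @end_date
  AND evt.entity_id IN (19)
  AND evt.event_type_id = @type_id;

-- Same query with the index forced.
EXPLAIN
SELECT evt.id, ent.name AS ent_name, s.name AS stat_name, et.name AS type_name
FROM `Events` evt FORCE INDEX (idx_events_date_ent_status_type)
JOIN `Entities` ent ON evt.entity_id = ent.id
JOIN `EventTypes` et ON evt.event_type_id = et.id
JOIN `Statuses` s ON evt.status_id = s.id
WHERE evt.date BETWEEN @start_date AND @end_date
  AND evt.entity_id IN (19)
  AND evt.event_type_id = @type_id;
```

The columns worth comparing for the evt row are key, key_len and rows, along with the order in which the four tables appear in each plan.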
Since this query is built by code, I would much rather not force the index, but mostly, I want to know why it chooses one index over the other. Is it because of the joins?
To give some stats, there are ~2500 distinct dates, and ~200 entities in the Events table. So I suppose that might be why it chooses the index with all of the low cardinality columns.
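The numbers above are actual distinct counts; the optimizer works from sampled estimates, which may differ. A rough way to compare the two, assuming nothing beyond the schema shown earlier:

```sql
-- Cardinality estimates per index column, as stored in the index statistics.
SHOW INDEX FROM `Events`;

-- Re-sample the statistics in case the estimates are stale.
ANALYZE TABLE `Events`;

-- Exact distinct counts for comparison.
-- On a table of this size this is expected to be slow.
SELECT COUNT(DISTINCT `date`)    AS distinct_dates,
       COUNT(DISTINCT entity_id) AS distinct_entities
FROM `Events`;
```

If the Cardinality column in the SHOW INDEX output is far from the exact counts, that could be one reason for a poor index choice, though it is not necessarily the cause here.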
Do you think it would help to add date to the end of the (entity_id, status_id, event_type_id) index? Since this is a large table, it takes a long time to add indexes.
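For concreteness, the change being considered would look roughly like the sketch below. The index name is made up for illustration, and the ALGORITHM and LOCK clauses assume InnoDB on MySQL 5.6 or later; on older versions or other engines they would have to be dropped.

```sql
-- Hypothetical index name; columns are the existing three plus date at the end.
ALTER TABLE `Events`
  ADD INDEX idx_events_ent_status_type_date
      (entity_id, status_id, event_type_id, `date`),
  ALGORITHM = INPLACE,  -- build the index without copying the table (assumes InnoDB, 5.6+)
  LOCK = NONE;          -- keep the table readable and writable during the build
```

With these clauses the statement fails immediately if the server cannot do the operation online, rather than silently falling back to a blocking table copy.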
I tried adding an additional index, ix_events_ent_date_status_et(entity_id, date, status_id, event_type_id) and it actually made the queries slower.
I will experiment a bit more, but I feel like I don't really understand how the optimizer makes its decisions.
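One way to see the optimizer's reasoning directly is the optimizer trace. A minimal sketch, assuming MySQL 5.6 or later and that @start_date, @end_date and @type_id are already set in the session:

```sql
-- Turn tracing on for this session only.
SET optimizer_trace = 'enabled=on';

-- Run the statement to be analysed (evt.id stands in for the real column list).
SELECT evt.id, ent.name AS ent_name, s.name AS stat_name, et.name AS type_name
FROM `Events` evt
JOIN `Entities` ent ON evt.entity_id = ent.id
JOIN `EventTypes` et ON evt.event_type_id = et.id
JOIN `Statuses` s ON evt.status_id = s.id
WHERE evt.date BETWEEN @start_date AND @end_date
  AND evt.entity_id IN (19)
  AND evt.event_type_id = @type_id;

-- Read back the JSON trace for the statement that just ran.
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;

-- Turn tracing off again.
SET optimizer_trace = 'enabled=off';
```

The rows_estimation and considered_execution_plans sections of the trace list the cost assigned to each candidate index and join order, which should show where the slower index wins.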
Additional Info:
I tried removing the join to the Statuses table; MySQL then switches to idx_events_date_ent_type, and the query runs in 0.045 seconds.
I can't wrap my head around why removing a join to a table that is not part of the filter impacts the choice of index.
How large is the range between start_date and end_date? If that is "a lot", then MySQL will decide that the index is not to be used. When selecting just 1 day (start_date = end_date), or a couple of days, MySQL might decide to use the index after all.
status_id is in the index which you force to be used, but no filtering is done on that field. This is also a reason for NOT selecting that index.
status_id is selected (via evt.*) and needs to be fetched anyway, so there is no real reason to use an index on that field.
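To check how much of the table a given date range actually covers, the counts can be taken directly. This is a sketch that assumes @start_date, @end_date and @type_id hold the same values as in the slow query:

```sql
-- Rows that fall in the date range alone.
SELECT COUNT(*) AS rows_in_range
FROM `Events` evt
WHERE evt.date BETWEEN @start_date AND @end_date;

-- Rows that survive all three filters of the original query.
SELECT COUNT(*) AS rows_matching
FROM `Events` evt
WHERE evt.date BETWEEN @start_date AND @end_date
  AND evt.entity_id IN (19)
  AND evt.event_type_id = @type_id;
```

Comparing rows_in_range against the table's total row count gives a rough idea of how selective the date condition is on its own.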