0

I am trying to run the following query on a very large table with over 90 million rows, and growing:

SELECT COUNT(DISTINCT device_uid) AS cnt,  DATE_FORMAT(time_start, '%Y-%m-%d') AS period 
FROM game_session 
WHERE account_id = -2 AND DATE_FORMAT(time_start, '%Y-%m-%d') BETWEEN CURDATE() - INTERVAL 90 DAY AND CURDATE()
GROUP BY period 
ORDER BY period DESC

I have the following table structure:

CREATE TABLE `game_session` (
  `session_id` bigint(20) NOT NULL,
  `account_id` bigint(20) NOT NULL,
  `authentification_type` char(2) NOT NULL,
  `source_ip` char(40) NOT NULL,
  `device` char(50) DEFAULT NULL COMMENT 'Added 0.9',
  `device_uid` char(50) NOT NULL,
  `os` char(50) DEFAULT NULL COMMENT 'Added 0.9',
  `carrier` char(50) DEFAULT NULL COMMENT 'Added 0.9',
  `protocol_version` char(20) DEFAULT NULL COMMENT 'Added 0.9',
  `lang_key` char(2) NOT NULL DEFAULT 'en',
  `instance_id` char(100) NOT NULL,
  `time_start` datetime NOT NULL,
  `time_end` datetime DEFAULT NULL,
  PRIMARY KEY (`session_id`),
  KEY `game_account_session_fk` (`account_id`),
  KEY `lang_key_fk` (`lang_key`),
  KEY `lookup_active_session_idx` (`account_id`,`time_start`),
  KEY `lookup_finished_session_idx` (`account_id`,`time_end`),
  KEY `start_time_idx` (`time_start`),
  KEY `lookup_guest_session_idx` (`device_uid`,`time_start`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

How can I optimize this?

Thanks for your answers.

3
  • 1
    Any reason you're using DATE_FORMAT? Commented Dec 2, 2011 at 20:56
  • 1
    There is a lot of guessing going on here. You might get some more accurate answers if you can post some EXPLAIN results. Commented Dec 2, 2011 at 21:05
  • @Albin: There is not much to guess. WHERE DATE_FORMAT(time_start, '%Y-%m-%d') BETWEEN x AND y and GROUP BY DATE_FORMAT(time_start, '%Y-%m-%d') will kill a query on 90M rows, no matter what indexes you have. Commented Dec 2, 2011 at 21:23

6 Answers

3

DATE_FORMAT(time_start, '%Y-%m-%d') sounds expensive.
Wrapping a column in a function in the WHERE clause limits how MySQL can use an index on that column. You probably run into a full index scan plus a DATE_FORMAT calculation for each value, instead of an index lookup or range scan.

Try to store the computed value in a column (or create a computed index if MySQL supports it). Or, even better, rewrite your conditions to compare directly against the value stored in the column.
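As a rough sketch of the "store the computed value" option: assuming MySQL 5.7.6 or later (which added generated columns), the date part can be kept in a stored generated column and indexed. The column name start_day and the index name below are hypothetical, and on a MyISAM table of this size the ALTER will rebuild the table, so treat it as something to try on a copy first.

```sql
-- Assumption: MySQL 5.7.6+ (generated columns). start_day and the
-- index name are illustrative, not part of the original schema.
ALTER TABLE game_session
  ADD COLUMN start_day DATE
    GENERATED ALWAYS AS (DATE(time_start)) STORED,
  ADD KEY lookup_account_day_idx (account_id, start_day);

-- The filter and the grouping can then use the bare column.
SELECT COUNT(DISTINCT device_uid) AS cnt,
       start_day AS period
FROM game_session
WHERE account_id = -2
  AND start_day >= CURDATE() - INTERVAL 90 DAY
GROUP BY start_day
ORDER BY start_day DESC;
```

A STORED column is used here rather than a VIRTUAL one because secondary indexes on virtual generated columns are an InnoDB feature, and this table is MyISAM.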


5 Comments

There's no need to store the computed value in a column; it's easier to manipulate the values it is compared to.
@MichaelKrelin-hacker That is of course best if possible, but I assume the DATE_FORMAT is used to truncate the time part of the column. How would you do that by manipulating the values it is compared to?
Truncate the lower bound and bump the upper bound to the next day?
@MichaelKrelin-hacker, Ah, you're right. I'm used to SQL Server, where GETDATE() returns the date including the time of day, but now I see that CURDATE() only returns the date part. Then I agree with you: away with the DATE_FORMAT.
Oh, then it's even easier, since you don't have to truncate the lower bound :) What I proposed would actually work for a full timestamp too. I'm not a frequent MySQL user either (nor an SQL Server user), so I considered all possibilities :)
2

Well, 90 million rows is a lot, but I suspect it doesn't use start_time_idx because of the manipulations on the column, which you can avoid: manipulate the values you compare it with instead. That also only has to be done once per query, if MySQL is smart enough. Have you checked EXPLAIN?
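A minimal way to check this with EXPLAIN is to run it on both forms of the predicate and compare the plans. The two statements below are simplified test queries, not the original report query, and the exact plan depends on the MySQL version and the data.

```sql
-- Column wrapped in a function: the time_start part of an index
-- cannot be used to bound the scan.
EXPLAIN
SELECT COUNT(*)
FROM game_session
WHERE account_id = -2
  AND DATE_FORMAT(time_start, '%Y-%m-%d') >= CURDATE() - INTERVAL 90 DAY;

-- Bare column compared to a computed constant: a range scan on an
-- (account_id, time_start) index becomes possible.
EXPLAIN
SELECT COUNT(*)
FROM game_session
WHERE account_id = -2
  AND time_start >= CURDATE() - INTERVAL 90 DAY;
```

In the output, the columns worth comparing are type, key, key_len and rows. For the second statement one would hope to see type = range on an index that starts with (account_id, time_start), such as lookup_active_session_idx, and a much smaller rows estimate.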

2 Comments

You're right, it doesn't use start_time_idx; it uses lookup_finished_session_idx and I can't figure out why.
I've just told you why :) Try avoiding formatting, see my comment to Albin's answer on how.
1

You may want to group and sort by time_start instead of the period value you create when the query is run. Sorting by period requires all of those values to be generated before any sorting can be done.

1 Comment

By that point, I think there are fewer records left after filtering by the WHERE condition. But of course having some more input on the data and the outcome (as well as intermediate counts) would be helpful.
1

I'd change

BETWEEN CURDATE() - INTERVAL 90 DAY AND CURDATE()

to

> (CURDATE() - INTERVAL 90 DAY)

You don't have records from the future, do you?

Comments

1

Try swapping out your WHERE clause with the following: WHERE account_id = -2 AND time_start BETWEEN CURDATE() - INTERVAL 90 DAY AND CURDATE()

MySQL will still match the dates in between. The only rows you need to worry about are today's, which get cut off because their timestamps are later than midnight.

You can fix that by replacing the second CURDATE() with CURDATE() + INTERVAL 1 DAY.

1 Comment

Correct, but don't use BETWEEN. Use >= for the lower bound and < CURDATE() + INTERVAL 1 DAY for the upper bound.
1

Change the query to:

SELECT COUNT(DISTINCT device_uid) AS cnt
     , DATE_FORMAT(time_start, '%Y-%m-%d') AS period 
FROM game_session 
WHERE account_id = -2 
  AND time_start >= CURDATE() - INTERVAL 90 DAY 
  AND time_start <  CURDATE() + INTERVAL 1 DAY
GROUP BY DATE(time_start) DESC

so the index of (account_id, time_start) can be used for the WHERE part of the query.
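One version caveat, offered as a sketch rather than a correction: the GROUP BY ... DESC form relies on the implicit sorting of GROUP BY, which was deprecated in MySQL 5.7 and removed in 8.0. On a newer server the same query can be written with an explicit ORDER BY, and selecting DATE(time_start) instead of the DATE_FORMAT string keeps the select list identical to the grouping expression.

```sql
-- Variant for servers where GROUP BY ... DESC is not accepted.
-- DATE(time_start) is displayed as YYYY-MM-DD, like the original period.
SELECT COUNT(DISTINCT device_uid) AS cnt,
       DATE(time_start) AS period
FROM game_session
WHERE account_id = -2
  AND time_start >= CURDATE() - INTERVAL 90 DAY
  AND time_start <  CURDATE() + INTERVAL 1 DAY
GROUP BY DATE(time_start)
ORDER BY DATE(time_start) DESC;
```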


If it's still slow (the DATE(time_start) in the GROUP BY does not look very good for performance), add a date_start column and store the date part of time_start in it.

Then add an index on (account_id, date_start, device_uid). This will further improve performance, because everything the query needs, both for the GROUP BY date_start and for the COUNT(DISTINCT device_uid), will be in the index:

SELECT COUNT(DISTINCT device_uid) AS cnt
     , date_start                 AS period 
FROM game_session 
WHERE account_id = -2 
  AND date_start BETWEEN CURDATE() - INTERVAL 90 DAY 
                     AND CURDATE()
GROUP BY date_start DESC
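The schema change this relies on could look roughly like the following. This is a sketch under assumptions: the index name is made up, the column is left nullable so the backfill can run as a separate step, and on a 90-million-row MyISAM table each ALTER rebuilds the table, so it is best tried on a copy first.

```sql
-- 1. Add the plain date column (nullable until it is backfilled).
ALTER TABLE game_session
  ADD COLUMN date_start DATE NULL;

-- 2. Backfill it from the existing timestamps.
UPDATE game_session
SET date_start = DATE(time_start);

-- 3. Add the covering index; the index name is hypothetical.
ALTER TABLE game_session
  ADD KEY lookup_account_date_device_idx (account_id, date_start, device_uid);
```

Rows inserted afterwards need date_start filled in as well, either by the application or by a trigger, otherwise they will be missing from the report.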

1 Comment

Thank you, I'm going to try that.
