0

The below query is taking almost 14 secs to complete. I have a person table with 1M entries in it. Can anyone suggest me how to make the query faster and reduce the execution time to say something like 1, 2 or 3 seconds ? I am attaching the explain details below.

SELECT p.id, 
COUNT(CASE WHEN p.active=1 THEN 1 END) AS active_users_count, 
COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') =     DATE_FORMAT(NOW(),'%Y-%m-%d') THEN 1 END) AS today_install_count,  
COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') =    DATE_FORMAT(DATE_SUB(NOW(),INTERVAL 1 DAY), '%Y-%m-%d') THEN 1 END) AS     yesterday_install_count,  
COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') BETWEEN    DATE_SUB(CURDATE(),INTERVAL DAY(LAST_DAY(NOW())) DAY) AND CURDATE() THEN 1 END)         AS month_install_count,  
COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') BETWEEN     DATE_SUB(CURDATE(),INTERVAL 1 YEAR) AND CURDATE() THEN 1 END) AS     year_install_count, 
COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') BETWEEN     DATE_SUB(CURDATE(),INTERVAL 7 DAY) AND CURDATE() THEN 1 END) AS     week_install_count, 
COUNT('x') AS total_users_count
FROM person p 
WHERE p.app_id IN (SELECT p2.id FROM project p2 ) GROUP BY p.app_id

239 row(s) returned

Execution Time : 13.504 sec Transfer Time : 0.001 sec Total Time : 13.505 sec

shw create table for person and project

person  CREATE TABLE `person` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `device_push_token` longtext NOT NULL,
  `created_date` datetime NOT NULL,
  `since_last_login` datetime NOT NULL,
  `platform` smallint(6) NOT NULL,
  `hwid` varchar(255) NOT NULL,
  `app_id` bigint(20) NOT NULL,
  `since_last_push` datetime NOT NULL,
  `no_of_pushes` smallint(6) NOT NULL DEFAULT '0',
  `language` varchar(50) DEFAULT NULL,
  `timezone` bigint(20) DEFAULT '0',
  `since_last_hour_push` datetime DEFAULT NULL,
  `version` bigint(20) NOT NULL DEFAULT '1',
  `active` tinyint(1) NOT NULL DEFAULT '1',
  PRIMARY KEY (`id`),
  UNIQUE KEY `hwid` (`hwid`,`app_id`),
  KEY `fk_person_platform` (`platform`),
  KEY `fk_person_project` (`app_id`),
  CONSTRAINT `fk_person_platform` FOREIGN KEY (`platform`) REFERENCES `platform` (`id`),
  CONSTRAINT `fk_person_project` FOREIGN KEY (`app_id`) REFERENCES `project` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1310384 DEFAULT CHARSET=latin1


project CREATE TABLE `project` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `unique_id` varchar(300) NOT NULL,
  `name` longtext NOT NULL,
  `description` longtext,
  `ios_configure` bigint(20) DEFAULT NULL,
  `android_configure` bigint(20) DEFAULT NULL,
  `freq_push` bigint(20) DEFAULT NULL,
  `hour_push` bigint(20) DEFAULT NULL,
  `push_sent` bigint(20) DEFAULT '0',
  `push_opened` bigint(20) DEFAULT '0',
  `version` bigint(20) NOT NULL DEFAULT '1',
  `created_date` datetime NOT NULL,
  `updated_date` datetime NOT NULL,
  `active` tinyint(1) NOT NULL DEFAULT '1',
  `project_apprater` bigint(20) DEFAULT NULL,
  `type` smallint(6) NOT NULL DEFAULT '1',
  `status` bigint(20) DEFAULT '1',
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique_id` (`unique_id`),
  KEY `fk_project_ios_config` (`ios_configure`),
  KEY `fk_project_android_config` (`android_configure`),
  KEY `fk_project_freq_push` (`freq_push`),
  KEY `fk_project_hour_push` (`hour_push`),
  KEY `fk_project_apprater` (`project_apprater`),
  KEY `fk_project_platform` (`type`),
  KEY `name` (`status`),
  CONSTRAINT `fk_project_android_config` FOREIGN KEY (`android_configure`) REFERENCES `project_configure_android` (`id`),
  CONSTRAINT `fk_project_apprater` FOREIGN KEY (`project_apprater`) REFERENCES `project_apprater` (`id`),
  CONSTRAINT `fk_project_freq_push` FOREIGN KEY (`freq_push`) REFERENCES `freq_push` (`id`),
  CONSTRAINT `fk_project_hour_push` FOREIGN KEY (`hour_push`) REFERENCES `hour_push` (`id`),
  CONSTRAINT `fk_project_ios_config` FOREIGN KEY (`ios_configure`) REFERENCES `project_configure_ios` (`id`),
  CONSTRAINT `fk_project_platform` FOREIGN KEY (`type`) REFERENCES `platform` (`id`),
  CONSTRAINT `name` FOREIGN KEY (`status`) REFERENCES `project_status` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=313 DEFAULT CHARSET=latin1



id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   PRIMARY p   index   \N  fk_person_project   8   \N  1158770 Using where
2   DEPENDENT SUBQUERY  p2  unique_subquery PRIMARY PRIMARY 8   func    1   Using index

Updated Full Query

SELECT 
  p3.id AS id,
  COALESCE(pug.active_users_count, 0) AS userCount, 
  p3.unique_id AS uniqueId,
  p3.name,
  p3.description,
  DATE_FORMAT(p3.created_date, '%m-%d-%Y %T') AS createdDate,
  p3.android_configure AS androidConfigure,
  p3.ios_configure AS iosConfigure,
  (SELECT 
    fp.active 
  FROM
    freq_push fp 
  WHERE fp.id = p3.freq_push) AS freqActive,
  (SELECT 
    hp.active 
  FROM
    hour_push hp 
  WHERE hp.id = p3.hour_push) AS hourActive,
  COALESCE(pug.total_users_count, 0) AS totalUserCount,
  COALESCE(pug.today_install_count, 0) AS todayInstallCount,
  COALESCE(pug.yesterday_install_count, 0) AS yesterdayInstallCount,
  COALESCE(pug.month_install_count, 0) AS monthInstallCount,
  COALESCE(pug.year_install_count, 0) AS yearInstallCount,
  COALESCE(pug.week_install_count, 0) AS weekInstallCount,
  (SELECT 
    plat.name 
  FROM
    platform plat 
  WHERE plat.id = p3.type) AS project_type ,
  ps.name
FROM 
  (SELECT 
    p.app_id,
    COUNT(
      CASE
        WHEN p.active = 1 
        THEN 1 
      END) AS active_users_count,
    COUNT(
      CASE
        WHEN DATE(p.created_date) = CURDATE() 
        THEN 1 
      END
    ) AS today_install_count,
    COUNT(
      CASE
        WHEN DATE(p.created_date) = DATE(DATE_SUB(NOW(), INTERVAL 1 DAY)) 
        THEN 1 
      END
    ) AS yesterday_install_count,
    COUNT(
      CASE
        WHEN DATE(p.created_date) BETWEEN DATE_SUB(
          CURDATE(),
          INTERVAL DAY(LAST_DAY(NOW())) DAY
        ) 
        AND CURDATE() 
        THEN 1 
      END
    ) AS month_install_count,
    COUNT(
      CASE
        WHEN DATE(p.created_date) BETWEEN DATE_SUB(CURDATE(), INTERVAL 1 YEAR) 
        AND CURDATE() 
        THEN 1 
      END
    ) AS year_install_count,
    COUNT(
      CASE
        WHEN DATE(p.created_date) BETWEEN DATE_SUB(CURDATE(), INTERVAL 7 DAY) 
        AND CURDATE() 
        THEN 1 
      END
    ) AS week_install_count,
    COUNT('x') AS total_users_count 
  FROM
    person p 
    INNER JOIN project p2 
      ON p.app_id = p2.id 
  GROUP BY p.app_id) AS pug 
  RIGHT JOIN project p3 
    ON p3.id = pug.app_id     
  INNER JOIN project_status ps  
  ON p3.status = ps.id
ORDER BY userCount DESC,
  createdDate DESC
6
  • 1
    show output from show create table person and output from show create table project Commented Nov 2, 2015 at 16:17
  • 1
    or just do a join and no correlated subquery Commented Nov 2, 2015 at 16:18
  • you're comparing derived values. there is NOTHING you can do to optimize this until you start using bare indexed "original" values. plus, why date_format(...) to convert a date/time value from datetime->string->date, when you could simply have date() and go datetime->date directly? Commented Nov 2, 2015 at 16:18
  • where were those "I am attaching the explain details below". Here is how to Share those, seen under General Comments and sharing Commented Nov 2, 2015 at 16:24
  • Hi Drew, Pasted show create table person and project. Please check Commented Nov 3, 2015 at 7:22

5 Answers 5

1

Revamping this Answer from previous version. Basically Mark B is right in comments under question. Fortunately progress has been made by the OP, and the time has been reduced from say 13 seconds to a bit under 6. The OP said (in comments under his own Answer and in chat) that if the time could be reduced to under 1 second, he would consider other approaches. Like the one I was talking about with him concerning accepting somewhat stale metrics where he gets to choose how long the stale duration lasts. A trade-off between staleness and speed for user.

So here is one approach to that.

One uses Create Event to create an event that fires automatically every nnn (time period) Interval of his choosing. That event updates a table that his end-users access. The event itself runs his query from his Answer that you will see embedded in the event below.

Schema Change

create table appIdMetrics
(   -- this is the table Users hit against
    appId int not null primary key,
    active_users_count int not null,
    today_install_count int not null,
    yesterday_install_count int not null,
    month_install_count int not null,
    year_install_count int not null,
    week_install_count int not null,
    total_users_count int not null
);

create table evt_appIdMetrics
(   -- this is the worktable that only the Event uses
    -- while it puts together the refreshed data
    -- perhaps once every 5 minutes
    appId int not null primary key,
    active_users_count int not null,
    today_install_count int not null,
    yesterday_install_count int not null,
    month_install_count int not null,
    year_install_count int not null,
    week_install_count int not null,
    total_users_count int not null
);

Event Creation

drop event updateAppIdMetrics;
DELIMITER $$
CREATE EVENT updateAppIdMetrics
    ON SCHEDULE
        EVERY 5 MINUTE

DO BEGIN
    truncate table evt_appIdMetrics;    -- this is the table that only the evt has access to

    -- time to refresh this table (approx 6 seconds)
    -- 280 rows (count as per OP comments)
    insert into evt_appIdMetrics
    (appId,active_users_count,today_install_count,yesterday_install_count,
    month_install_count,year_install_count,week_install_count,total_users_count)
    select p.app_id, 
    COUNT(CASE WHEN p.active=1 THEN 1 END) AS active_users_count, 
    COUNT(CASE WHEN DATE(p.created_date)= CURDATE() THEN 1 END) AS today_install_count,  
    COUNT(CASE WHEN DATE(p.created_date) = DATE(DATE_SUB(NOW(),INTERVAL 1 DAY)) THEN 1 END) AS     yesterday_install_count,  
    COUNT(CASE WHEN DATE(p.created_date) BETWEEN DATE_SUB(CURDATE(),INTERVAL DAY(LAST_DAY(NOW())) DAY) AND CURDATE() THEN 1 END)         AS month_install_count,  
    COUNT(CASE WHEN DATE(p.created_date) BETWEEN     DATE_SUB(CURDATE(),INTERVAL 1 YEAR) AND CURDATE() THEN 1 END) AS     year_install_count, 
    COUNT(CASE WHEN DATE(p.created_date) BETWEEN     DATE_SUB(CURDATE(),INTERVAL 7 DAY) AND CURDATE() THEN 1 END) AS     week_install_count, 
    COUNT('x') AS total_users_count
    FROM person p 
    INNER JOIN project p2 ON p.app_id = p2.id 
    GROUP BY p.app_id;

    -- BEGIN LOCK (important)
    -- figure out a locking scheme (work-in-progress, not completed yet)
    truncate table appIdMetrics;    -- this is the table users access

    -- the following should take a split second on the approximately 280 rows (count as per OP comments)
    insert into appIdMetrics
    (appId,active_users_count,today_install_count,yesterday_install_count,
    month_install_count,year_install_count,week_install_count,total_users_count)
    select appId,active_users_count,today_install_count,yesterday_install_count,
    month_install_count,year_install_count,week_install_count,total_users_count
    from evt_appIdMetrics;
    -- complete locking schema (work-in-progress, not completed yet)
    -- END LOCK (important)
END;$$
DELIMITER ;
-- evt creation succeeded by passing Syntax Error check

Users interact with table appIdMetrics. When I get a chance I will tweak the locking scheme mentioned. The UX for the user should be a split second. Data refresh interval is adjustable by OP for stale factor. From my experience the event will fire for the first time after the first time period interval. So that means 5 minutes.

I will provide a link for Event management in a little bit. Edit: here it is. Events must be enabled.

Sign up to request clarification or add additional context in comments.

8 Comments

I will try this and let you know
Even after creating this covering index ( ALTER TABLE person ADD INDEX app_id_created_date_id ( app_id,created_date,id ) ) and with inner query, no improvement. Its still taking 10 secs
Drew, please see my answer, Replacing DATE_FORMAT() with DATE() ended the query execution time in 5 secs. Could you explain why there is a drastic improvement in performance?
the runtime optimation a.k.a. O(n) of date() is superior to date_format() due to less functionality in the former, but just enough functionality to make it right
Drew, Lock and Unlock are not allowed in stored procedures. what to do ?dev.mysql.com/doc/refman/5.7/en/…
|
0

Maybe you can try to join the table. But I'm not sure this can reduce the execution time until 3 seconds.

 SELECT p.id, 
 COUNT(CASE WHEN p.active=1 THEN 1 END) AS active_users_count, 
 COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') =     DATE_FORMAT(NOW(),'%Y-%m-%d') THEN 1 END) AS today_install_count,  
 COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') =         DATE_FORMAT(DATE_SUB(NOW(),INTERVAL 1 DAY), '%Y-%m-%d') THEN 1 END) AS     yesterday_install_count,  
 COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') BETWEEN    DATE_SUB(CURDATE(),INTERVAL DAY(LAST_DAY(NOW())) DAY) AND CURDATE() THEN 1 END)         AS month_install_count,  
 COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') BETWEEN     DATE_SUB(CURDATE(),INTERVAL 1 YEAR) AND CURDATE() THEN 1 END) AS     year_install_count, 
 COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') BETWEEN     DATE_SUB(CURDATE(),INTERVAL 7 DAY) AND CURDATE() THEN 1 END) AS     week_install_count, 
 COUNT('x') AS total_users_count
 FROM person p 
 INNER JOIN project p2 ON p.app_id = p2.id 
 GROUP BY p.app_id

1 Comment

this query is taking 10 secs Wilianto. 3 secs improvement
0

IN this query what I understood is, you need all the counts from last one year and not bothered about very old data. In that case,if you have a project_date in your project table then you can limit ids in sub query, this may help to perform much better than the old one.

SELECT p.id, 
COUNT(CASE WHEN p.active=1 THEN 1 END) AS active_users_count, 
COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') =DATE_FORMAT(NOW(),'%Y-%m-%d') THEN 1 END) AS today_install_count,  
COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') =DATE_FORMAT(DATE_SUB(NOW(),INTERVAL 1 DAY), '%Y-%m-%d') THEN 1 END) AS yesterday_install_count,  
COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') BETWEEN DATE_SUB(CURDATE(),INTERVAL DAY(LAST_DAY(NOW())) DAY) AND CURDATE() THEN 1 END) AS month_install_count,  
COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') BETWEEN DATE_SUB(CURDATE(),INTERVAL 1 YEAR) AND CURDATE() THEN 1 END) AS year_install_count, 
COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') BETWEEN DATE_SUB(CURDATE(),INTERVAL 7 DAY) AND CURDATE() THEN 1 END) AS week_install_count
FROM person p 
WHERE p.app_id IN (SELECT p2.id FROM project p2 AND p2.project_date > DATE_SUB(CURDATE(),INTERVAL 1 YEAR)) 
GROUP BY p.app_id;

Now You can take the total count of each project Id seperately and merge with the above counts.

SELECT p.id, COUNT('x') AS total_users_count 
FROM person p 
WHERE p.app_id IN (SELECT p2.id FROM project p2) 
GROUP BY p.app_id;

Hope this will help.

Comments

0

The below query execution time reduced to 5 secs. Can you guys explain why changing from DATE_FORMAT() TO DATE() gives this drastic improvement ?

SELECT p.app_id, 
 COUNT(CASE WHEN p.active=1 THEN 1 END) AS active_users_count, 
 COUNT(CASE WHEN DATE(p.created_date)= CURDATE() THEN 1 END) AS today_install_count,  
 COUNT(CASE WHEN DATE(p.created_date) = DATE(DATE_SUB(NOW(),INTERVAL 1 DAY)) THEN 1 END) AS     yesterday_install_count,  
 COUNT(CASE WHEN DATE(p.created_date) BETWEEN DATE_SUB(CURDATE(),INTERVAL DAY(LAST_DAY(NOW())) DAY) AND CURDATE() THEN 1 END)         AS month_install_count,  
 COUNT(CASE WHEN DATE(p.created_date) BETWEEN     DATE_SUB(CURDATE(),INTERVAL 1 YEAR) AND CURDATE() THEN 1 END) AS     year_install_count, 
 COUNT(CASE WHEN DATE(p.created_date) BETWEEN     DATE_SUB(CURDATE(),INTERVAL 7 DAY) AND CURDATE() THEN 1 END) AS     week_install_count, 
 COUNT('x') AS total_users_count
 FROM person p 
 INNER JOIN project p2 ON p.app_id = p2.id 
 GROUP BY p.app_id

8 Comments

the runtime optimation a.k.a. O(n) of date() is superior to date_format() due to less functionality in the former, but just enough functionality to make it right
I want to make it below 2 secs. creating a covering index like (created_date, app_id, id ) didn't help. I thought it would give improvement, but no improvement at all even after creating this index. Do you know why ?
I want to group by application identifier and not by person identifier. I want to get the count of people belongs to each application. That's why app_id.
Okay, sorry, it is actually SELECT p.app_id, and not id
Please see the updated full query. Its taking 5.79 secs....Please check if there is anything wrong in it. Is it possible to make any more improvement in that query ? could it be possible to make it less than 3 secs ?
|
0

Another optimization thought... In a subquery, compute once how many days ago created_date, as an integer. Then in the outer query do the more efficient

age <= 365 AS year_install_count,
age <=   7 AS weel_install_count,
...

Note that x <= y is a "Boolean", which displays as "1" for true and "0" for false. So, no need for repeated DATE_SUB, DATE_FORMAT or later COALESCE, etc.

For getting age, try DATEDIFF(created_date, CURRDATE) or play with TO_DAYS(CURRDATE()) - TO_DAYS(CURRDATE()). Caveat: it may be off by 1.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.