6

my example is on MYSQL VERSION is 5.6.34-log

Problem summary the below query takes 40 seconds, ORDER_ITEM table

  • has 758423 records

    And PAYMENT table

  • has 177272 records

And submission_entry table

  • has 2165698 records

    as A Whole Table count.

DETAILS HERE: BELOW:

  • I Have This Query, Refer to [1]

  • I Have added SQL_NO_CACHE for testing repeated tests when re
    query.

  • I Have Optimized indexes Refer to [2], but no significant
    improvement.

  • Find Table Structures here [3]

  • Find explain plan used [4]

[1]

     SELECT SQL_NO_CACHE
          `payment`.`id`                                                                                    AS id,
          `order_item`.`order_id`                                                                           AS order_id,


          GROUP_CONCAT(DISTINCT (CASE WHEN submission_entry.text = '' OR submission_entry.text IS NULL
            THEN ' '
                                 ELSE submission_entry.text END) ORDER BY question.var DESC SEPARATOR 0x1D) AS buyer,


          event.name                                                                                        AS event,
          COUNT(DISTINCT CASE WHEN (`order_item`.status > 0 OR (
            `order_item`.status != -1 AND `order_item`.status >= -2 AND `payment`.payment_type_id != 8 AND
            payment.make_order_free = 1))
            THEN `order_item`.id
                         ELSE NULL END)                                                                     AS qty,
          payment.currency                                                                                  AS `currency`,
          (SELECT SUM(order_item.sub_total)
           FROM order_item
           WHERE payment_id =
                 payment.id)                                                                                AS sub_total,
          CASE WHEN payment.make_order_free = 1
            THEN ROUND(payment.total + COALESCE(refunds_total, 0), 2)
          ELSE ROUND(payment.total, 2) END                                                                  AS 'total',
          `payment_type`.`name`                                                                             AS payment_type,
          payment_status.name                                                                               AS status,
          `payment_status`.`id`                                                                             AS status_id,
          DATE_FORMAT(CONVERT_TZ(order_item.`created`, '+0:00', '-8:00'),
                      '%Y-%m-%d %H:%i')                                                                     AS 'created',
          `user`.`name`                                                                                     AS 'agent',
          event.id                                                                                          AS event_id,
          payment.checked,
          DATE_FORMAT(CONVERT_TZ(payment.checked_date, '+0:00', '-8:00'),
                      '%Y-%m-%d %H:%i')                                                                     AS checked_date,
          DATE_FORMAT(CONVERT_TZ(`payment`.`complete_date`, '+0:00', '-8:00'),
                      '%Y-%m-%d %H:%i')                                                                     AS `complete date`,
          `payment`.`delivery_status`                                                                       AS `delivered`
        FROM `order_item`
          INNER JOIN `payment`
            ON payment.id = `order_item`.`payment_id` AND (payment.status > 0.0 OR payment.status = -3.0)
          LEFT JOIN (SELECT
                       sum(`payment_refund`.total) AS `refunds_total`,
                       payment_refunds.payment_id  AS `payment_id`
                     FROM payment
                       INNER JOIN `payment_refunds` ON payment_refunds.payment_id = payment.id
                       INNER JOIN `payment` AS `payment_refund`
                         ON `payment_refund`.id = `payment_refunds`.payment_id_refund
                     GROUP BY `payment_refunds`.payment_id) AS `refunds` ON `refunds`.payment_id = payment.id
#           INNER JOIN event_date_product ON event_date_product.id = order_item.event_date_product_id
#           INNER JOIN event_date ON event_date.id = event_date_product.event_date_id
          INNER JOIN event ON event.id = order_item.event_id
          INNER JOIN payment_status ON payment_status.id = payment.status
          INNER JOIN payment_type ON payment_type.id = payment.payment_type_id
          LEFT JOIN user ON user.id = payment.completed_by
          LEFT JOIN submission_entry ON submission_entry.form_submission_id = `payment`.`form_submission_id`
          LEFT JOIN question ON question.id = submission_entry.question_id AND question.var IN ('name', 'email')
        WHERE 1 = '1' AND (order_item.status > 0.0 OR order_item.status = -2.0)
        GROUP BY `order_item`.`order_id`
        HAVING 1 = '1'
        ORDER BY `order_item`.`order_id` DESC
        LIMIT 10

[2]

 CREATE INDEX order_id
      ON order_item (order_id);

    CREATE INDEX payment_id
      ON order_item (payment_id);

    CREATE INDEX status
      ON order_item (status);

Second Table

CREATE INDEX payment_type_id
  ON payment (payment_type_id);

CREATE INDEX status
  ON payment (status);

[3]

CREATE TABLE order_item
(
  id                         INT AUTO_INCREMENT
    PRIMARY KEY,
  order_id                   INT                                 NOT NULL,
  form_submission_id         INT                                 NULL,
  status                     DOUBLE DEFAULT '0'                  NULL,
  payment_id                 INT DEFAULT '0'                     NULL
);

SECOND TABLE

CREATE TABLE payment
(
  id                 INT AUTO_INCREMENT,
  payment_type_id    INT                                 NOT NULL,
  status             DOUBLE                              NOT NULL,
  form_submission_id INT                                 NOT NULL,
  PRIMARY KEY (id, payment_type_id)
);

[4] Run the snippet to see the table of EXPLAIN in HTML format

<!DOCTYPE html>
<html>
<head>
  <title></title>
</head>
<body>
<table border="1" style="border-collapse:collapse">
<tr><th>id</th><th>select_type</th><th>table</th><th>type</th><th>possible_keys</th><th>key</th><th>key_len</th><th>ref</th><th>rows</th><th>Extra</th></tr>
<tr><td>1</td><td>PRIMARY</td><td>payment_status</td><td>range</td><td>PRIMARY</td><td>PRIMARY</td><td>8</td><td>NULL</td><td>4</td><td>Using where; Using temporary; Using filesort</td></tr>
<tr><td>1</td><td>PRIMARY</td><td>payment</td><td>ref</td><td>PRIMARY,payment_type_id,status</td><td>status</td><td>8</td><td>exp_live_18092017.payment_status.id</td><td>17357</td><td></td></tr>
<tr><td>1</td><td>PRIMARY</td><td>payment_type</td><td>eq_ref</td><td>PRIMARY</td><td>PRIMARY</td><td>4</td><td>exp_live_18092017.payment.payment_type_id</td><td>1</td><td></td></tr>
<tr><td>1</td><td>PRIMARY</td><td>user</td><td>eq_ref</td><td>PRIMARY</td><td>PRIMARY</td><td>4</td><td>exp_live_18092017.payment.completed_by</td><td>1</td><td></td></tr>
<tr><td>1</td><td>PRIMARY</td><td>submission_entry</td><td>ref</td><td>form_submission_id,idx_submission_entry_1</td><td>form_submission_id</td><td>4</td><td>exp_live_18092017.payment.form_submission_id</td><td>2</td><td></td></tr>
<tr><td>1</td><td>PRIMARY</td><td>question</td><td>eq_ref</td><td>PRIMARY,var</td><td>PRIMARY</td><td>4</td><td>exp_live_18092017.submission_entry.question_id</td><td>1</td><td>Using where</td></tr>
<tr><td>1</td><td>PRIMARY</td><td>order_item</td><td>ref</td><td>status,payment_id</td><td>payment_id</td><td>5</td><td>exp_live_18092017.payment.id</td><td>3</td><td>Using where</td></tr>
<tr><td>1</td><td>PRIMARY</td><td>event</td><td>eq_ref</td><td>PRIMARY</td><td>PRIMARY</td><td>4</td><td>exp_live_18092017.order_item.event_id</td><td>1</td><td></td></tr>
<tr><td>1</td><td>PRIMARY</td><td>&lt;derived3&gt;</td><td>ref</td><td>key0</td><td>key0</td><td>5</td><td>exp_live_18092017.payment.id</td><td>10</td><td>Using where</td></tr>
<tr><td>3</td><td>DERIVED</td><td>payment_refunds</td><td>index</td><td>payment_id,payment_id_refund</td><td>payment_id</td><td>4</td><td>NULL</td><td>1110</td><td></td></tr>
<tr><td>3</td><td>DERIVED</td><td>payment</td><td>ref</td><td>PRIMARY</td><td>PRIMARY</td><td>4</td><td>exp_live_18092017.payment_refunds.payment_id</td><td>1</td><td>Using index</td></tr>
<tr><td>3</td><td>DERIVED</td><td>payment_refund</td><td>ref</td><td>PRIMARY</td><td>PRIMARY</td><td>4</td><td>exp_live_18092017.payment_refunds.payment_id_refund</td><td>1</td><td></td></tr>
<tr><td>2</td><td>DEPENDENT SUBQUERY</td><td>order_item</td><td>ref</td><td>payment_id</td><td>payment_id</td><td>5</td><td>func</td><td>3</td><td></td></tr></table>
</body>
</html>

Expected Restul

It has to be instead of 40 seconds less than 5

IMPORTANT Updates

1) Reply to comment 1: there is no foreign key at all on those two tables.

UPDATE-1: On local the original query takes 40 seconds if i removed only the following it becomes 25 seconds saves 15 seconds

GROUP_CONCAT(DISTINCT (CASE WHEN submission_entry.text = '' OR submission_entry.text IS NULL
    THEN ' '
                         ELSE submission_entry.text END) ORDER BY question.var DESC SEPARATOR 0x1D) AS buyer

if I removed only its the same time around 40 seconds no save!

COUNT(DISTINCT CASE WHEN (`order_item`.status > 0 OR (
    `order_item`.status != -1 AND `order_item`.status >= -2 AND `payment`.payment_type_id != 8 AND
    payment.make_order_free = 1))
    THEN `order_item`.id
                 ELSE NULL END)                                                                     AS qty,

if I removed only it takes around 36 seconds saves 4 seconds

(SELECT SUM(order_item.sub_total)
   FROM order_item
   WHERE payment_id =
         payment.id)                                                                                AS sub_total,
  CASE WHEN payment.make_order_free = 1
    THEN ROUND(payment.total + COALESCE(refunds_total, 0), 2)
  ELSE ROUND(payment.total, 2) END                                                                  AS 'total',
25
  • Do you have foreign keys defined on any of the tables? Commented Oct 1, 2017 at 7:57
  • Hi @BenM there is none , on payment and same on order item there is not even single foreign key Commented Oct 1, 2017 at 7:59
  • Probably a good idea to start by adding those, then. Commented Oct 1, 2017 at 7:59
  • 1
    Your query is invalid because you are doing SELECT * with GROUP BY. Please include sample input and output which shows what you are trying to do here. The real query may be substantially different than what you currently have. Commented Oct 1, 2017 at 8:09
  • 2
    Nope. You can only select the order_id column without an aggregate. Even if what you have runs, you should not be relying on it. Commented Oct 1, 2017 at 8:12

4 Answers 4

2

Remove HAVING 1=1; the Optimizer may not be smart enough to ignore it. Please provide EXPLAIN SELECT (not in html) to see what the Optimizer is doing.

It seems wrong to have a composite PK in this case: PRIMARY KEY (id, payment_type_id). Please justify it.

Please explain the meaning of status or the need for DOUBLE: status DOUBLE

It will take some effort to figure out why the query is so slow. Let's start by tossing the normalization parts, such as dates and event name and currency. That is whittle down the query to enough to find the desired rows, but not the details on each row. If it is still slow, let's debug that. If it is 'fast', then add back on the other stuff, one by one, to find out what is causing a performance issue.

Is just id the PRIMARY KEY of each table? Or are there more exceptions (like payment)?

It seems 'wrong' to specify a value for question.var, but then use LEFT to imply that it is optional. Please change all LEFT JOINs to INNER JOINs unless I am mistaken on this issue.

Are any of the tables (perhaps submission_entry and event_date_product) "many-to-many" mapping tables? If so, then follow the tips here to get some performance gains.

When you come back please provide SHOW CREATE TABLE for each table.

Sign up to request clarification or add additional context in comments.

4 Comments

thanks for your time ok 1 - i will provide table structures 2- double status because i have status like 2.1 and 0.1 , i know pk as double is mistake but i can change that now 3 - there is no main need for composite pk in case you refered to 4- submission entry is the main cause especially when doing group concat , i may have submission entry for submission id in payment table and for same record i may have submission_id in order _item table so i have to take them in precendence , meaning payment will always have formsubid but orderitem optional .
why explain is not enough i put it in html format only to be more friendly visible.
the structure of table is very long i may remove some un needed columns is that ok ? y/n , thanks
FYI when i add order_item.event_id on main query its fast , but without specific event it takes ages,
2

Guided by the strategies below,

  • pre-evaluating agregations onto temporary tables
  • placing payment at the top - since this seems to be the most deterministic
  • grouping joins - enforcing to the query optimizer the tables relationship

i present a revised version of your query:

-- -----------------------------------------------------------------------------
-- Summarization of order_item
-- -----------------------------------------------------------------------------

drop temporary table if exists _ord_itm_sub_tot;

create temporary table _ord_itm_sub_tot(
    primary key (payment_id)
)
SELECT
    payment_id,
    --
    COUNT(
        DISTINCT
            CASE
                WHEN(
                        `order_item`.status > 0 OR
                        (
                                `order_item`.status       != -1 AND
                                `order_item`.status       >= -2 AND
                                `payment`.payment_type_id != 8  AND
                                payment.make_order_free = 1
                            )
                    ) THEN `order_item`.id
                      ELSE NULL
            END
    ) AS qty,
    --
    SUM(order_item.sub_total) sub_total
FROM
    order_item
        inner join payment
        on payment.id = order_item.payment_id    
where order_item.status > 0.0 OR order_item.status = -2.0
group by payment_id;

-- -----------------------------------------------------------------------------
-- Summarization of payment_refunds
-- -----------------------------------------------------------------------------

drop temporary table if exists _pay_ref_tot;

create temporary table _pay_ref_tot(
    primary key(payment_id)
)
SELECT
    payment_refunds.payment_id  AS `payment_id`,
    sum(`payment_refund`.total) AS `refunds_total`
FROM
    `payment_refunds`
        INNER JOIN `payment` AS `payment_refund`
        ON `payment_refund`.id = `payment_refunds`.payment_id_refund
GROUP BY `payment_refunds`.payment_id;

-- -----------------------------------------------------------------------------
-- Summarization of submission_entry
-- -----------------------------------------------------------------------------

drop temporary table if exists _sub_ent;

create temporary table _sub_ent(
    primary key(form_submission_id)
)
select 
    submission_entry.form_submission_id,
    GROUP_CONCAT(
        DISTINCT (
            CASE WHEN coalesce(submission_entry.text, '') THEN ' '
                                                          ELSE submission_entry.text
            END
        )
        ORDER BY question.var
        DESC SEPARATOR 0x1D
    ) AS buyer
from 
    submission_entry
        LEFT JOIN question
        ON(
                question.id = submission_entry.question_id
            AND question.var IN ('name', 'email')
        )
group by submission_entry.form_submission_id;

-- -----------------------------------------------------------------------------
-- The result
-- -----------------------------------------------------------------------------

SELECT SQL_NO_CACHE
    `payment`.`id`          AS id,
    `order_item`.`order_id` AS order_id,
    --
    _sub_ent.buyer,
    --
    event.name AS event,
    --
    _ord_itm_sub_tot.qty,
    --
    payment.currency AS `currency`,
    --
    _ord_itm_sub_tot.sub_total,
    --
    CASE
        WHEN payment.make_order_free = 1 THEN ROUND(payment.total + COALESCE(refunds_total, 0), 2)
                                         ELSE ROUND(payment.total, 2)
    END AS 'total',
    --
    `payment_type`.`name`   AS payment_type,
    `payment_status`.`name` AS status,
    `payment_status`.`id`   AS status_id,
    --
    DATE_FORMAT(
        CONVERT_TZ(order_item.`created`, '+0:00', '-8:00'),
        '%Y-%m-%d %H:%i'
    ) AS 'created',
    --
    `user`.`name` AS 'agent',
    event.id      AS event_id,
    payment.checked,
    --
    DATE_FORMAT(CONVERT_TZ(payment.checked_date,  '+0:00', '-8:00'), '%Y-%m-%d %H:%i') AS checked_date,
    DATE_FORMAT(CONVERT_TZ(payment.complete_date, '+0:00', '-8:00'), '%Y-%m-%d %H:%i') AS `complete date`,
    --
    `payment`.`delivery_status` AS `delivered`
FROM
    `payment`
        INNER JOIN(
            `order_item`
                INNER JOIN event
                ON event.id = order_item.event_id
        )
        ON `order_item`.`payment_id` = payment.id
        --
        inner join _ord_itm_sub_tot
        on _ord_itm_sub_tot.payment_id = payment.id
        --
        LEFT JOIN _pay_ref_tot
        on _pay_ref_tot.payment_id = `payment`.id
        --
        INNER JOIN payment_status ON payment_status.id = payment.status
        INNER JOIN payment_type   ON payment_type.id   = payment.payment_type_id
        LEFT  JOIN user           ON user.id           = payment.completed_by
        --
        LEFT JOIN _sub_ent
        on _sub_ent.form_submission_id = `payment`.`form_submission_id`
WHERE
    1 = 1
AND (payment.status > 0.0 OR payment.status = -3.0)
AND (order_item.status > 0.0 OR order_item.status = -2.0)
ORDER BY `order_item`.`order_id` DESC
LIMIT 10

The query from your question present aggregated functions without explicit groupings... this is pretty awkward and in my solution I try to devise aggregations that 'make sense'.

Please, run this version and tell us your findings.

Be, please, very careful not just on the running statistics, but also on the summarization results.

5 Comments

thanks for your time , yes its better but not enough it was 40 seconds and now its 37 seconds , FYI the query works on my sid without set mode my version is MYSQL VERSION is 5.6.34-log , and syntax dose not complain from original query i iposted
@shareef, indeed MySQL just began complainting about 'correct' GROUP BY clauses for aggregation from version 5.7+, hence you did not bounced on syntax errors. Although I shall state aggregation whithout explicit grouping clause indicates misleading on querying aproach and result manipulation. Anyway, hope my revised answer helps you.
i am having error on creating _sub_ent the error is Data truncation: Truncated incorrect DOUBLE value: 'sunday mornign' can you help ? i tried fix no luck.
There's no reference to a column named sunday morning in the queries. Could you please elaborate...
i meant any value in submission entry column text, the idea it causes me exception i googled stackoverflow.com/questions/3456258/… but no idea how to fix !
0

(The tables and query are too complex for me to do the transformation for you. But here are the steps.)

  1. Reformulate the query without any mention of refunds. That is, remove the derived table and the mention of it in the complex CASE.
  2. Debug and time the resulting query. Keep the GROUP BY order_item ORDER BY order_item DESC LIMIT 10 and do any other optimizations already suggested. In particular, get rid of HAVING 1=1 since it is in the way of a likely optimization.
  3. Make the query from step #2 be a 'derived table'...

Something like:

SELECT lots of stuff
    FROM ( query from step 2 ) AS step2
    LEFT JOIN ( ... ) AS refunds  ON step2... = refunds...
    ORDER BY step2.order_item DESC

The ORDER BY is repeated, but neither the GROUP BY, nor the LIMIT need be repeated.

Why? The principle here is...

Currently, it is going into the refunds correlated subquery thousands of times, only to toss it all but 10 times. The reformulation cuts that back to only 10 times.

(Caveat: I may have missed a subtlety preventing this reformulation from working as I presented it. If it does not work, see if you can make the 'principle' help you anyway.)

Comments

0
+50

Here is the minimum you should do each time you see a query with a lot of joins and pagination: you should select those 10 (LIMIT 10) ids that you group by from the first table (order_item) with as minimum joins as possible and then join the ids back to the first table and make all other joins. That way you will not move around in temporary tables all those thousands of columns and rows that you do not need to display.

  1. You look at the inner joins and WHERE conditions, GROUP BYs and ORDER BYs to see whether you need any other tables to filter out rows, group or order ids from the first table. In your case, it doesn't seem you need any joins, except for payment.

  2. Now you write the query to select those ids:

    SELECT o.order_id, o.payment_id
    FROM order_item o
    JOIN payment p
        ON p.id = o.payment_id AND (p.status > 0.0 OR p.status = -3.0)
    WHERE order_item.status > 0.0 OR order_item.status = -2.0
    ORDER BY order_id DESC
    LIMIT 10
    

    If there might be several payments for a single order, you should use GROUP BY order_id DESC instead of ORDER BY. To make the query work quicker you need a BTREE index on status column for order_item table, or even a composite index on (status, payment_id).

  3. Now, when you are sure that the ids are those that you expected, you make all other joins:

    SELECT order_item.order_id,
      `payment`.`id`,
      GROUP_CONCAT ... -- and so on from the original query
    FROM (
      SELECT o.order_id, o.payment_id
      FROM order_item o
      JOIN payment p
        ON p.id = o.payment_id AND (p.status > 0.0 OR p.status = -3.0)
      WHERE order_item.status > 0.0 OR order_item.status = -2.0
      ORDER BY order_id DESC
      LIMIT 10
    ) as ids
    JOIN order_item ON ids.order_id = order_item.order_id
    JOIN payment ON ids.payment_id = payment.id
    LEFT JOIN ( ... -- and so on
    

The idea is that you significantly lower the temporary tables you need to process. Now every row selected by the joins will be used in the result set.


UPD1: Another thing is that you should simplify the aggregation in LEFT JOIN:

SELECT
  sum(payment.total) AS `refunds_total`,
  refs.payment_id  AS `payment_id`
FROM payment_refunds refs
JOIN payment ON payment.id = refs.payment_id_refund
GROUP BY refs.payment_id

or even replace the LEFT JOIN with a correlated subquery, since the correlation will be executed only for those 10 rows (make sure, you use this whole query with three columns as the subquery, otherwise, the correlation will be computed for each row in the resulting join before the GROUP BY):

SELECT
      ids.order_id,
      ids.payment_id,
      (SELECT SUM(p.total) 
       FROM payment_refunds refs 
       JOIN payment p 
         ON refs.payment_id_refund = p.id
       WHERE refs.payment_id = ids.payment_id
       ) as refunds_total
    FROM (
      SELECT o.order_id, o.payment_id
      FROM order_item o
      JOIN payment p
        ON p.id = o.payment_id AND (p.status > 0.0 OR p.status = -3.0)
      WHERE order_item.status > 0.0 OR order_item.status = -2.0
      ORDER BY order_id DESC
      LIMIT 10
    ) as ids

You will also need to an index (payment_id, payment_id_refund) on payment_refunds and you can even try a covering index (payment_id, total) on payment as well.

5 Comments

good thinking but i have other variable in formula, am using jqgrid and we have filters there exampel: i wana be able to filter on submission entry to view the orderitems related to that would that be still doable using ur suggestion ? another example i may search on qty or total my main problem is to search upon dynamic submission entry vars and answers ,please need ur help
@shareef, yes, sure, the idea should be the same: you should take the minimum number of tables to get the final ids, and then join with the rest of the tables to get the data by the ids.
ok i will give it a try later, and due to bounty i shall re open it because i really need help here, i will provide sql fiddle of my working draft later thanks., although i think there will be over complicating when i wana implement the wheres and having with jq grid in back end
i wish it was that easy : 1- order table and order item has one to many relation so your query limited to order item to 10 which showed me 2 orders having those 10 tickets. 2- the where statment if i want to select order items with certain where for payment.status or submission entry answer for name or email it want work . summarry i will end repeating the same query inside other one ! can you help me here , may be idea for caching or search in memory ! but i dont think we need go that complex
@shareef, do not give up. 1) as I have written in my answer, if you have one-to-many relation, replace ORDER BY DESC with GROUP BY DESC. 2) The idea is to write a base query that selects only ids. Selecting ids usually does not require a lot of memory and allows you to adjust the correct indexes. When you have that, fetching the rest of the data by joining on primary keys for this resulting set is quick by default - this is what primary keys are for. The inner sub-query fetching ids might be complex, but the less data is involved at that moment, the faster it works.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.