Optimise MySQL - JOIN vs Nested query

Question

I have been trying to optimise some SQL queries based on the assumption that Joining tables is more efficient than nesting queries. I am joining the same table multiple times to perform a different analysis on the data.

I have 2 tables:

transactions:

id    |   date_add    |   merchant_ id    | transaction_type      |     amount
1         1488733332          108                  add                     20.00
2         1488733550          108                 remove                   5.00

and a calendar table which just lists dates so that I can create empty records where there are no transactions on particular days:

calendar:

id     |    datefield
1           2017-03-01
2           2017-03-02
3           2017-03-03
4           2017-03-04

I have many thousands of rows in the transactions table, and I'm trying to get an annual summary of total and different types of transactions per month (i.e 12 rows in total), where

transactions = sum of all "amount"s,
additions = sum of all "amounts" where transaction_type = "add"
redemptions = sum of all "amounts" where transaction_type = "remove"

result:

month     |    transactions     |    additions  |   redemptions
Jan                15                  12               3
Feb                20                  15               5
...

My initial query looks like this:

SELECT  COALESCE(tr.transactions, 0) AS transactions, 
        COALESCE(ad.additions, 0) AS additions, 
        COALESCE(re.redemptions, 0) AS redemptions, 
        calendar.date 
FROM (SELECT DATE_FORMAT(datefield, '%b %Y') AS date FROM calendar WHERE datefield LIKE '2017-%' GROUP BY YEAR(datefield), MONTH(datefield)) AS calendar 
LEFT JOIN (SELECT COUNT(transaction_type) as transactions, from_unixtime(date_add, '%b %Y') as date_t FROM transactions WHERE merchant_id = 108  GROUP BY from_unixtime(date_add, '%b %Y')) AS tr
ON calendar.date = tr.date_t
LEFT JOIN (SELECT COUNT(transaction_type = 'add') as additions, from_unixtime(date_add, '%b %Y') as date_a FROM transactions WHERE merchant_id = 108  AND transaction_type = 'add' GROUP BY from_unixtime(date_add, '%b %Y')) AS ad
ON calendar.date = ad.date_a
LEFT JOIN (SELECT COUNT(transaction_type = 'remove') as redemptions, from_unixtime(date_add, '%b %Y') as date_r FROM transactions WHERE merchant_id = 108  AND transaction_type = 'remove' GROUP BY from_unixtime(date_add, '%b %Y')) AS re
ON calendar.date = re.date_r

I tried optimising and cleaning it up a little, removing the nested statements and came up with this:

SELECT 
    DATE_FORMAT(cal.datefield, '%b %d') as date,
    IFNULL(count(ct.amount),0) as transactions, 
    IFNULL(count(a.amount),0) as additions, 
    IFNULL(count(r.amount),0) as redeptions
FROM calendar as cal 
LEFT JOIN transactions as ct ON cal.datefield = date(from_unixtime(ct.date_add))  && ct.merchant_id = 108
LEFT JOIN transactions as r ON r.id = ct.id && r.transaction_type = 'remove'
LEFT JOIN transactions as a ON a.id = ct.id && a.transaction_type = 'add' 
WHERE cal.datefield like '2017-%'
GROUP BY month(cal.datefield)

I was surprised to see that the revised statement was about 20x slower than the original with my dataset. Have I missed some sort of logic? Is there a better way to achieve the same result with a more streamlined query, given I am joining the same table multiple times?

EDIT: So to further explain the results I'm looking for - I'd like a single row for each month of the year (12 rows) each with a column for the total transactions, total additions, and total redemptions in each month.

The first query I was getting a result in about 0.5 sec but with the second I was getting results in 9.5sec.

Can you add an explain and post the results for the optimized and the non-optimized queries? — Dan Ionescu
– Dan Ionescu, Commented Mar 5, 2017 at 17:21
I see you used && in the second query in the ON statements off the LEFT JOIN? They should be AND — Raymond Nijland
– Raymond Nijland, Commented Mar 5, 2017 at 17:32
Could you explain the columns in the result set? I believe this task can be solved in a less complex way. — Paul Spiegel
– Paul Spiegel, Commented Mar 5, 2017 at 17:33
@PaulSpiegel I added a couple of edits there to explain the expected results — contool
– contool, Commented Mar 5, 2017 at 17:38

Mihai · Accepted Answer · 2017-03-05 17:41:20Z

3

Looking to your query You could use a single left join with case when

SELECT  COALESCE(t.transactions, 0) AS transactions, 
        COALESCE(t.additions, 0) AS additions, 
        COALESCE(t.redemptions, 0) AS redemptions, 
        calendar.date 
FROM (SELECT DATE_FORMAT(datefield, '%b %Y') AS date 
          FROM calendar 
          WHERE datefield LIKE '2017-%' 
          GROUP BY YEAR(datefield), MONTH(datefield)) AS calendar 
LEFT JOIN 
 ( select 
      COUNT(transaction_type) as transactions
      , sum( case when transaction_type = 'add' then 1 else 0 end ) as additions
      , sum( case when transaction_type = 'remove' then 1 else 0 end ) as redemptions
      ,  from_unixtime(date_add, '%b %Y') as date_t 
      FROM transactions 
      WHERE merchant_id = 108  
      GROUP BY from_unixtime(date_add, '%b %Y' ) t ON calendar.date = t.date_t

edited Mar 5, 2017 at 17:41

Mihai

26.8k8 gold badges71 silver badges87 bronze badges

answered Mar 5, 2017 at 17:34

ScaisEdge

133k10 gold badges98 silver badges111 bronze badges

Sign up to request clarification or add additional context in comments.

6 Comments

contool Over a year ago

Thanks @scaisEdge the CASE WHEN is a revelation - muchos gracias! That's down from 0.5 sec to 0.2

Rick James Over a year ago

The first SUM can be simplified to SUM(transaction_type = 'add').

Rick James Over a year ago

If there is an index on datefield, datefield LIKE '2017-%' can be sped up by datefield >= '2017-01-01' AND datefield < '2017-01-01' + INTERVAL 1 YEAR.

Rick James Over a year ago

I'm pretty sure the COALESCE is unnecessary -- SUM( nothing matches ) is 0 anyway.

ScaisEdge Over a year ago

@RickJames I have refactored the left join .. letting the main select as done by th OP ..

|

Paul Spiegel · Accepted Answer · 2017-03-05 18:25:46Z

0

First I would create a derived table with timestamp ranges for every month from your calendar table. This way a join with the transactions table will be efficient if date_add is indexed.

select month(c.datefield) as month, 
       unix_timestamp(timestamp(min(c.datefield), '00:00:00')) as ts_from,
       unix_timestamp(timestamp(max(c.datefield), '23:59:59')) as ts_to
from calendar c
where c.datefield between '2017-01-01' and '2017-12-31'
group by month(c.datefield)

Join it with the transaactions table and use conditional aggregations to get your data:

select c.month,
       sum(t.amount) as transactions,
       sum(case when t.transaction_type = 'add'    then t.amount else 0 end) as additions,
       sum(case when t.transaction_type = 'remove' then t.amount else 0 end) as redemptions
from (
    select month(c.datefield) as m, date_format(c.datefield, '%b') as `month`
           unix_timestamp(timestamp(min(c.datefield), '00:00:00')) as ts_from,
           unix_timestamp(timestamp(max(c.datefield), '23:59:59')) as ts_to
    from calendar c
    where c.datefield between '2017-01-01' and '2017-12-31'
    group by month(c.datefield), date_format(c.datefield, '%b')
) c
left join transactions t on t.date_add between c.ts_from and c.ts_to
where t.merchant_id = 108
group by c.m, c.month
order by c.m

edited Mar 5, 2017 at 18:25

answered Mar 5, 2017 at 18:11

Paul Spiegel

31.9k5 gold badges48 silver badges56 bronze badges

2 Comments

Rick James Over a year ago

It is likely to be some faster if you make the ORDER BY match the GROUP BY. This may avoid an extra sort.

Paul Spiegel Over a year ago

@RickJames You think the optimizer isn't able to "see" that ORDER BY is a subset of GROUP BY? However it doesn't really matter here since the resultset contains only 12 rows. I have another problem here. This query takes 2 seconds on 1M-row-table (~330K rows for 2014). But changing to inner join it runs in 100 msec. So i would need to put the code into another subquery. MySQL optimizer is sometimes really stupid.

Collectives™ on Stack Overflow

Optimise MySQL - JOIN vs Nested query

2 Answers 2

6 Comments

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

6 Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related