
I have two SQL queries: one uses an inner join first to match on a condition, and the other does not. Ultimately, I would like the difference between the columns created by each query. How can I do this?

I have tried unioning and joining the queries as in some similar posts, but it won't work. I wonder if the issue is around the joins within each query.

Query 1 :

SELECT A.date, COUNT(DISTINCT A.id)
FROM A
INNER JOIN B
ON A.id = B.id AND A.date = B.date
AND B.col1 = 'value1'
LEFT JOIN C on C.key = A.key
WHERE A.col1 = 'value2'
AND C.category = 'cat1'
GROUP BY 1
ORDER BY 1 DESC

Query 2 :

SELECT A.date, COUNT(DISTINCT A.id)
FROM A
LEFT JOIN C on C.key = A.key
WHERE A.col1 = 'value2'
AND C.category = 'cat1'
GROUP BY 1
ORDER BY 1 DESC 
  • Can you define what you mean by "difference"? The "difference" between 1970-Jan-01 and 1970-Jan-03 is either "2 days" or "yes". Commented Jan 25, 2019 at 22:50
  • Your queries are very similar and could maybe be combined into a single, shorter query. It would help if you provided sample data and expected output... Commented Jan 25, 2019 at 22:54
  • NB : in both queries, the presence of C.category=... in the WHERE clause defeats the LEFT JOIN C (that is, it turns it into an INNER JOIN). Commented Jan 25, 2019 at 22:56

3 Answers


Your left join of c is effectively an inner join, because c is used in a NULL-excluding expression in the WHERE clause. So you can inner join c directly and left join b. Then use a CASE inside one count() to count only the rows where a row from b was joined, and subtract that value from another count() that counts all occurrences to get the difference.

SELECT a.date,
       count(DISTINCT a.id)
       -
       count(DISTINCT CASE
                        WHEN b.id IS NOT NULL THEN
                          a.id
                      END)
       FROM a
            INNER JOIN c
                       ON c.key = a.key
                          AND c.category = 'cat1'
            LEFT JOIN b
                      ON a.id = b.id
                         AND a.date = b.date
                         AND b.col1 = 'value1'
       WHERE a.col1 = 'value2'
       GROUP BY 1
       ORDER BY 1 DESC;

SELECT A.date, COUNT(DISTINCT A.id)
FROM A
INNER JOIN B
ON A.id = B.id AND A.date = B.date
AND B.col1 = 'value1'
LEFT JOIN C on C.key = A.key
WHERE A.col1 = 'value2'
AND C.category = 'cat1'
GROUP BY 1

UNION 

SELECT A.date, COUNT(DISTINCT A.id)
FROM A
LEFT JOIN C on C.key = A.key
WHERE A.col1 = 'value2'
AND C.category = 'cat1'
GROUP BY 1
ORDER BY 1 DESC


A simple way is to JOIN the two queries, using the date column, which is available in both queries :

SELECT x.date, x.cnt, y.cnt, y.cnt - x.cnt
FROM 
(
    SELECT A.date, COUNT(DISTINCT A.id) AS cnt
    FROM A
    INNER JOIN B ON A.id = B.id AND A.date = B.date AND B.col1 = 'value1'
    LEFT JOIN C on C.key = A.key
    WHERE A.col1 = 'value2' AND C.category = 'cat1'
    GROUP BY 1
) AS x 
INNER JOIN (
    SELECT A.date, COUNT(DISTINCT A.id) AS cnt
    FROM A
    LEFT JOIN C on C.key = A.key
    WHERE A.col1 = 'value2' AND C.category = 'cat1'
    GROUP BY 1
) AS y ON x.date = y.date
ORDER BY 1 DESC 

You might want to adapt the join type according to your data layout :

  • LEFT JOIN if all dates are available in the first subquery but may be missing in the second subquery
  • RIGHT JOIN if the situation is the other way around
  • FULL OUTER JOIN if you want all available dates from both ends

If you choose any of the above options, you will need to use COALESCE to prevent the subtraction from returning NULL when one of the terms is NULL.
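
As a rough sketch, the FULL OUTER JOIN variant with COALESCE could look like the query below. It reuses the two subqueries from above unchanged. The aliases dt, cnt_with_b, cnt_all and diff are names made up for this illustration, and it assumes your database supports FULL OUTER JOIN (MySQL, for example, does not).

```sql
SELECT COALESCE(x.date, y.date)            AS dt,          -- date from whichever side has it
       COALESCE(x.cnt, 0)                  AS cnt_with_b,  -- 0 when the date is missing from x
       COALESCE(y.cnt, 0)                  AS cnt_all,     -- 0 when the date is missing from y
       COALESCE(y.cnt, 0) - COALESCE(x.cnt, 0) AS diff     -- never NULL thanks to COALESCE
FROM
(
    -- ids that also have a matching row in B
    SELECT A.date, COUNT(DISTINCT A.id) AS cnt
    FROM A
    INNER JOIN B ON A.id = B.id AND A.date = B.date AND B.col1 = 'value1'
    LEFT JOIN C ON C.key = A.key
    WHERE A.col1 = 'value2' AND C.category = 'cat1'
    GROUP BY 1
) AS x
FULL OUTER JOIN (
    -- all ids, regardless of B
    SELECT A.date, COUNT(DISTINCT A.id) AS cnt
    FROM A
    LEFT JOIN C ON C.key = A.key
    WHERE A.col1 = 'value2' AND C.category = 'cat1'
    GROUP BY 1
) AS y ON x.date = y.date
ORDER BY 1 DESC;
```

Since the first subquery is the second one with an extra INNER JOIN on B, every date it returns should also appear in the second. So a RIGHT JOIN (or swapping the two subqueries and using a LEFT JOIN) is probably enough here, with COALESCE(x.cnt, 0) covering the dates that have no match in B.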
