7

I'm trying to refine a SQL query to make my reports looks better. My query reads data from one table, groups by a few colums and calculates some aggregate fields (counts and sums).

SELECT A, B, C, COUNT(*), SUM(D) FROM T
GROUP BY A, B, C
ORDER BY A, B, C

Now, let's assume B and C columns are some defined constant strings, for example, B can be 'B1' or 'B2', C can be 'C1' or 'C2'. So, an example resultset is:

A  | B  | C  | COUNT(*) | SUM(D)
--------------------------------
A1 | B1 | C1 |       34 |   1752
A1 | B1 | C2 |        4 |    183
A1 | B2 | C1 |      199 |   8926
A1 | B2 | C2 |       56 |   2511
A2 | B1 | C2 |        6 |     89
A2 | B2 | C2 |       12 |    231
A3 | B1 | C1 |       89 |    552
...

As you can see, for 'A1' I have all four possible (B, C) combination, but that's not true for 'A2'. My question is: how can I generate also summary rows for (B, C) combination not present, in fact, in the given table? That is, how can I print, for example, also these rows:

A  | B  | C  | COUNT(*) | SUM(D)
--------------------------------
A2 | B1 | C1 |        0 |      0
A2 | B2 | C1 |        0 |      0

The only solution I can see is to create some auxiliarity tables with all (B, C) values and then make a RIGHT OUTER JOIN with that aux table. But I'm searching for a cleaner way...

Thank you all.

7
  • What are you using mssql, oracel, mysql? Commented May 10, 2012 at 7:43
  • Oracle. But the query is written, at the moment, in a standard way (it runs on MySQL too, we have a MySQL copy of the DB for no question-relevant reason). Commented May 10, 2012 at 7:44
  • Maybe this helps. Grouping sets are available in both SQL Server and Oracle. Not sure about MySQL. Commented May 10, 2012 at 7:50
  • @dradu Very interesting. I've never hear of CUBE and GROUPING SETS before. Maybe bacuse I'm mostly a MySQL user. Anyway, it does not help: both CUBE and GROUPING SETS produce any kind of grand-total, but not generate empty summaries. Commented May 10, 2012 at 8:04
  • 1
    What is "unclean" about using a right outer join to solve the problem? That seems like the right solution. Commented May 10, 2012 at 14:52

3 Answers 3

2

The auxiliary table doesn't have to be a real table, it can be a common table expression - at least if you can get all possible values (or all you're interested in) from the table itself. Using @Bob Jarvis' query to generate all possible combinations you can do something like:

WITH CTE AS (
    SELECT * FROM (SELECT DISTINCT a FROM T)
    JOIN (SELECT DISTINCT b, c FROM T) ON (1 = 1)
)
SELECT CTE.A, CTE.B, CTE.C,
    SUM(CASE WHEN T.A IS NULL THEN 0 ELSE 1 END), NVL(SUM(T.D),0)
FROM CTE
LEFT JOIN T ON T.A = CTE.A AND T.B = CTE.B AND T.C = CTE.C
GROUP BY CTE.A, CTE.B, CTE.C
ORDER BY CTE.A, CTE.B, CTE.C;

If you have fixed values that may not be in the table then it's a little more complicated (or uglier anyway, and gets worse with more possible values):

WITH CTE AS (
    SELECT * FROM (SELECT DISTINCT a FROM T)
    JOIN (SELECT 'B1' AS B FROM DUAL
        UNION ALL SELECT 'B2' FROM DUAL) ON (1 = 1)
    JOIN (SELECT 'C1' AS C FROM DUAL
        UNION ALL SELECT 'C2' FROM DUAL) ON (1 = 1)
)
SELECT CTE.A, CTE.B, CTE.C,
    SUM(CASE WHEN T.A IS NULL THEN 0 ELSE 1 END), NVL(SUM(T.D),0)
FROM CTE
LEFT JOIN T ON T.A = CTE.A AND T.B = CTE.B AND T.C = CTE.C
GROUP BY CTE.A, CTE.B, CTE.C
ORDER BY CTE.A, CTE.B, CTE.C;

But you have to join to something that knows about the 'missing' values. If the same logic is needed elsewhere, and you have fixed values, then a permanent table might be cleaner - maintenance may be needed either way of course. You could also consider a pipelined function to act as a surrogate table, but depends on volumes maybe.

Sign up to request clarification or add additional context in comments.

1 Comment

The table itself contains all the possibilities. Obviously, I was NOT expecting the DB engine will "guess" them. +1 for the interesting usage of WITH, I've never seen t before.
1

The thing is, if you don't have a particular combination in your database, how would an engine know to include that combination into the results? In order to have all combinations in the results, you need to have all combinations available - whether in the main table or in some other table used for referencing. For example, you can create another table R with data like so:

A  | B  | C  
------------
A1 | B1 | C1
A1 | B1 | C2
A1 | B2 | C1
A1 | B2 | C2
A2 | B1 | C1
A2 | B1 | C2
A2 | B2 | C1
A2 | B2 | C2
A3 | B1 | C1
A3 | B1 | C2
A3 | B1 | C1
A3 | B2 | C2
...

And then your query would look like this:

SELECT r.*, COUNT(t.d), coalesce(SUM(t.d), 0)
FROM r LEFT OUTER JOIN t on (r.a=t.a and r.b=t.b and r.c=t.c)
GROUP BY r.a, r.b, r.c
ORDER BY r.a, r.b, r.c

This will return you the set as you want with 0 | 0 for combination that don't exist in the main table. Note that this is only possible if you do know every possible combination you want to include, which may not always be the case.

If on the other hand your A, B, C are numerical values and you just want to include all numbers in a range, then there may be another way of dealing with this, something like this:

SELECT a.n, b.n, c.n, COUNT(t.d), coalesce(SUM(t.d), 0)
FROM (SELECT (rownum) "n" FROM DUAL WHERE LEVEL >= start_a CONNECT BY LEVEL <= end_a) a,
     (SELECT (rownum) "n" FROM DUAL WHERE LEVEL >= start_b CONNECT BY LEVEL <= end_b) b,
     (SELECT (rownum) "n" FROM DUAL WHERE LEVEL >= start_c CONNECT BY LEVEL <= end_c) c,
     t
WHERE a.n = t.a(+) AND b.n = t.b(+) AND c.n = t.c(+)
GROUP BY a.n, b.n, c.n
ORDER BY a.n, b.n, c.n

(I don't have an Oracle instance handy to test this, so this is more of a somewhat educated guess rather than anything else.)

The bottom line is the engine needs to know what to include into the final results - one way or another.

3 Comments

That's how I'm doing it right now. See the last paragraph in my question.
@lorenzo-s The main problem is that without you specifying all valid combination, the engine has no way of knowing, therefore you do need to have this temporary table.
The table itself (considering all the rows) contains all combinations of (B, C) values.
0

There's probably prettier ways to do this, but the following should get you started towards what you want:

SELECT * FROM 
  (SELECT DISTINCT a FROM T)
JOIN
  (SELECT DISTINCT b, c FROM T)
  ON (1 = 1)
ORDER BY a, b, c

This will give you all combinations which exist of B and C, together with all A's which exist, similar to

A1  B1  C1
A1  B1  C2
A1  B2  C1
A1  B2  C2
A2  B1  C1
A2  B1  C2
A2  B2  C1
A2  B2  C2

Share and enjoy.

3 Comments

I'm searching for something (and I'm not assuming it exixts!) that does not involve a join or so... I'm currently using a table like the one you wrote to achieve the result. See the last paragraph in my question.
The issue here is different. If table T contains all possible combinations, then you're ok, however it it doesn't, then you're back to square one. Somehow the DB engine needs to know what all the possible combinations are.
@AleksG Ok, ok, we do not have this issue: the table itself (considering all the rows) contains all combinations of (B, C) values. Obviously, I was NOT expecting the DB engine will "guess" them.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.