0

I have an query which gives me the wrong result.

Tables:

A
+----+
| id |
+----+
|  1 |
|  2 |
+----+

B
+----+----+
| id |  x |  B.id = A.id
+----+----+
|  1 |  1 |
|  1 |  1 |
|  1 |  0 |
+----+----+

C
+----+----+
| id |  y |  C.id = A.id
+----+----+
|  1 |  1 |
|  1 |  2 |
+----+----+

What I want to do: Select all rows from A. For each row in A count in B all x with value 1 and all x with value 0 with B.id = A.id. For each row in A get the minimum y from C with C.id = A.id.

The result I am expecting is:

+----+------+--------+---------+
| id |  min | count1 | count 2 |
+----+------+--------+---------+
|  1 |    1 |      2 |       1 |
|  2 | NULL |      0 |       0 |
+----+------+--------+---------+

First Try: This doesn't work.

SELECT a.id,
       MIN(c.y),
       SUM(IF(b.x = 1, 1, 0)),
       SUM(IF(b.x = 0, 1, 0))
FROM   a
       LEFT JOIN b
              ON ( a.id = b.id )
       LEFT JOIN c
              ON ( a.id = c.id )
GROUP BY a.id

+----+------+--------+---------+
| id |  min | count1 | count 2 |
+----+------+--------+---------+
|  1 |    1 |      4 |       2 |
|  2 | NULL |      0 |       0 |
+----+------+--------+---------+

Second Try: This works but I am sure it has a bad performance.

SELECT a.id,
       MIN(c.y),
       b.x,
       b.y
FROM   a
       LEFT JOIN (SELECT b.id, SUM(IF(b.x = 1, 1, 0)) x, SUM(IF(b.x = 0, 1, 0)) y FROM b) b
              ON ( a.id = b.id )
       LEFT JOIN c
              ON ( a.id = c.id )
GROUP BY a.id

+----+------+--------+---------+
| id |  min | count1 | count 2 |
+----+------+--------+---------+
|  1 |    1 |      2 |       1 |
|  2 | NULL |      0 |       0 |
+----+------+--------+---------+

Last Try: This works too.

SELECT x.*,
       SUM(IF(b.x = 1, 1, 0)),
       SUM(IF(b.x = 0, 1, 0))
FROM   (SELECT a.id,
               MIN(c.y)
        FROM   a
               LEFT JOIN c
                      ON ( a.id = c.id )
        GROUP  BY a.id) x
       LEFT JOIN b
              ON ( b.id = x.id )
GROUP  BY x.id

Now my question is: Is the last one the best choise or is there a way to write this query with just one select statement (like in the first try)?

2
  • what RBDMS? MySQL, PostgreSQL? Commented Aug 15, 2013 at 14:49
  • I use MySQL as database Commented Aug 15, 2013 at 16:05

1 Answer 1

3

Your joins are doing cartesian products for a given value, because there are multiple rows in each table.

You can fix this by using count(distinct) rather than sum():

SELECT a.id, MIN(c.y),
       count(distinct (case when b.x = 1 then b.id end)),
       count(distinct (case when b.x = 0 then b.id end))
FROM   a
       LEFT JOIN b
              ON ( a.id = b.id )
       LEFT JOIN c
              ON ( a.id = c.id )
GROUP BY a.id;

You can also fix this by pre-aggregating b (and/or c). And you would need to take that approach if your aggregation function were something like the sum of a column in b.

EDIT:

You are correct. The above query counts the distinct values of B, but B contains rows that are exact duplicates. (Personally, I think having a column with the name id that has duplicates is a sign of poor design, but that is another issue.)

You could solve it by having a real id in the b table, because then the count(distinct) would count the correct values. You can also solve it by aggregating the two tables before joining them in:

SELECT a.id, c.y, x1, x0
FROM   a
       LEFT JOIN (select b.id,
                         sum(b.x = 1) as x1,
                         sum(b.x = 0) as x0
                  from b
                  group by b.id
                 ) b
              ON ( a.id = b.id )
       LEFT JOIN (select c.id, min(c.y) as y
                  from c
                  group by c.id
                 ) c
              ON ( a.id = c.id );

Here is a SQL Fiddle for the problem.

EDIT II:

You can get it in one statement, but I'm not so sure that it would work on similar data. The idea is that you can count all the cases where x = 1 and then divide by the number of rows in the C table to get the real distinct count:

SELECT a.id, MIN(c.y), 
       coalesce(sum(b.x = 1), 0) / count(distinct coalesce(c.y, -1)), 
       coalesce(sum(b.x = 0), 0) / count(distinct coalesce(c.y, -1))
FROM   a
       LEFT JOIN b
              ON ( a.id = b.id )
       LEFT JOIN c
              ON ( a.id = c.id )
GROUP BY a.id;

It is a little tricky, because you have to handle NULLs to get the right values. Note that this is counting the y value to get a distinct count from the C table. Your question re-enforces why it is a good idea to have a unique integer primary key in every table.

Sign up to request clarification or add additional context in comments.

2 Comments

Thank you, but this query doesn't give me the correct result. It returns 1, 1, 1, 1 instead of 1, 1, 2, 1.
Thank you again. Your second edit is like my second try and this will not have a good performance if there are lots of entries in B because of many unneeded sums. And your last edit works but it is a hack I don't want to use just to get it in one query. I will stay with my last try version.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.