2

While preparing an answer to another question here on SO, I encountered a situation that seems odd, at least to me. The original question is here: Pivot Table Omitting Rows that Have Null values

I've modified the query to use max instead of group_concat in order to show the "problem" in all databases.

SELECT 
  id, 
  max(case when colID = 1 then value else '' end) AS fn,
  max(case when colID = 2 then value else '' end) AS ln,
  max(case when colID = 3 then value else '' end) AS jt
FROM tbl 
GROUP BY id

The result of this query is this:

ID    FN        LN            JT
1    Sampo    Kallinen     Office Manager
2    Jakko    Salovaara    Vice President
3    (null)   Foo          No First Name

The user wants to filter out the row with id 3 because its value field is null.

It seems pretty obvious that all that is needed is to add a WHERE value IS NOT NULL constraint to the query to get what the user expects. It doesn't work.

So I started testing it on other databases to see what happens (queries with the WHERE clause):

SELECT 
  id, 
  max(case when colID = 1 then value else '' end) AS fn,
  max(case when colID = 2 then value else '' end) AS ln,
  max(case when colID = 3 then value else '' end) AS jt
FROM tbl 
  WHERE value is not null
GROUP BY id

To my surprise the result was the same; none of them worked.

Then I tried a different version of the same query:

SELECT * FROM (
    SELECT 
      id, 
      max(case when colID = 1 then value else '' end) AS fn,
      max(case when colID = 2 then value else '' end) AS ln,
      max(case when colID = 3 then value else '' end) AS jt
    FROM tbl 
    GROUP BY id
) T
WHERE fn IS NOT NULL
  AND ln IS NOT NULL
  AND jt IS NOT NULL

This version filtered out the row only on Oracle. The only way I could make it work on all databases was with this query:

SELECT 
  id, 
  max(case when colID = 1 then value else '' end) AS fn,
  max(case when colID = 2 then value else '' end) AS ln,
  max(case when colID = 3 then value else '' end) AS jt
FROM tbl 
WHERE NOT EXISTS (SELECT * FROM tbl b WHERE tbl.id=b.id AND value IS NULL)
GROUP BY id

So I ask:
What is happening here that makes every database, except for that one specific case on Oracle, seem to ignore the IS NOT NULL filter?

4
  • I'm guessing that in the first examples even though you exclude the item with null you don't exclude all items for that row/group, and the db shows the missing value as null. Btw, if you want to exclude the entire group for a row with a missing value this would work too; sqlfiddle.com/#!6/78395/10 Commented Oct 22, 2014 at 22:51
  • There is no such thing as "Postgre". (Like there is also no RDBMS named "My".) Please fix your question. Commented Oct 22, 2014 at 23:27
  • BTW, your "different version" with the subquery is a valid approach. It only fails because you replaced the NULL in the original question with '', which is NOT NULL - except for Oracle, which has a dubious implementation like @Joshua explains. Commented Oct 23, 2014 at 1:11
  • Downvoter cares to explain why this question was downvoted? Commented Oct 17, 2017 at 11:51

3 Answers

4

To omit the result row if any source row for the same id has value IS NULL, we could use the aggregate function every() in the HAVING clause in Postgres. Or bool_and() (synonym for historical reasons).

SELECT id
     , max(CASE WHEN colID = 1 THEN value ELSE '' END) AS fn
     , max(CASE WHEN colID = 2 THEN value ELSE '' END) AS ln
     , max(CASE WHEN colID = 3 THEN value ELSE '' END) AS jt
FROM   tbl 
GROUP  BY id
HAVING every(value IS NOT NULL);

Better yet, use the aggregate FILTER clause (Postgres 9.4+):

SELECT id
     , max(value) FILTER (WHERE colID = 1) AS fn
     , max(value) FILTER (WHERE colID = 2) AS ln
     , max(value) FILTER (WHERE colID = 3) AS jt
FROM   tbl 
GROUP  BY id
HAVING every(value IS NOT NULL);

If you insist on the empty string '' as default, wrap it in COALESCE().


A faster solution in Postgres would be to use crosstab().

Explanation

Your attempt with a WHERE clause would just eliminate one source row for id = 3 in your example (the one with colID = 1), leaving two more for the same id. So we still get a row for id = 3 in the result after aggregating.

But since we have no row with colID = 1, we get an empty string (note: not a NULL value!) for fn in the result for id = 3.
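To see this concretely, here is a minimal sketch in Python that uses the built-in sqlite3 module as a stand-in database. The table layout and the sample rows are assumptions reconstructed from the result table in the question, since the original schema is not shown.

```python
import sqlite3

# Hypothetical sample data: the schema and rows are assumed, reconstructed
# from the result table shown in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, colID INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        (1, 1, "Sampo"), (1, 2, "Kallinen"), (1, 3, "Office Manager"),
        (2, 1, "Jakko"), (2, 2, "Salovaara"), (2, 3, "Vice President"),
        (3, 1, None), (3, 2, "Foo"), (3, 3, "No First Name"),
    ],
)

# The WHERE filter removes only the NULL source row (colID = 1) for id = 3;
# the two other source rows for that id survive.
surviving = conn.execute(
    "SELECT colID, value FROM tbl "
    "WHERE id = 3 AND value IS NOT NULL ORDER BY colID"
).fetchall()

# So the group for id = 3 still exists after aggregation, and fn falls back
# to the ELSE branch: an empty string, not NULL.
pivot = conn.execute(
    """
    SELECT id,
           max(CASE WHEN colID = 1 THEN value ELSE '' END) AS fn,
           max(CASE WHEN colID = 2 THEN value ELSE '' END) AS ln,
           max(CASE WHEN colID = 3 THEN value ELSE '' END) AS jt
    FROM   tbl
    WHERE  value IS NOT NULL
    GROUP  BY id
    ORDER  BY id
    """
).fetchall()

print(surviving)
print(pivot[-1])
```

The last pivot row is still the one for id = 3, with fn set to '' rather than NULL.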

Other RDBMS

While EVERY is defined in the SQL:2008 standard, many RDBMS do not support it, presumably because some of them have shady implementations of the boolean type. (Not dropping any names like "MySQL" or "Oracle" ...). You can substitute everywhere (including Postgres) with:

SELECT id
     , max(CASE WHEN colID = 1 THEN value ELSE '' END) AS fn
     , max(CASE WHEN colID = 2 THEN value ELSE '' END) AS ln
     , max(CASE WHEN colID = 3 THEN value ELSE '' END) AS jt
FROM   tbl 
GROUP  BY id
HAVING count(*) = count(value);

This works because count(value) doesn't count NULL values, while count(*) counts every row. In MySQL there is also bit_and().
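As a rough check of the count() comparison, the sketch below runs it in Python against an in-memory SQLite database. SQLite is only a convenient stand-in here, and the schema and sample rows are assumptions reconstructed from the question.

```python
import sqlite3

# Hypothetical sample data: the schema and rows are assumed, reconstructed
# from the result table shown in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, colID INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        (1, 1, "Sampo"), (1, 2, "Kallinen"), (1, 3, "Office Manager"),
        (2, 1, "Jakko"), (2, 2, "Salovaara"), (2, 3, "Vice President"),
        (3, 1, None), (3, 2, "Foo"), (3, 3, "No First Name"),
    ],
)

# count(*) counts all rows of a group, count(value) only the non-NULL ones.
counts = conn.execute(
    "SELECT id, count(*), count(value) FROM tbl GROUP BY id ORDER BY id"
).fetchall()

# Groups where the two counts differ contain at least one NULL value and
# are dropped as a whole by the HAVING clause.
rows = conn.execute(
    """
    SELECT id,
           max(CASE WHEN colID = 1 THEN value ELSE '' END) AS fn,
           max(CASE WHEN colID = 2 THEN value ELSE '' END) AS ln,
           max(CASE WHEN colID = 3 THEN value ELSE '' END) AS jt
    FROM   tbl
    GROUP  BY id
    HAVING count(*) = count(value)
    ORDER  BY id
    """
).fetchall()

print(counts)
print(rows)
```

Only id = 3 has differing counts, so only that group is removed from the result.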

3 Comments

Thanks @ErwinBrandstetter. I had also thought that I was eliminating only one row, so I ran this test: `FROM (select * from tbl where value is not null) t`, which eliminates (or should eliminate) the NULL rows before the grouping, and I still get the same result: sqlfiddle.com/#!15/78395/4 Your answer is perfect and I will accept it; I only ask you to kindly clear up this last doubt, please.
@JorgeCampos: Your new example still does the same: it just eliminates one of three source rows for id = 3. Thus the same result.
Ooooooh of course... I was missing the fact that the id here is being grouped not the value. Now I understand. Thank you so much.
2

It works in Oracle because Oracle handles NULL incorrectly: it treats NULL and the empty string '' as the same thing. The other databases don't do this because it is wrong. NULL means unknown, whereas '' is just a blank, empty string.

So if your where clause said something like WHERE (fn IS NOT NULL or fn <> '') you would probably get further.
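How a database that keeps '' and NULL distinct treats these filters can be checked with a small Python sketch using the built-in sqlite3 module. SQLite is only a stand-in for the non-Oracle case, and the schema, sample rows and the ids_kept helper are assumptions made for illustration.

```python
import sqlite3

# Hypothetical sample data: the schema and rows are assumed, reconstructed
# from the result table shown in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, colID INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        (1, 1, "Sampo"), (1, 2, "Kallinen"), (1, 3, "Office Manager"),
        (2, 1, "Jakko"), (2, 2, "Salovaara"), (2, 3, "Vice President"),
        (3, 1, None), (3, 2, "Foo"), (3, 3, "No First Name"),
    ],
)

# The pivot query from the question, used as a subquery below.
PIVOT = """
    SELECT id,
           max(CASE WHEN colID = 1 THEN value ELSE '' END) AS fn,
           max(CASE WHEN colID = 2 THEN value ELSE '' END) AS ln,
           max(CASE WHEN colID = 3 THEN value ELSE '' END) AS jt
    FROM   tbl
    GROUP  BY id
"""


def ids_kept(condition):
    """Return the ids that survive the given outer WHERE condition."""
    sql = f"SELECT id FROM ({PIVOT}) t WHERE {condition} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


# '' IS NOT NULL is true here, so this filter removes nothing.
not_null_ids = ids_kept("fn IS NOT NULL AND ln IS NOT NULL AND jt IS NOT NULL")
# With OR, the IS NOT NULL branch is already true for '', so nothing is removed.
either_ids = ids_kept("(fn IS NOT NULL OR fn <> '')")
# Comparing against '' directly removes the row with the missing first name.
non_empty_ids = ids_kept("fn <> '' AND ln <> '' AND jt <> ''")

print(not_null_ids, either_ids, non_empty_ids)
```

In this setup only the plain comparison against '' drops id = 3; whether the same holds on a particular server is worth verifying there.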

3 Comments

You are right: for the subquery, fn IS NOT NULL or fn <> '' worked. But I'm looking more for an explanation of WHY it behaves like this than for a solution. +1 anyway :)
I thought I did explain in the first two sentences but thanks for the upvote!
In the case at hand just WHERE fn <> '' would work in all versions (for the query with the subselect). Either way, your answer explains why that query works in Oracle when it shouldn't.
0

I think this is a case where a HAVING clause will do what you need.

SELECT id, max ... (same stuff as before)
FROM tbl
GROUP by id
HAVING  fn IS NOT NULL
    AND ln IS NOT NULL
    AND jt IS NOT NULL

1 Comment

@Jorge: This, too, is confusing output columns with input columns. In the WHERE and HAVING clause you can only reference input columns.
