
I have a simple table (loaded from a view on top of files) with Countries and Zip codes in BigQuery.
For some strange reason, when both Country and Zip code are empty strings, concat returns null instead of an empty string, so the row passes the "is null" test. :(

select * 
from `TEST_business`.`zip` as z
where concat(country, zip) is null

This returns one row, as shown in the image below (caption: "One result where concat() is null").

BUT if I run it with a separator

select * 
from `TEST_business`.`zip` as z
where concat(country, '-', zip) is null

it returns no rows, since no country or zip is actually null, as the image below shows (caption: "concat() with separator returns no rows").

Lastly, let's make sure there are no NULLs in either of the two columns:

select * 
from `TEST_business`.`zip` as z
where country is null or zip is null

The image shows that BQ itself could not find any row with country or zip equal to null (caption: "Check for null country or zip returns no rows").

So I am baffled as to why this happens. It looks like a bug to me, or I am missing the logic of some strange empty-string-to-null conversion in BQ.

Some additional context: I know that the data coming into the zip table sometimes has null values for country, so country is coalesced into an empty string when loading the zip table (which is currently recreated at every run). This behaviour has never happened before, even though we have been running this same test after every load for a long time, and the same rows with a null country have been coalesced into an empty string for months.

  • I was not able to reproduce your issue. Anyway, just in case, check this post: Any CONCAT() variation that tolerates NULL values? Commented Apr 9, 2021 at 19:57
  • @MikhailBerlyant I think the behaviour came from files having nulls and being passed through a view that coalesces them to an empty string, the result of which is then saved into the table, but somehow the coalesce is lost in the concat. I was unable to reproduce it from plain SQL values entered into a query, only from that table coming from the view coming from the files; that's why I took screenshots. It was also strange that adding a literal string like '-' changes the output from null to non-null. Commented Apr 30, 2021 at 11:13
  • I was not able to reproduce it either. For example, with dt as (select '' as country, '' as zip) select * from dt where concat(country, zip) = "" returns the row, while with dt as (select '' as country, '' as zip) select * from dt where concat(country, zip) is null returns no results. Yes, maybe you're right, it may be related to how the file is being parsed. Commented Oct 11, 2021 at 11:51
  • @issuela I was not able to reproduce the issue myself with "fabricated data". That is why I was baffled and why I took so many screenshots. I think it was something related to the specific file. Commented Nov 24, 2021 at 18:09

1 Answer


Coming back to this after a few more years of experience with cloud databases, my conviction is that it all comes down to how the SQL query is transformed and rearranged by the SQL optimizer.

The SQL specification does NOT impose an order of execution on predicates or other expressions. When reading through a view it can therefore happen, as is probably the case here, that the optimized query executes the filters and expressions in a different order than we would expect.

In this case, it might be that the filter is pushed down to the original table/file, where the column is actually null, and that the expression coalescing the null to an empty string is applied only after the row has been retrieved. This is why, in the JSON view of the result, both columns correctly have the "" value.
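To make the hypothesis concrete, here is a purely illustrative sketch. The names `raw_zip_files` and `zip_view` are hypothetical (they do not appear in the question), and the last query is only a conceptual picture of what a rearranged plan could look like, not the output of any real optimizer.

```sql
-- Hypothetical view: NULL countries coming from the files become ''.
create or replace view `TEST_business`.`zip_view` as
select coalesce(country, '') as country, zip
from `TEST_business`.`raw_zip_files`;

-- What we write: we expect the filter to see the coalesced value.
select *
from `TEST_business`.`zip_view`
where concat(country, zip) is null;

-- Conceptual picture of the suspected rearrangement (an assumption):
-- the filter is evaluated against the raw column, where country can be NULL,
-- and coalesce is applied only to the rows that survive the filter.
select coalesce(country, '') as country, zip
from `TEST_business`.`raw_zip_files`
where concat(country, zip) is null;
```

In BigQuery, concat returns NULL when any argument is NULL, so in the last form a row with a NULL country would pass the filter while still being displayed with "" in the country column.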

The final output, column by column, is correct ("" in place of null), but the filtering of the rows to return is not in line with our expectations. Since the specification does not fix the order in which filters are applied, it is technically not wrong.

These kinds of things tend to happen more often in edge cases, where null handling and case conversion are concerned.

What I have found to be a general solution is to materialize the view as a table. This removes the issue entirely.
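A minimal sketch of the materialization in BigQuery syntax, assuming a hypothetical source view `zip_view` and a hypothetical target table `zip_materialized` (neither name comes from the question):

```sql
-- Persist the coalesced values, so later filters read stored data
-- instead of re-evaluating the view's expressions.
create or replace table `TEST_business`.`zip_materialized` as
select
  coalesce(country, '') as country,
  coalesce(zip, '') as zip
from `TEST_business`.`zip_view`;

-- The same check as in the question, now against the stored table.
select *
from `TEST_business`.`zip_materialized`
where concat(country, zip) is null;
```

Because the empty strings are physically stored, there is no coalesce expression left for the optimizer to reorder relative to the filter.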

In Snowflake, you can also define the view as a secure view. A secure view prioritizes security over performance and does not push down predicates, so it follows our expectations at the cost of a (usually small) performance hit.
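As a sketch in Snowflake syntax, with hypothetical object names (`zip_secure_view`, `raw_zip_files`), a secure view is declared by adding the `secure` keyword to the view definition:

```sql
-- Snowflake, not BigQuery; object names are hypothetical.
create or replace secure view zip_secure_view as
select
  coalesce(country, '') as country,
  zip
from raw_zip_files;

-- Queried like any other view.
select *
from zip_secure_view
where concat(country, zip) is null;
```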

Last, but with variable results, you can rewrite the query in a way that changes what the optimizer does, as might be the case when we add the '-' in the concat.
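One possible rewrite, offered only as a sketch, is to make the null handling explicit inside the predicate itself, so that the filter no longer depends on where the upstream coalesce is evaluated. Whether this actually changes the plan is not guaranteed.

```sql
-- Null handling is repeated inside the filter: the predicate gives the same
-- answer whether it sees the raw NULL or the coalesced '' value.
select *
from `TEST_business`.`zip` as z
where concat(ifnull(country, ''), ifnull(zip, '')) = '';
```

Note that this version looks for rows where both values are empty or null, which is a slightly different test from the original "is null" check.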

Good luck with your queries!
