I'm a PostgreSQL newbie, so still struggling a little bit here. Please be gentle.

I'm left joining three tables, and would like to be able to use a case statement to introduce another column that brings across a desired value from one column based on another. I'm going to guess that my INNER JOIN and CASE statements are back to front, but I'm not sure how to rearrange them without breaking the intent.

Basically: where best_model_fit = 'SUNNY', I'd like a new column named applied_f_model_hours_above4k to hold the value from the column hoursabove4k_sunny.

Code sample:

SELECT *
    FROM px_fuel_weathercell
        INNER JOIN f_descriptions ON px_f_weathercell.px_id = f_descriptions.fuel_id
        INNER JOIN dailywx ON px_f_weathercell.fid_new_wx_cells = dailywx.location
        CASE best_model_fit
            WHEN 'SUNNY' then hoursabove4k_sunny
        END applied_f_model_hours_above4k
    WHERE best_model_fit = 'SUNNY' /* limiting my test case here, clause will be removed later */
LIMIT 1000; 

The error is as follows:

ERROR:  syntax error at or near "CASE"
LINE 5:   CASE best_model_fit
          ^
SQL state: 42601
Character: 210

Thank you for any help you can offer.

Bonus points: CASE seems slow. 45 seconds to run this query. dailywx has 400,000 rows, px_f_weathercell has 6,000,000 rows. Is there a faster way to do this?

EDIT: Made the following edit, and am now getting a column full of nulls when the desired column has numbers (including 0) in it. Both columns are of type double.

EDIT2: Updated a couple of table names to indicate where columns are coming from. Also updated to show left join. I've also used PGTune to set some recommended settings in order to address a situation where the process was disk bound. I've also set an index on px_f_weathercell.fid_new_wx_cells and px_f_weathercell.px_id. This has resulted in 100,000 records returning in approx 5-7 seconds. I'm still receiving null values from the CASE statement, however.

SELECT *,
    CASE best_model_fit
        WHEN 'SUNNY' THEN dailywx.hoursabove4k_sunny
    END applied_f_model_hours_above4k
    FROM px_f_weathercell
        LEFT JOIN f_descriptions ON px_f_weathercell.px_id = f_descriptions.fuel_id
        LEFT JOIN dailywx ON px_f_weathercell.fid_new_wx_cells = dailywx.location
    WHERE f_descriptions.best_model_fit = 'SUNNY' /* limiting my test case here, clause will be removed later */
LIMIT 1000;
  • Move the case expression to your select (that's where the number of columns is determined) Commented Apr 15, 2021 at 2:23
  • I've tried what I think you mean, but am now getting a column full of nulls. This is a step forward, at least. Edit above. Commented Apr 15, 2021 at 2:31
  • You're also inner joining three tables, not left joining. If you don't put some type of filter on your query, you're going to be hitting the cartesian product of most (all) of your records, which means you likely have full table scans going on (slow, doesn't really use indexes because there's no reason to). And can you edit your question to show where some of these columns are coming from? E.g., we can't tell where best_model_fit or hoursabove4k_sunny come from. Commented Apr 15, 2021 at 3:39
  • @ps2goat, thanks for the pointer on cartesian queries. I have changed to left join as intended, set an index as described in the edit above, and addressed memory and caching sizes in response to noticing the process was disk IO bound. What I am still struggling with is why the CASE statement is returning null. Commented Apr 15, 2021 at 5:50
  • Normally you specify an else statement. CASE best_model_fit WHEN 'SUNNY' then dailywx.hoursabove4k_sunny ELSE -1 END applied_f_model_hours_above4k -- use whatever value you want as the default in the else clause. If all the values for that are null, you probably don't have the rest of the query correct or the data isn't set up how you think it is (e.g., it doesn't match your conditions.) stackoverflow.com/q/40101963/2084315 Commented Apr 15, 2021 at 6:56

1 Answer

In a table, all rows have the same columns. You cannot have a column that exists only for some rows. Since a query result is essentially a table, that applies there as well.

So having NULL or 0 as a result for the rows where the information does not apply is your only choice.

The reason why the CASE expression is returning NULL is that you have no ELSE branch. If none of the WHEN conditions applies, the result is NULL.
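
As a minimal, self-contained sketch of that behaviour (the inline VALUES list and the column names fit and hours are made up for illustration; they are not the tables from the question):

```sql
-- Hypothetical two-row data set standing in for the joined tables.
SELECT fit,
       -- No ELSE branch: yields NULL whenever fit is not 'SUNNY'.
       CASE fit WHEN 'SUNNY' THEN hours END        AS without_else,
       -- With ELSE: yields 0 whenever fit is not 'SUNNY'.
       CASE fit WHEN 'SUNNY' THEN hours ELSE 0 END AS with_else
FROM (VALUES ('SUNNY',  3.5::double precision),
             ('CLOUDY', 1.0::double precision)) AS t(fit, hours);
```

Note that a CASE expression also yields NULL when the branch that matched evaluates to NULL itself, for example when a LEFT JOIN found no matching row in dailywx. An ELSE branch only covers the rows where no WHEN condition applied.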

The performance of the query is a different thing. You'd need to supply EXPLAIN (ANALYZE, BUFFERS) output to analyze that. But when joining big tables, it is often beneficial to set work_mem high enough.
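
A sketch of how that could look for the query in the question. The work_mem value is an arbitrary example rather than a recommendation, and the table and column names are taken from the question's text, so adjust them to the real schema. Keep in mind that EXPLAIN with ANALYZE actually executes the statement.

```sql
-- Assumed value for illustration; SET only affects the current session.
SET work_mem = '256MB';

-- Runs the query and reports the actual plan, timings and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT px_f_weathercell.*,
       CASE f_descriptions.best_model_fit
           WHEN 'SUNNY' THEN dailywx.hoursabove4k_sunny
       END AS applied_f_model_hours_above4k
FROM px_f_weathercell
    LEFT JOIN f_descriptions ON px_f_weathercell.px_id = f_descriptions.fuel_id
    LEFT JOIN dailywx ON px_f_weathercell.fid_new_wx_cells = dailywx.location
LIMIT 1000;

-- Return to the server's configured default afterwards.
RESET work_mem;
```

The resulting plan text is what should be added to the question so the slow step can be identified.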

2 Comments

Thanks @Laurenz Albe. I've changed to a left join on both statements now, and set my memory / caching values quite high for this process (26GB), and have noticed a significant speedup in queries. I'm still stuck, however, on why the case statement is returning nulls.
I have added an explanation.
