How to simplify repeating arithmetic in SQL SELECT CASE query

Question

Is there a way to shorten the SELECT CASE query below by replacing the repeated arithmetic ((table1.col_a + table2.col_b) / 2) with something like a variable?

SELECT
  CASE
    WHEN ((table1.col_a + table2.col_b) / 2) < 100 THEN 1
    WHEN ((table1.col_a + table2.col_b) / 2) < 200 THEN 2
    WHEN ((table1.col_a + table2.col_b) / 2) < 250 THEN 3
    WHEN ((table1.col_a + table2.col_b) / 2) < 300 THEN 4
    WHEN ((table1.col_a + table2.col_b) / 2) < 800 THEN 5
    <... till 20>
  END bucket_range,
  COUNT(table1.id) as stats
FROM
  table1 INNER JOIN table2
    ON table1.column_x = table2.column_y
WHERE
  <filter conditions>
GROUP BY 1
ORDER BY bucket_range

The solution has to be a single SELECT query (on PostgreSQL 10), not a stored procedure or function, and it should not hurt performance.

I tried the following, but both are invalid:

SELECT
  CASE ((table1.col_a + table2.col_b) / 2)
    WHEN < 100 THEN 1
    WHEN < 200 THEN 2

and

SELECT
  CASE ((table1.col_a + table2.col_b) / 2) AS x
    WHEN x < 100 THEN 1
    WHEN x < 200 THEN 2

--- Update note

The comparison arithmetic < bound_number THEN 1 was just a simplified example. The actual bucket sizes are not uniform; I have updated the question to clarify this. The point is that the same arithmetic expression repeats across the cases.

I think you can create a new table with three columns (begin_col, end_col, value_col), join it to these two tables, and use: where (table1.col_a + table2.col_b) / 2 between begin_col and end_col.
– Rahid Zeynalov, Commented Mar 31, 2021 at 13:42
If your formulas really have the form WHEN same_expr < 100*N THEN N, then you could use pure math without CASE at all :)
– Arvo, Commented Mar 31, 2021 at 13:47
You could approach it with a cross apply/lateral join: cross apply (select count(*) + 1 from (values (100), (200), (250), (300), (800)) t(v) where v < (table1.col_a + table2.col_b) / 2) as t2(bucket_range). Not sure it's a huge improvement.
– shawnt00, Commented Mar 31, 2021 at 17:31
Hacks like this used to be common once upon a time: substring(' 1 234 5', (table1.col_a + table2.col_b) / 2 / 50, 1)
– shawnt00, Commented Mar 31, 2021 at 17:55
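
PostgreSQL has no CROSS APPLY, but the lateral-join idea can also be used simply to name the repeated expression once. A minimal sketch, assuming small inline sample rows in place of the real table1 and table2 (the alias avg_val and the data are made up for illustration):

```sql
-- Hypothetical sample data standing in for table1 / table2.
WITH table1(id, column_x, col_a) AS (
    VALUES (1, 10, 50), (2, 20, 300), (3, 30, 500)
),
table2(column_y, col_b) AS (
    VALUES (10, 100), (20, 150), (30, 700)
)
SELECT
  CASE
    WHEN x.avg_val < 100 THEN 1
    WHEN x.avg_val < 200 THEN 2
    WHEN x.avg_val < 250 THEN 3
    WHEN x.avg_val < 300 THEN 4
    WHEN x.avg_val < 800 THEN 5
  END AS bucket_range,
  COUNT(table1.id) AS stats
FROM table1
JOIN table2 ON table1.column_x = table2.column_y
-- The expression is written once here and reused as x.avg_val above.
CROSS JOIN LATERAL (
    SELECT (table1.col_a + table2.col_b) / 2 AS avg_val
) AS x
GROUP BY 1
ORDER BY bucket_range;
```

With these sample rows the averages are 75, 225 and 600, so the query returns buckets 1, 3 and 5 with one row each.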

Tim Biegeleisen · Accepted Answer · 2021-03-31 13:46:31Z

2

If I understand the math correctly, you should be able to just divide by 100 and take the floor:

SELECT
    1 + (((table1.col_a + table2.col_b) / 2) / 100) AS bucket_range,
    COUNT(table1.id) AS stats
FROM table1
INNER JOIN table2
    ON table1.column_x = table2.column_y
WHERE
    <filter conditions>
GROUP BY 1
ORDER BY
    bucket_range;
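
As a quick sanity check, the division can be compared with an equivalent CASE on a few values. This is only a sketch: it assumes non-negative integer inputs and a uniform bucket width of 100, and the inline values are made up for illustration.

```sql
-- v stands in for (table1.col_a + table2.col_b) / 2; integer division truncates.
SELECT v AS avg_val,
       1 + v / 100 AS bucket_by_division,
       CASE
         WHEN v < 100 THEN 1
         WHEN v < 200 THEN 2
         WHEN v < 300 THEN 3
       END AS bucket_by_case
FROM (VALUES (0), (99), (100), (199), (250)) AS t(v);
```

Both columns agree on these inputs (1, 1, 2, 2, 3). The shortcut no longer matches once the bucket widths differ, as with the 250 and 800 boundaries in the question.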

answered Mar 31, 2021 at 13:46

Tim Biegeleisen

user330315 · Accepted Answer · 2021-03-31 13:42:00Z

0

You can use a derived table and then refer to the column alias in the outer query:

SELECT CASE
          WHEN col_sum < 100 THEN 1
          WHEN col_sum < 200 THEN 2
          WHEN col_sum < 300 THEN 3
          WHEN col_sum < 400 THEN 4
          WHEN col_sum < 500 THEN 5
          <... till 20>
        END bucket_range,
        COUNT(id) as stats
FROM (
  SELECT (table1.col_a + table2.col_b) / 2 as col_sum, 
         table1.id
  FROM table1 
    JOIN table2 ON table1.column_x = table2.column_y
  WHERE
    <filter conditions>
) d
GROUP BY 1
ORDER BY bucket_range

Unrelated, but: if table1.id is defined as NOT NULL, you can also use count(*), which will be slightly faster than count(id). (If table1.id can contain NULL values, the two expressions are not equivalent.)
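
The difference between the two counts can be seen on a tiny inline table (the values are made up for illustration):

```sql
-- count(*) counts rows; count(id) skips rows where id IS NULL.
SELECT count(*)  AS all_rows,      -- 3
       count(id) AS non_null_ids   -- 2
FROM (VALUES (1), (NULL), (3)) AS t(id);
```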

If the buckets are all of the same size, you might want to have a look at the width_bucket() function which achieves a similar thing but without writing a lengthy CASE expression.

Something like:

 width_bucket((table1.col_a + table2.col_b) / 2, 0, 10000, 20)
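
For buckets of uneven size, width_bucket() also has a form that takes a sorted array of lower bounds instead of a min/max/count triple. A minimal sketch, assuming the break points from the question and made-up sample values in place of the real expression:

```sql
-- v stands in for (table1.col_a + table2.col_b) / 2.
-- width_bucket(operand, thresholds) returns 0 below the first threshold,
-- so adding 1 lines the result up with the CASE numbering.
SELECT v AS avg_val,
       width_bucket(v, ARRAY[100, 200, 250, 300, 800]) + 1 AS bucket_range
FROM (VALUES (75), (100), (225), (600), (900)) AS t(v);
```

One difference to keep in mind: a value at or above the last threshold (900 here) gets bucket 6, whereas a CASE without a matching WHEN or an ELSE returns NULL.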

answered Mar 31, 2021 at 13:42

user330315

1 Comment

shiouming Over a year ago

I'm not good with SQL, but I suppose the subquery does not affect performance in this case, as all it wants is aggregation. Is my understanding correct? The output of EXPLAIN is the same for both queries.

forpas · Accepted Answer · 2021-03-31 13:55:30Z

0

You can replace the whole CASE expression with:

SELECT CEILING((table1.col_a + table2.col_b) / 2 / 100),
       ..............................

assuming that at least one of col_a and col_b is a real number, so the division is not integer division.

If they are both integers, then divide by 2.0:

SELECT CEILING((table1.col_a + table2.col_b) / 2.0 / 100),
       ..............................
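
The effect of the 2.0 can be seen on a couple of made-up integer pairs (a sketch only; a and b stand in for col_a and col_b):

```sql
SELECT a, b,
       CEILING((a + b) / 2 / 100)   AS integer_division,  -- truncates before CEILING runs
       CEILING((a + b) / 2.0 / 100) AS numeric_division   -- keeps the fraction
FROM (VALUES (150, 151), (100, 100)) AS t(a, b);
```

For (150, 151) the integer form yields 1 and the numeric form yields 2. For (100, 100) both yield 1, while the question's CASE would put an average of exactly 100 in bucket 2, so the boundary behaviour is worth checking before swapping one for the other.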

edited Mar 31, 2021 at 13:55

answered Mar 31, 2021 at 13:49

forpas

2 Comments

Tim Biegeleisen Over a year ago

Using CEILING this way doesn't work, because the integer truncation will already have taken place before it gets called. See my answer for one valid way to use this approach.

forpas Over a year ago

@TimBiegeleisen I mention the case of integer division in my answer and in this case my 2nd query does work.

Belayer · Accepted Answer · 2021-04-01 18:02:02Z

I will present a different approach altogether. Since your bucket boundaries are not evenly spaced, any simple calculation with them will fail. Rather than writing a long list of WHEN conditions, you can generate a set of integer ranges and then join those ranges with your current join to determine which range your calculated value falls within. The big advantage over the CASE option is that the query is virtually impervious to changes in the values: just change the array (make sure the values are in order).

The following takes your list of break points as an integer array, unnests it, appends a null value to the start and end, and then generates a list of ranges within a CTE. The main query joins these ranges with your existing query and sorts on the range lower bound.

with buckets(bucket_range) as 
   (with list (num ) as 
         (select (null) union all 
          select unnest ('{100,200,250,300,800}'::int[]) union all 
          select (null) 
         )
     select  int4range(num, lead(num) over(), '[)')
       from list
   )  --select * from buckets; 
  select (table1.col_a + table2.col_b) / 2 as col_sum, 
         table1.id
  from table1 
  join table2 on table1.column_x = table2.column_y
  join buckets on ((table1.col_a + table2.col_b) / 2) <@ bucket_range
  where  
    <filter conditions>
    and not (lower(bucket_range) is null and upper(bucket_range) is null) 
order by lower(bucket_range) nulls first;

I have created a demonstration. Since you did not provide input data I didn't try to produce your expected results. I just generated something and modified the selected columns to show the relationships. But it does show the query in operation.
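
A variation on the same idea, as a sketch rather than a drop-in replacement: it uses WITH ORDINALITY so that the order of the break points is explicit (a window with an empty OVER () processes rows in an unspecified order), numbers the buckets, and counts rows per bucket. The sample CTE and the names bounds, bucket_no and avg_val are made up for illustration.

```sql
WITH sample(avg_val) AS (   -- hypothetical stand-in for (col_a + col_b) / 2
    VALUES (75), (225), (600), (900)
),
bounds AS (
    SELECT b AS num, o AS ord
    FROM unnest('{100,200,250,300,800}'::int[]) WITH ORDINALITY AS u(b, o)
),
buckets AS (
    -- Open-ended first bucket: everything below the first break point.
    SELECT 1 AS bucket_no, int4range(NULL, min(num)) AS bucket_range
    FROM bounds
    UNION ALL
    -- One half-open range [num, next num) per break point; the last one has no upper bound.
    SELECT (ord + 1)::int, int4range(num, lead(num) OVER (ORDER BY ord))
    FROM bounds
)
SELECT b.bucket_no, b.bucket_range, count(*) AS stats
FROM sample s
JOIN buckets b ON s.avg_val <@ b.bucket_range
GROUP BY b.bucket_no, b.bucket_range
ORDER BY b.bucket_no;
```

With the sample values above, the rows fall into buckets 1, 3, 5 and 6.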
