0

I wonder is there a way to shorten below SELECT CASE query, by replacing the repeating arithmetic ((table1.col_a + table2.col_b) / 2) with something like a variable?

SELECT
  CASE
    WHEN ((table1.col_a + table2.col_b) / 2) < 100 THEN 1
    WHEN ((table1.col_a + table2.col_b) / 2) < 200 THEN 2
    WHEN ((table1.col_a + table2.col_b) / 2) < 250 THEN 3
    WHEN ((table1.col_a + table2.col_b) / 2) < 300 THEN 4
    WHEN ((table1.col_a + table2.col_b) / 2) < 800 THEN 5
    <... till 20>
  END bucket_range
  COUNT(table1.id) as stats
FROM
  table1 INNER JOIN table2
    ON table1.column_x = table2.column_y
WHERE
  <filter conditions>
GROUP BY 1
ORDER BY bucket_range

The solution has to be single SELECT query (on PostgreSQL 10), not stored procedure or function. It should not impact performance.

I tried the following but they are invalid:

SELECT
  CASE ((table1.col_a + table2.col_b) / 2)
    WHEN < 100 THEN 1
    WHEN < 200 THEN 2

and

SELECT
  CASE ((table1.col_a + table2.col_b) / 2) AS x
    WHEN x < 100 THEN 1
    WHEN x < 200 THEN 2

--- Update note

The comparison evaluation arithmetic < bound_number THEN 1 was just a simplified example. The actual bucket sizes are not consistent, I just updated the question to clarify this. The idea is that the arithmetic expression is repeating across cases.

4
  • I think yo can create a new table with begin_col, end_col, value_col 3 columns and can make a join between these two table and use: where (table1.col_a + table2.col_b) / 2 between begin_col and end_col . Commented Mar 31, 2021 at 13:42
  • 1
    If your formulas are really in form WHEN same_expr < 100*N THEN N, then you could use just pure math without case at all :) Commented Mar 31, 2021 at 13:47
  • You could approach it with a cross apply/lateral join: cross apply (select count(*) + 1 from (values (100), (200), (250), (300), (800)) t(v) where v < (table1.col_a + table2.col_b) / 2) as t2(bucket_range) Not sure it's a huge improvement. Commented Mar 31, 2021 at 17:31
  • Hacks like this used to be common once upon a time: substring(' 1 234 5', table1.col_a + table2.col_b) / 2 / 50, 1) Commented Mar 31, 2021 at 17:55

4 Answers 4

2

If I understand the math correctly, you should be able to just divide by 100 and take the floor:

SELECT
    1 + (((table1.col_a + table2.col_b) / 2) / 100) AS bucket_range,
    COUNT(table1.id) AS stats
FROM table1
INNER JOIN table2
    ON table1.column_x = table2.column_y
WHERE
    <filter conditions>
GROUP BY 1
ORDER BY
    bucket_range;
Sign up to request clarification or add additional context in comments.

Comments

0

You can use a derived table and then refer to the column alias in the outer query:

SELECT CASE
          WHEN col_sum < 100 THEN 1
          WHEN col_sum < 200 THEN 2
          WHEN col_sum < 300 THEN 3
          WHEN col_sum < 400 THEN 4
          WHEN col_sum < 500 THEN 5
          <... till 20>
        END bucket_range,
        COUNT(id) as stats
FROM (
  SELECT (table1.col_a + table2.col_b) / 2 as col_sum, 
         table1.id
  FROM table1 
    JOIN table2 ON table1.column_x = table2.column_y
  WHERE
    <filter conditions>
) d
GROUP BY 1
ORDER BY bucket_range

Unrelated, but: if table1.id is defined as NOT NULL then you can also use count(*) which will be slightly faster than count(id) (if table1.id can contain null values, the two expressions wouldn't be equivalent);


If the buckets are all of the same size, you might want to have a look at the width_bucket() function which achieves a similar thing but without writing a lengthy CASE expression.

Something like:

 width_bucket((table1.col_a + table2.col_b) / 2, 0, 10000, 20)

1 Comment

I'm not good with SQL, but I supposed in this case the subquery does not affect performance, as all it wants is aggregation, is my understanding correct? Output of EXPLAIN for both queries are the same.
0

You can replace the whole CASE expression with:

SELECT CEILING((table1.col_a + table2.col_b) / 2 / 100),
       ..............................

assuming that any of col_a and col_b are real numbers so the division is not integer division.

I they are both integers then divide by 2.0:

SELECT CEILING((table1.col_a + table2.col_b) / 2.0 / 100),
       ..............................

2 Comments

Using CEILING this way doesn't work, because the integer truncation will already have taken place before it gets called. See my answer for one valid way to use this approach.
@TimBiegeleisen I mention the case of integer division in my answer and in this case my 2nd query does work.
0

I will present a different approach all together. Since your interval range values are not constant any simple calculation with them will fail. Rather than having a long list of WHEN conditions, you can generate a set of Integer Ranges. You then Join those ranges with your current join to determine which range your calculated value falls within. The big advantage over the case option being the query is virtually impervious to changes in the in the values - just change the array (make sure values are in order).

The following takes your list of break points as an integer array, unrolls it appends a null value to the start and end, then generates a list of ranges within a CTE. The main query joins these ranges with your existing query, and sorts on the range lower bound

with buckets(bucket_range) as 
   (with list (num ) as 
         (select (null) union all 
          select unnest ('{100,200,250,300,800}'::int[]) union all 
          select (null) 
         )
     select  int4range(num, lead(num) over(), '[)')
       from list
   )  --select * from buckets; 
  select (table1.col_a + table2.col_b) / 2 as col_sum, 
         table1.id
  from table1 
  join table2 on table1.column_x = table2.column_y
  join buckets on ((table1.col_a + table2.col_b) / 2) <@ bucket_range
  where  
    <filter conditions>
    and not (lower(bucket_range) is null and upper(bucket_range) is null) 
order by lower(bucket_range) nulls first;

I have created a demonstration. Since you did not provide input data I didn't try to produce your expected results. I just generated something and modified the selected columns to show the relationships. But it does show the query in operation.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.