5

I will need your help with this one. I want to normalize numbers within groups. Say I have this dataset:

A  B  C
-------
0  1  3
1  1  0
1  1  2
1  1  4
1  2  1
1  2  1

I want to group these rows by A and B, then normalize the values of C within each group, i.e. sum all C's of the group and then divide each C by that sum. For the above example, I would expect this result:

A  B   C
---------
0  1   1            // sum of C's in group is 1, 1/1=1
1  1   0            // sum of C's in group is 6, 0/6=0
1  1  1/3           // ...
1  1  2/3           // ...
1  2  1/2           // sum of C's in group is 2, 1/2=1/2
1  2  1/2           // ...

Division by zero can be handled separately. How can I do this using SQL (or PSQL, if that helps)? I can think of ways to do this in principle, but I always end up with deeply nested SELECTs, which I want to avoid.

Thanks in advance!

3 Answers

10

You could use window functions for this:

SELECT a,b,
    1.0 * c / CASE WHEN SUM(c) OVER(PARTITION BY a,b) = 0 THEN 1
                   ELSE SUM(c) OVER(PARTITION BY a,b) END AS c
FROM tab


Output:

╔════╦════╦════════════════════╗
║ a  ║ b  ║         c          ║
╠════╬════╬════════════════════╣
║ 0  ║ 1  ║ 1                  ║
║ 1  ║ 1  ║ 0                  ║
║ 1  ║ 1  ║ 0.3333333333333333 ║
║ 1  ║ 1  ║ 0.6666666666666666 ║
║ 1  ║ 2  ║ 0.5                ║
║ 1  ║ 2  ║ 0.5                ║
║ 2  ║ 2  ║ 0                  ║           -- row added to check division by 0
╚════╩════╩════════════════════╝

2

You can use a derived table to aggregate the values and then join the result back to the original table. The where clause avoids division by 0 by dropping every group whose sum is 0; if you need those rows in the result, you will have to handle them with a separate condition.

select t.a, t.b, 1.0 * t.c/t1.total_c
from tablename t 
join (select a, b, sum(c) as total_c
      from tablename
      group by a, b) t1
on t.a = t1.a and t.b = t1.b
where t1.total_c > 0
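
One possible way to keep the zero-sum groups rather than filtering them out is sketched below. It assumes PostgreSQL and assumes that rows in a zero-sum group should simply normalize to 0, which is how the other answers treat them. The inline data and the column alias normalized_c are illustrative only and do not come from the original answer.

```sql
-- Sketch only: the inline data mirrors the question's table plus one
-- all-zero group; "normalized_c" is an illustrative alias.
with tablename (a, b, c) as (values
    (0, 1, 3),
    (1, 1, 0),
    (1, 1, 2),
    (1, 1, 4),
    (1, 2, 1),
    (1, 2, 1),
    (2, 2, 0)
)
select t.a, t.b,
       -- nullif turns a zero total into NULL, so the division yields NULL
       -- instead of raising an error; coalesce then maps that NULL to 0.
       coalesce(1.0 * t.c / nullif(t1.total_c, 0), 0) as normalized_c
from tablename t
join (select a, b, sum(c) as total_c
      from tablename
      group by a, b) t1
  on t.a = t1.a and t.b = t1.b
order by t.a, t.b, t.c;
```

Compared with the where clause, this returns every input row, including the (2, 2, 0) row. If NULL is a better marker for "undefined" than 0, the coalesce wrapper can be dropped.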

2 Comments

Let's test that... hang on :)
I've updated it a bit: t.total_c should be t1.total_c, and the results were 0 for all rows (integer division) :)
0
with t (a,b,c) as (values
    (0,  1,  3),
    (1,  1,  0),
    (1,  1,  2),
    (1,  1,  4),
    (1,  2,  1),
    (1,  2,  1),
    (2,  2,  0)
)
select
    a, b,
    c::float / coalesce(nullif(sum(c) over(partition by a, b), 0), 1) as normal
from t
;
 a | b |      normal       
---+---+-------------------
 0 | 1 |                 1
 1 | 1 |                 0
 1 | 1 | 0.333333333333333
 1 | 1 | 0.666666666666667
 1 | 2 |               0.5
 1 | 2 |               0.5
 2 | 2 |                 0
