How to get values into a dense table with one column per category using PostgreSQL (crosstab)?

Question

I have this toy example, which gives me a sparse table of values split into their categories. I would like a dense matrix instead, where each column is ordered independently.

drop table if exists temp_table;
create temp table temp_table(
    rowid int
    , category text
    , score int
    );
insert into temp_table values (0, 'cat1', 10);
insert into temp_table values (1, 'cat2', 21);
insert into temp_table values (2, 'cat3', 32);
insert into temp_table values (3, 'cat2', 23);
insert into temp_table values (4, 'cat2', 24);
insert into temp_table values (5, 'cat3', 35);
insert into temp_table values (6, 'cat1', 16);
insert into temp_table values (7, 'cat1', 17);
insert into temp_table values (8, 'cat2', 28);
insert into temp_table values (9, 'cat2', 29);

This gives the following temporary table:

rowid	category	score
0	cat1	10
1	cat2	21
2	cat3	32
3	cat2	23
4	cat2	24
5	cat3	35
6	cat1	16
7	cat1	17
8	cat2	28
9	cat2	29

Then I pivot the score values into separate columns based on their category:

create extension if not exists tablefunc; -- crosstab() is provided by the tablefunc extension

select "cat1", "cat2", "cat3"
from crosstab(
    $$ select rowid, category, score from temp_table $$ -- as source_sql
    , $$ select distinct category from temp_table order by category $$ -- as category_sql
 ) as (rowid int, "cat1" int, "cat2" int, "cat3" int);

That outputs:

cat1	cat2	cat3
10
	21
		32
	23
	24
		35
16
17
	28
	29

But I want the result of the query to be dense, like this:

cat1	cat2	cat3
10	21	32
16	23	35
17	24
	28
	29

Maybe PostgreSQL's crosstab is not even the right tool for this, but it came to mind first because it produces a sparse table that is close to the result I need.

What makes 21 in cat2 end up on the same row as 32 in cat3, and not some other value? Is it ordered by rowid or by score?
– Vesa Karjalainen, Commented May 19, 2022 at 11:11
In the desired result, 32 is in the first row simply because it is the first value in the cat3 category. Once I have a dense table of values, I can use LibreOffice Calc to easily draw line charts of these three categories in the same graph.
– zimon, Commented May 19, 2022 at 11:14
"It is first" when sorted on which column? Relational databases do not guarantee that you get rows in any particular order unless you sort them.
– Vesa Karjalainen, Commented May 19, 2022 at 11:20
In this toy example the values just happen to be in order, but in the real case they are not. I already know how to order the values in the sparse result table, by adding "order by score" to the source_sql parameter of crosstab(). The problem is how to get from the sparse result table to the dense one.
– zimon, Commented May 19, 2022 at 11:25
Also, is the number of columns (cat1 .. cat3) fixed, since you have spelled them out in your query, or should it be dynamic?
– Vesa Karjalainen, Commented May 19, 2022 at 11:26

karthik_ghorpade · Accepted Answer · 2022-05-19 11:57:57Z

This should work for the exact given example data and expected output.

select max(cat1) as cat1, max(cat2) as cat2, max(cat3) as cat3
from crosstab(
  -- source_sql: row_name (ranking), extra column (rowid), category, value (score)
  $$ select rank() over(partition by category order by rowid) as ranking,
            rowid,
            category,
            score
     from temp_table
     order by rowid, category asc $$
  -- category_sql: one output column per distinct category
, $$ select distinct category
     from temp_table
     order by category $$
  ) as (ranking int, rowid int, "cat1" int, "cat2" int, "cat3" int)
group by ranking  -- merge the crosstab rows that share the same ranking
order by ranking asc;

You can test the solution here - https://dbfiddle.uk/?rdbms=postgres_14&fiddle=f198e40a18a282cc0d65fa6ecdf797cb

Edit: the changes I made to your query to arrive at the solution:

In the source SQL query, I rank the values within each category by rowid. This ranking determines which row of the dense output each value lands on, as per your requirement.

select rank() over(partition by category order by rowid) as ranking, rowid, category, score from temp_table order by rowid, category asc
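To see why an outer aggregation is needed at all, it may help to run the crosstab() call on its own. In the two-parameter form, crosstab() starts a new output row whenever the row_name column (here ranking) changes from one source row to the next. Because the source rows are ordered by rowid, rows with the same ranking are not adjacent, so one ranking can appear on several output rows. This sketch assumes the tablefunc extension can be installed and recreates the question's sample data; the trailing comment lists the rows this logic should give for that data.

```sql
-- crosstab() is provided by the tablefunc extension
create extension if not exists tablefunc;

-- Sample data from the question
drop table if exists temp_table;
create temp table temp_table(rowid int, category text, score int);
insert into temp_table values
  (0, 'cat1', 10), (1, 'cat2', 21), (2, 'cat3', 32), (3, 'cat2', 23),
  (4, 'cat2', 24), (5, 'cat3', 35), (6, 'cat1', 16), (7, 'cat1', 17),
  (8, 'cat2', 28), (9, 'cat2', 29);

-- Same crosstab as in the answer, but without the outer max()/group by
select ranking, cat1, cat2, cat3
from crosstab(
    $$ select rank() over (partition by category order by rowid) as ranking,
              rowid, category, score
       from temp_table
       order by rowid, category $$
  , $$ select distinct category from temp_table order by category $$
  ) as (ranking int, rowid int, cat1 int, cat2 int, cat3 int);

-- 7 rows instead of 5; rankings 2 and 3 each appear twice:
--  ranking | cat1 | cat2 | cat3
--        1 |   10 |   21 |   32
--        2 |      |   23 |
--        3 |      |   24 |
--        2 |   16 |      |   35
--        3 |   17 |      |
--        4 |      |   28 |
--        5 |      |   29 |
```

Grouping these rows by ranking and taking max() per column collapses them into the five dense rows.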

In the outer query, I pick the max() value of each category for each ranking obtained in the source SQL query, which merges the crosstab output rows that share the same ranking into a single row.
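A possible variation, not taken from the original answer: if the source query is ordered by the ranking itself, all rows that share a ranking are adjacent, so crosstab() already emits one dense row per ranking and the outer max()/group by can be dropped. This sketch also ranks by score, which the asker mentioned for the real data, and uses row_number() so that two equal scores in one category cannot land on the same row (crosstab() would keep only one of them). It assumes the tablefunc extension and the question's sample data.

```sql
create extension if not exists tablefunc;

-- Sample data from the question
drop table if exists temp_table;
create temp table temp_table(rowid int, category text, score int);
insert into temp_table values
  (0, 'cat1', 10), (1, 'cat2', 21), (2, 'cat3', 32), (3, 'cat2', 23),
  (4, 'cat2', 24), (5, 'cat3', 35), (6, 'cat1', 16), (7, 'cat1', 17),
  (8, 'cat2', 28), (9, 'cat2', 29);

select cat1, cat2, cat3
from crosstab(
    -- row_name r = position of the value within its category, by score;
    -- ordering by r keeps all rows with the same position together
    $$ select row_number() over (partition by category order by score) as r,
              category, score
       from temp_table
       order by r, category $$
  , $$ select distinct category from temp_table order by category $$
  ) as (r bigint, cat1 int, cat2 int, cat3 int)
order by r;
```

With the sample data this should give the same five rows as the dense table in the question.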

Vesa Karjalainen · Accepted Answer · 2022-05-19 12:03:49Z


with cte as (
  select category, score, row_number() over (
    partition by category order by score
  ) as r
  from temp_table
)
  select
    sum(score) filter (where category = 'cat1') as cat1,
    sum(score) filter (where category = 'cat2') as cat2,
    sum(score) filter (where category = 'cat3') as cat3
  from cte
  group by r
  order by r
;

If the number of columns is known and reasonably small, FILTER might be a better option than crosstab(), which requires the tablefunc extension.
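The FILTER query above numbers the values with row_number() rather than rank(). Because row_number() never repeats within a partition, each (r, category) pair holds at most one score, so sum() simply returns that score. With rank(), equal scores in the same category would share an r and be added together. A small sketch of the difference, using a hypothetical table tie_demo with made-up tied values:

```sql
-- Hypothetical data: two equal scores (10) in cat1
drop table if exists tie_demo;
create temp table tie_demo(category text, score int);
insert into tie_demo values ('cat1', 10), ('cat1', 10), ('cat1', 17), ('cat2', 21);

with cte as (
  select category, score,
         row_number() over (partition by category order by score) as rn,
         rank()       over (partition by category order by score) as rk
  from tie_demo
)
select 'row_number' as numbering, rn as r,
       sum(score) filter (where category = 'cat1') as cat1,
       sum(score) filter (where category = 'cat2') as cat2
from cte group by rn
union all
select 'rank', rk,
       sum(score) filter (where category = 'cat1'),
       sum(score) filter (where category = 'cat2')
from cte group by rk
order by numbering, r;

-- Expected result: rank() merges the tied 10s into 20 and skips r = 2
--  numbering  | r | cat1 | cat2
--  rank       | 1 |   20 |   21
--  rank       | 3 |   17 |
--  row_number | 1 |   10 |   21
--  row_number | 2 |   10 |
--  row_number | 3 |   17 |
```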


zimon commented:

Yes, good point, being able to do it without an extension as well. I learned a lot from both solutions. They are not really obvious ways to use SQL, both relying on aggregate functions and window partitions.
