
I have a table that is a "tall skinny" fact table:

CREATE TABLE facts(
    eff_date timestamp NOT NULL,
    update_date timestamp NOT NULL,
    symbol_id int4 NOT NULL,
    data_type_id int4 NOT NULL,
    source_id char(3) NOT NULL,
    fact decimal,
 /* Keys */
  CONSTRAINT fact_pk
    PRIMARY KEY (source_id, symbol_id, data_type_id, eff_date)
);

I'd like to "pivot" this for a report, so the header looks like this:

eff_date, symbol_id, source_id, datatypeValue1, ... DatatypeValueN

I.e., I'd like a row for each unique combination of eff_date, symbol_id, and source_id.

However, PostgreSQL's crosstab() function only allows one key column.

Any ideas?

Comment: ... sample data? (Apr 9, 2017 at 22:46)

1 Answer


crosstab() expects the following columns from its input query (1st parameter), in this order:

  1. a row_name
  2. (optional) extra columns
  3. a category (matching values in 2nd crosstab parameter)
  4. a value

You don't have a row_name. Add a surrogate row_name with the window function dense_rank().
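To illustrate what dense_rank() contributes, here is a minimal Python sketch (not PostgreSQL code). It assigns the same surrogate number to every row that shares an (eff_date, symbol_id, source_id) key, with consecutive numbers and no gaps. The function name dense_rank_keys and the fixed timestamp are hypothetical, used only for illustration.

```python
from datetime import datetime

def dense_rank_keys(keys):
    """Mimic dense_rank() OVER (ORDER BY key): equal keys share a rank,
    and ranks are consecutive integers starting at 1 (no gaps)."""
    ranks = {}
    for rank, key in enumerate(sorted(set(keys)), start=1):
        ranks[key] = rank
    return [ranks[k] for k in keys]

ts = datetime(2017, 4, 10, 12, 0)  # hypothetical fixed stand-in for now()
# (eff_date, symbol_id, source_id) of the sample rows, in insertion order
keys = [
    (ts, 1, "foo"), (ts, 1, "foo"), (ts, 1, "foo"),
    (ts, 1, "bar"), (ts, 1, "bar"), (ts, 1, "bar"),
    (ts, 1, "baz"), (ts, 1, "baz"),
]
print(dense_rank_keys(keys))  # [3, 3, 3, 1, 1, 1, 2, 2]
```

Each distinct key combination becomes one output row of crosstab(). row_number() would not work here because it numbers every input row separately.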

Your question leaves room for interpretation. Let's add sample rows for demonstration:

INSERT INTO facts (eff_date, update_date, symbol_id, data_type_id, source_id)
VALUES
   (now(), now(), 1,  5, 'foo')
 , (now(), now(), 1,  6, 'foo')
 , (now(), now(), 1,  7, 'foo')
 , (now(), now(), 1,  6, 'bar')
 , (now(), now(), 1,  7, 'bar')
 , (now(), now(), 1, 23, 'bar')
 , (now(), now(), 1,  5, 'baz')
 , (now(), now(), 1, 23, 'baz');  -- only two rows for 'baz'

Interpretation #1: first N values

You want to list the first N values of data_type_id (the smallest, if there are more) for each distinct (source_id, symbol_id, eff_date).

For this you also need a synthetic category, which can be generated with row_number(). The basic query to produce input for crosstab():

SELECT dense_rank() OVER (ORDER BY eff_date, symbol_id, source_id)::int AS row_name
     , eff_date, symbol_id, source_id                                   -- extra columns
     , row_number() OVER (PARTITION BY eff_date, symbol_id, source_id
                          ORDER BY data_type_id)::int                   AS category
     , data_type_id                                                     AS value  
FROM   facts
ORDER  BY row_name, category;
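The input query can be checked without a PostgreSQL server. Below is a rough sketch using Python's bundled sqlite3 module. It assumes SQLite 3.25 or later, which is needed for window functions. The PostgreSQL `::int` casts are dropped and eff_date is stored as text, so this only approximates the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE facts (
        eff_date     TEXT    NOT NULL,  -- timestamp in PostgreSQL; text here
        update_date  TEXT    NOT NULL,
        symbol_id    INTEGER NOT NULL,
        data_type_id INTEGER NOT NULL,
        source_id    TEXT    NOT NULL,
        fact         NUMERIC,
        PRIMARY KEY (source_id, symbol_id, data_type_id, eff_date)
    )""")
ts = "2017-04-10 00:00:00"  # fixed stand-in for now()
sample = [(1, 5, "foo"), (1, 6, "foo"), (1, 7, "foo"),
          (1, 6, "bar"), (1, 7, "bar"), (1, 23, "bar"),
          (1, 5, "baz"), (1, 23, "baz")]
conn.executemany(
    "INSERT INTO facts (eff_date, update_date, symbol_id, data_type_id, source_id)"
    " VALUES (?, ?, ?, ?, ?)",
    [(ts, ts, sym, dt, src) for sym, dt, src in sample])

# Same column layout crosstab() expects: row_name, extra columns, category, value
rows = conn.execute("""
    SELECT dense_rank() OVER (ORDER BY eff_date, symbol_id, source_id) AS row_name
         , eff_date, symbol_id, source_id
         , row_number() OVER (PARTITION BY eff_date, symbol_id, source_id
                              ORDER BY data_type_id)                AS category
         , data_type_id                                             AS value
    FROM   facts
    ORDER  BY row_name, category""").fetchall()
for row in rows:
    print(row)
```

Each (source_id) group gets one row_name, and its data_type_ids are numbered 1, 2, 3 in ascending order. Those numbers are what the category list `VALUES (1), (2), (3)` matches against.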

Crosstab query:

SELECT *
FROM   crosstab(
  'SELECT dense_rank() OVER (ORDER BY eff_date, symbol_id, source_id)::int AS row_name
        , eff_date, symbol_id, source_id                                   -- extra columns
        , row_number() OVER (PARTITION BY eff_date, symbol_id, source_id
                             ORDER BY data_type_id)::int                   AS category
        , data_type_id                                                     AS value  
   FROM   facts
   ORDER  BY row_name, category'
, 'VALUES (1), (2), (3)'
   ) AS (row_name int, eff_date timestamp, symbol_id int, source_id char(3)
       , datatype_1 int, datatype_2 int, datatype_3 int);

Results:

row_name | eff_date       | symbol_id | source_id | datatype_1 | datatype_2 | datatype_3
-------: | :--------------| --------: | :-------- | ---------: | ---------: | ---------:
       1 | 2017-04-10 ... |         1 | bar       |          6 |          7 |         23
       2 | 2017-04-10 ... |         1 | baz       |          5 |         23 |       null
       3 | 2017-04-10 ... |         1 | foo       |          5 |          6 |          7
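The two-parameter form of crosstab() turns the input rows into the table above, and that behaviour can be modelled in a few lines of Python. This is a simplified sketch of the documented behaviour, not tablefunc's actual implementation, and the name crosstab2 is hypothetical.

The sketch works as follows:

- Rows are grouped by row_name, so the input must be sorted or grouped by it.
- Extra columns are assumed to be identical within a group and are taken from the first row.
- Each value goes into the output column whose category matches.
- Categories that have no matching row stay NULL (None).

```python
from itertools import groupby

def crosstab2(source_rows, categories):
    """Simplified model of crosstab(source_sql, category_sql).

    source_rows: tuples (row_name, *extra, category, value), grouped by row_name.
    categories:  category values, in output-column order.
    Returns one tuple per row_name: (row_name, *extra, one value per category).
    """
    position = {cat: i for i, cat in enumerate(categories)}
    result = []
    for row_name, group in groupby(source_rows, key=lambda r: r[0]):
        group = list(group)
        extra = group[0][1:-2]              # extra columns from the first row of the group
        values = [None] * len(categories)   # unmatched categories stay NULL
        for row in group:
            category, value = row[-2], row[-1]
            if category in position:        # categories not in the list are ignored
                values[position[category]] = value
        result.append((row_name, *extra, *values))
    return result

# Input rows as produced by the basic query (eff_date shortened to a string)
source = [
    (1, "2017-04-10", 1, "bar", 1, 6), (1, "2017-04-10", 1, "bar", 2, 7),
    (1, "2017-04-10", 1, "bar", 3, 23),
    (2, "2017-04-10", 1, "baz", 1, 5), (2, "2017-04-10", 1, "baz", 2, 23),
    (3, "2017-04-10", 1, "foo", 1, 5), (3, "2017-04-10", 1, "foo", 2, 6),
    (3, "2017-04-10", 1, "foo", 3, 7),
]
for row in crosstab2(source, [1, 2, 3]):
    print(row)
```

The 'baz' row gets None for its third column because it has no row with category 3. This matches the null in the result table.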

Interpretation #2: actual values in column names

You want the actual values of data_type_id in the column names datatypeValue1, ... DatatypeValueN, with one column per distinct value. Get them with:

SELECT DISTINCT data_type_id FROM facts ORDER BY 1;

That's 5, 6, 7 and 23 in the example. The displayed values can then be just boolean (or the redundant data_type_id itself). Basic query:

SELECT dense_rank() OVER (ORDER BY eff_date, symbol_id, source_id)::int AS row_name
     , eff_date, symbol_id, source_id                                   -- extra columns
     , data_type_id                                                     AS category
     , TRUE                                                             AS value
FROM   facts
ORDER  BY row_name, category;

Crosstab query:

SELECT *
FROM   crosstab(
  'SELECT dense_rank() OVER (ORDER BY eff_date, symbol_id, source_id)::int AS row_name
        , eff_date, symbol_id, source_id                                   -- extra columns
        , data_type_id                                                     AS category
        , TRUE                                                             AS value
   FROM   facts
   ORDER  BY row_name, category'
, 'VALUES (5), (6), (7), (23)'  -- actual values
   ) AS (row_name int, eff_date timestamp, symbol_id int, source_id char(3)
       , datatype_5 bool, datatype_6 bool, datatype_7 bool, datatype_23 bool);

Result:

eff_date       | symbol_id | source_id | datatype_5 | datatype_6 | datatype_7 | datatype_23
:--------------| --------: | :-------- | :--------- | :--------- | :--------- | :----------
2017-04-10 ... |         1 | bar       | null       | t          | t          | t          
2017-04-10 ... |         1 | baz       | t          | null       | null       | t          
2017-04-10 ... |         1 | foo       | t          | t          | t          | null       
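The column definition list of crosstab() must be spelled out in the query, so a new data_type_id means editing the query. One option is to generate the query text in client code from the result of the SELECT DISTINCT query. The following hypothetical Python helper is not part of any library. It rebuilds the interpretation #2 query for a given list of values.

```python
def build_crosstab_sql(type_ids):
    """Build the interpretation #2 crosstab() query for the given data_type_ids.

    type_ids would come from: SELECT DISTINCT data_type_id FROM facts ORDER BY 1;
    Each value is passed through int(), so only integers end up in the SQL text.
    """
    ids = [int(t) for t in type_ids]
    source_sql = (
        "SELECT dense_rank() OVER (ORDER BY eff_date, symbol_id, source_id)::int AS row_name"
        ", eff_date, symbol_id, source_id"
        ", data_type_id AS category"
        ", TRUE AS value"
        " FROM facts"
        " ORDER BY row_name, category"
    )
    category_sql = "VALUES " + ", ".join(f"({i})" for i in ids)
    columns = ", ".join(f"datatype_{i} bool" for i in ids)
    return (
        "SELECT * FROM crosstab(\n"
        f"  '{source_sql}'\n"
        f", '{category_sql}'\n"
        f") AS (row_name int, eff_date timestamp, symbol_id int, source_id char(3), {columns});"
    )

print(build_crosstab_sql([5, 6, 7, 23]))
```

The returned string would then be sent to PostgreSQL by whatever client is in use. The category list and the column list come from the same `ids`, so they cannot drift apart.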

dbfiddle here

