
A common beef I get when trying to evangelize the benefits of learning freehand SQL to MS Access users is how complex it is to reproduce the effect of a crosstab query the way Access does it. I realize that, strictly speaking, SQL doesn't work that way; it's possible in Access only because Access handles the rendering of the data.

Specifically, when I have a table with entities, dates and quantities, it's frequent that we want to see a single entity on one line with the dates represented as columns:

This:

entity       date      qty
------       --------  ---
278700-002   1/1/2016  5
278700-002   2/1/2016  3
278700-002   2/1/2016  8
278700-002   3/1/2016  1
278700-003   2/1/2016  12

Becomes this:

Entity      1/1/16   2/1/16   3/1/16
----------  ------   ------   ------
278700-002    5       11        1
278700-003            12

That said, the common way we've approached this is something similar to this:

with vals as (
  select
    entity,
    case when order_date = '2016-01-01' then qty else 0 end as q16_01,
    case when order_date = '2016-02-01' then qty else 0 end as q16_02,
    case when order_date = '2016-03-01' then qty else 0 end as q16_03
  from mydata
)
select
  entity, sum (q16_01) as q16_01, sum (q16_02) as q16_02, sum (q16_03) as q16_03
from vals
group by entity

This is radically oversimplified, but I believe most people will get my meaning.

The main problem with this is not the limit on the number of columns. The data is typically bounded, and I can make do with a fixed number of date columns (36 months, or whatever, depending on the context of the data). My issue is that I have to change the dates every month to make this work.

I had an idea that I could leverage arrays to dynamically assign the quantity to the index of the array, based on the month away from the current date. In this manner, my data would end up looking like this:

Entity      Values
----------  ------
278700-002  {5,11,1}
278700-003  {0,12,0}

This would be quite acceptable, as I could manage the rendering of the actual columns within whatever rendering tool I was using (Excel, for example).

The problem is I'm stuck... how do I get from my data to this. If this were Perl, I would loop through the data and do something like this:

foreach my $ref (@data) {
  my ($entity, $month_offset, $qty) = @$ref;
  $values{$entity}->[$month_offset] += $qty;
}

But this isn't Perl... so far, this is what I have, and now I'm at a mental impasse.

with month_offsets as (
  select
    entity, order_date, qty,
    (extract (year from order_date ) - 2015) * 12 +
     extract (month from order_date ) - 9 as month_offset,
    array[]::integer[] as values
  from mydata
)
select
  entity, order_date -- oh my...  how do I load into my array?
from month_offsets

The "2015" and the "9" are not really hard-coded -- I put them there for simplicity's sake in this example.

Also, if my approach or my assumptions are totally off, I trust someone will set me straight.

1 Answer


As with all things imaginable and unimaginable, there is a way to do this with PostgreSQL. It looks like this:

WITH cte AS (
  WITH minmax AS (
    SELECT min(extract(month from order_date))::int,
           max(extract(month from order_date))::int
    FROM mytable
  )
  SELECT entity, mon, 0 AS qty
  FROM (SELECT DISTINCT entity FROM mytable) entities,
       (SELECT generate_series(min, max) AS mon FROM minmax) allmonths
  UNION ALL
  SELECT entity, extract(month from order_date)::int, qty FROM mytable
)
SELECT entity, array_agg(sum ORDER BY mon) AS values
FROM (
  SELECT entity, mon, sum(qty) FROM cte
  GROUP BY 1, 2) sub
GROUP BY 1
ORDER BY 1;

A few words of explanation:

The standard way to produce an array inside a SQL statement is to use the array_agg() function. Your problem is that you have months without data and then array_agg() happily produces nothing, leaving you with arrays of unequal length and no information on where in the time period the data comes from. You can solve this by adding 0's for every combination of 'entity' and the months in the period of interest. That is what this snippet of code does:

SELECT entity, mon, 0 AS qty
FROM (SELECT DISTINCT entity FROM mytable) entities,
     (SELECT generate_series(min, max) AS mon FROM minmax) allmonths

All those 0's are UNIONed to the actual data from 'mytable' and then (in the main query) you can first sum up the quantities by entity and month and subsequently aggregate those sums into an array for each entity. Since it is a double aggregation you need the sub-query. (You could also sum the quantities in the UNION but then you would also need a sub-query because UNIONs don't allow aggregation.)
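To see why the zero rows matter, here is a minimal self-contained sketch that runs array_agg() over the question's sample data without them. The inline VALUES list stands in for 'mytable', and the aliases mon and total are only illustrative.

```sql
-- Sample rows from the question, inlined so the query runs on its own.
WITH mytable(entity, order_date, qty) AS (
  VALUES ('278700-002', date '2016-01-01', 5),
         ('278700-002', date '2016-02-01', 3),
         ('278700-002', date '2016-02-01', 8),
         ('278700-002', date '2016-03-01', 1),
         ('278700-003', date '2016-02-01', 12)
)
SELECT entity, array_agg(total ORDER BY mon) AS values
FROM (
  -- sum per entity and month, with no zero rows added
  SELECT entity,
         extract(month from order_date)::int AS mon,
         sum(qty) AS total
  FROM mytable
  GROUP BY 1, 2) sub
GROUP BY 1
ORDER BY 1;
-- 278700-002 | {5,11,1}
-- 278700-003 | {12}
```

The second entity gets a one-element array, so nothing in the result says that its 12 belongs to February rather than January or March.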

The minmax CTE can be adjusted to include the year as well (your sample data doesn't need it). Do note that the actual min and max values are immaterial to the index in the array: if min is 743 it will still occupy the first position in the array; those values are only used for GROUPing, not indexing.
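One possible way to include the year, sketched under the assumption that 'mytable' has the columns entity, order_date and qty: number every month as year * 12 + month so that a period crossing a year boundary stays in sequence. The CTE names monthly and filled, and the aliases first_mon, last_mon, total and monthly_qty, are made up for this example.

```sql
WITH monthly AS (
  -- absolute month number, e.g. January 2016 -> 2016 * 12 + 1
  SELECT entity,
         (extract(year from order_date) * 12
          + extract(month from order_date))::int AS mon,
         qty
  FROM mytable
),
minmax AS (
  SELECT min(mon) AS first_mon, max(mon) AS last_mon FROM monthly
),
filled AS (
  -- one zero row for every entity and every month in the period
  SELECT e.entity, g.mon, 0 AS qty
  FROM (SELECT DISTINCT entity FROM monthly) e
  CROSS JOIN minmax m
  CROSS JOIN LATERAL generate_series(m.first_mon, m.last_mon) AS g(mon)
  UNION ALL
  SELECT entity, mon, qty FROM monthly
)
SELECT entity, array_agg(total ORDER BY mon) AS monthly_qty
FROM (
  SELECT entity, mon, sum(qty) AS total
  FROM filled
  GROUP BY entity, mon) sub
GROUP BY entity
ORDER BY entity;
```

With the sample data from the question this should give {5,11,1} for 278700-002 and {0,12,0} for 278700-003.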

SQLFiddle

For ease of use you could wrap this query up in a SQL language function with parameters for the starting and ending month. Adjust the minmax CTE to produce appropriate min and max values for the generate_series() call and in the UNION filter the rows from 'mytable' to be considered.
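A rough sketch of such a function follows. The function name entity_monthly_qty, the parameter names p_start and p_end, and the column types (entity castable to text, qty an integer type) are assumptions for illustration, not something taken from the original data.

```sql
CREATE FUNCTION entity_monthly_qty(p_start date, p_end date)
RETURNS TABLE (entity text, monthly_qty bigint[])
LANGUAGE sql STABLE
AS $$
  WITH months AS (
    -- first day of every month from p_start's month to p_end's month
    SELECT generate_series(date_trunc('month', p_start::timestamp),
                           date_trunc('month', p_end::timestamp),
                           interval '1 month')::date AS mon
  ),
  filled AS (
    -- zero rows for every entity and month in the requested period
    SELECT e.entity, m.mon, 0 AS qty
    FROM (SELECT DISTINCT t.entity FROM mytable t) e
    CROSS JOIN months m
    UNION ALL
    -- actual rows, filtered to the same period
    SELECT t.entity, date_trunc('month', t.order_date::timestamp)::date, t.qty
    FROM mytable t
    WHERE t.order_date >= date_trunc('month', p_start::timestamp)
      AND t.order_date <  date_trunc('month', p_end::timestamp) + interval '1 month'
  )
  SELECT sub.entity::text, array_agg(sub.total ORDER BY sub.mon)
  FROM (
    SELECT f.entity, f.mon, sum(f.qty)::bigint AS total
    FROM filled f
    GROUP BY f.entity, f.mon) sub
  GROUP BY sub.entity
  ORDER BY sub.entity;
$$;

-- Example call for the first quarter of 2016:
SELECT * FROM entity_monthly_qty('2016-01-01', '2016-03-01');
```

Because the series is built from the two parameters rather than from the data, every array has the same length for a given call, even for months in which no entity has any rows.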


2 Comments

This is exactly what I was looking for -- thanks for the detailed explanation. One quick question -- in your subquery sub, is the order guaranteed, or would I need to add an order by 1, 2 to ensure that the months were not listed out of order when they were grouped?
In the sub sub-query the order is irrelevant. Grouping is done on the value of mon, no matter the order. In many (but not all!) SQL operations order is of no importance, but given our human preference for ordered things the ORDER BY is included in the main query. You can verify this by adding some data out of order in the SQLfiddle: the answer will always be the same.
