I have a PostgreSQL query that aggregates data from multiple tables and generates a report. Currently it works perfectly fine and it is quite simple. Here is the query I'm using:
SELECT
a.name AS account_name,
c.name AS campaign_name,
ad.name AS ad_name,
g.cpm AS cpm,
SUM(g.impressions) AS impressions,
SUM(g.clicks) AS clicks,
SUM(g.conversions) AS conversions,
SUM(g.impressions) * g.cpm AS cost,
SUM(g.revenue) AS order_revenue
FROM google_analytics g
JOIN ad ad ON ad.id = g.ad_id
JOIN campaign c ON c.id = ad.campaign_id
JOIN account a ON a.id = c.account_id
WHERE g.created_at >= ? AND g.created_at <= ?
GROUP BY 1, 2, 3, 4
ORDER BY 2
And my table schema:
CREATE TABLE public.google_analytics (
created_at TIMESTAMP WITHOUT TIME ZONE NULL,
ad_id INTEGER NULL,
impressions INTEGER NULL,
clicks INTEGER NULL,
conversions INTEGER NULL,
ecpm INTEGER NULL,
commission SMALLINT NULL,
revenue INTEGER NULL
);
The current report aggregates data properly with the SUM and GROUP BY clauses, and works as expected. However, the application code extends this report by handling additional calculations and aggregations, such as rollups, averaging certain fields, and building a hierarchical structure.
For example, the application generates a report like this:
| account_name | campaign_name | ad_name | cpm | impressions | clicks | conversions | cost | order_revenue |
|---|---|---|---|---|---|---|---|---|
| Account A | 1.75 CPM | 2200 | 20 | 2 | 1.45 | $150 | ||
| Campaign A | Ad 1 | 1.50 CPM | 1000 | 10 | 1 | 0.50 | $100 | |
| Campaign A | Ad 2 | 2.00 CPM | 1200 | 10 | 1 | 2.40 | $50 |
In this structure:
- Impressions, clicks, and revenue are summed
- CPM and cost are averaged
- The data is grouped hierarchically by account, campaign, and ad
The application code handling these calculations is very large and complex, making it difficult to maintain. I am planning to replace this application logic with a SQL-based solution to simplify maintenance and improve performance.
I have experimented with the ROLLUP and CUBE features in PostgreSQL, and I got excited about the capabilities they offer. These features seem to perform much of the grunt work that the application currently does. My initial results were very promising and I got the very same tree-like structure for summing up all the metrics.
However, I am unsure if PostgreSQL's built-in rollup supports averaging and summing at different levels of the hierarchy. My goal is to use these features to replace the application code and handle the calculations directly within the database.
For instance:
- I was unable to calculate the average instead of the sum: how can the rollup use average for a column, and sum for another column? It seems to always sum things
- I was unable to calculate the sum of cost, since cost is already a calculated field
In order to contextualize even more, and sake of understanding the tree, this is the final JSON I am planning to generate:
[
{
"name": "Account A",
"impressions": 236797,
"cpm": 2.48,
"children": [
{
"name": "Campaign A",
"impressions": 236797,
"cpm": 2.48,
"children": [
{
"name": "Ad 1",
"impressions": 236797,
"cpm": 2.48,
"children": []
}
]
}
]
}
]
CUBEandROLLUP, but didn't provide the code needed to assess your attempts and offer suggestions.GROUP BYandORDER BY. The use of ordinals results in fragile queries (changes to the select list can easily break the query) and imposes a higher cognitive load on the reader than does the use of names.