Efficient query to get last row group by multiple columns

Question

I have a table like the following:

CREATE TABLE spreads (
 spread_id         serial NOT NULL,
 game_id           integer NOT NULL,
 sportsbook_id     integer NOT NULL,
 spread_type       integer NOT NULL,
 spread_duration   integer NOT NULL,
 home_line         double precision,
 home_odds         integer,
 away_line         double precision,
 away_odds         integer,
 update_time       timestamp without time zone NOT NULL,
 game_update_count integer NOT NULL
);

I'm trying to get the last row inserted (max game_update_count), for each group of (sportsbook_id, spread_type, spread_duration, game_id).

The following query gets me close, but I am not able to select the lines/odds without Postgres complaining.

SELECT 
     spreads.game_id, sportsbook_id, spread_type, spread_duration,
     MAX(game_update_count) AS game_update_count 
FROM spreads 
LEFT JOIN schedule ON  schedule.game_id = spreads.game_id 
WHERE date >= '2012-01-01' AND date <= '2012-01-02' 
GROUP BY 
     spreads.game_id, sportsbook_id, spread_type, spread_duration 
ORDER BY 
    spread_duration, spread_type, sportsbook_id, spreads.game_id,
    game_update_count DESC;

Anyone have any thoughts on a better approach?

Erwin Brandstetter · Accepted Answer · 2025-02-24 22:32:23Z

`DISTINCT ON`

To "get the last row" as requested in the title, use DISTINCT ON in Postgres:

SELECT DISTINCT ON (1, 2, 3, 4)
       sp.game_id, sp.sportsbook_id, sp.spread_type, sp.spread_duration  -- any other columns?
     , sp.game_update_count
FROM   schedule sch
JOIN   spreads  sp USING (game_id)  -- INNER JOIN!
WHERE  sch.date BETWEEN '2012-01-01' AND '2012-01-02'
ORDER  BY 4, 3, 2, 1, sp.game_update_count DESC;  -- game_update_count lives in spreads, not schedule

See:

Select first row in each GROUP BY group?

The numbers are just syntax shorthand referring to the ordinal position of SELECT items.
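To make the semantics concrete: DISTINCT ON sorts by the ORDER BY list and keeps only the first row of each set of rows that share the DISTINCT ON expressions. Below is a rough Python emulation of that rule. `distinct_on` is a hypothetical helper and the rows are made-up sample data; this is not how Postgres implements it internally.

```python
from itertools import groupby


def distinct_on(rows, key, order):
    """Emulate SELECT DISTINCT ON (key) ... ORDER BY key, order.

    Sorts by (key, order) and keeps the first row of each run of equal keys.
    """
    ordered = sorted(rows, key=lambda r: (key(r), order(r)))
    return [next(group) for _, group in groupby(ordered, key=key)]


# Hypothetical sample rows (only a few of the question's columns).
rows = [
    {"game_id": 1, "sportsbook_id": 10, "spread_type": 1, "spread_duration": 1,
     "game_update_count": 1, "home_line": -3.0},
    {"game_id": 1, "sportsbook_id": 10, "spread_type": 1, "spread_duration": 1,
     "game_update_count": 2, "home_line": -3.5},
    {"game_id": 2, "sportsbook_id": 10, "spread_type": 1, "spread_duration": 1,
     "game_update_count": 1, "home_line": 7.0},
]

latest = distinct_on(
    rows,
    # DISTINCT ON key, in the answer's ORDER BY sequence (ordinals 4, 3, 2, 1)
    key=lambda r: (r["spread_duration"], r["spread_type"],
                   r["sportsbook_id"], r["game_id"]),
    # trailing sort key: game_update_count DESC (negated for descending order)
    order=lambda r: -r["game_update_count"],
)
print([r["home_line"] for r in latest])  # [-3.5, 7.0]
```

Unlike GROUP BY, every column of the surviving row (here `home_line`) comes along for free.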

If game_update_count can be NULL, you'll want game_update_count DESC NULLS LAST. See:

Sort by column ASC, but NULL values first?
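Why NULLS LAST matters here: by default Postgres sorts NULL as if it were larger than any non-NULL value. A plain DESC therefore puts NULLs first, and DISTINCT ON would keep a NULL row as the "last" one. The sketch below mimics that ordering rule in Python, using None for NULL. `order_desc` is a hypothetical helper for illustration only.

```python
def order_desc(values, nulls_last=False):
    """Mimic Postgres ORDER BY v DESC [NULLS LAST], with None standing in for NULL.

    Postgres treats NULL as larger than any non-NULL value, so plain DESC
    behaves like NULLS FIRST; NULLS LAST moves the NULLs to the end.
    """
    non_null = sorted((v for v in values if v is not None), reverse=True)
    nulls = [v for v in values if v is None]
    return non_null + nulls if nulls_last else nulls + non_null


counts = [3, None, 5]                          # hypothetical game_update_count values
print(order_desc(counts)[0])                   # None: DISTINCT ON would keep the NULL row
print(order_desc(counts, nulls_last=True)[0])  # 5
```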

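I removed the misleading LEFT JOIN. The WHERE condition on schedule.date discards every row without a matching schedule row (its date would be NULL), which effectively turns it into an [INNER] JOIN anyway.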
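Here is a quick way to see why the LEFT JOIN was misleading. This sketch uses Python's built-in sqlite3 module as a stand-in for Postgres; the outer-join and NULL semantics involved are standard SQL. The trimmed-down tables are hypothetical. A spread whose game has no schedule row survives the plain LEFT JOIN, but the WHERE condition on sch.date removes it again, so the result matches an INNER JOIN.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE schedule (game_id INTEGER PRIMARY KEY, date TEXT NOT NULL);
    CREATE TABLE spreads  (spread_id INTEGER PRIMARY KEY, game_id INTEGER NOT NULL);
    INSERT INTO schedule VALUES (1, '2012-01-01');
    INSERT INTO spreads (game_id) VALUES (1), (2);  -- game 2 has no schedule row
""")


def ids(join, where=""):
    """Return the spread_ids produced by the given join type and WHERE clause."""
    sql = (f"SELECT sp.spread_id FROM spreads sp {join} schedule sch "
           f"ON sch.game_id = sp.game_id {where} ORDER BY sp.spread_id")
    return [r[0] for r in con.execute(sql)]


date_filter = "WHERE sch.date BETWEEN '2012-01-01' AND '2012-01-02'"

plain_left = ids("LEFT JOIN")                  # unmatched spread kept, sch.* is NULL
filtered_left = ids("LEFT JOIN", date_filter)  # NULL date fails the WHERE test
inner = ids("JOIN", date_filter)
print(plain_left, filtered_left, inner)
```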

`GROUP BY`

As long as you only retrieve the columns listed in GROUP BY (plus the aggregate), this works, too:

SELECT sp.game_id, sp.sportsbook_id, sp.spread_type, sp.spread_duration
     , MAX(sp.game_update_count) AS game_update_count 
FROM   schedule sch
JOIN   spreads sp USING (game_id)
WHERE  sch.date BETWEEN '2012-01-01' AND '2012-01-02' 
GROUP  BY sp.game_id, sp.sportsbook_id, sp.spread_type, sp.spread_duration
ORDER  BY sp.spread_duration, sp.spread_type, sp.sportsbook_id, sp.game_id;
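Running the GROUP BY variant against a few hypothetical rows shows one row per group, carrying the maximum game_update_count. As before, this sketch uses Python's sqlite3 module as a stand-in for Postgres, and it uses ON instead of USING. Note that SQLite is laxer than Postgres about non-aggregated columns missing from GROUP BY, so it would not reproduce the error from the question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE schedule (game_id INTEGER PRIMARY KEY, date TEXT NOT NULL);
    CREATE TABLE spreads (
        game_id INTEGER NOT NULL, sportsbook_id INTEGER NOT NULL,
        spread_type INTEGER NOT NULL, spread_duration INTEGER NOT NULL,
        game_update_count INTEGER NOT NULL);
    INSERT INTO schedule VALUES (1, '2012-01-01'), (2, '2012-01-02'), (3, '2012-01-05');
    INSERT INTO spreads VALUES
        (1, 10, 1, 1, 1),
        (1, 10, 1, 1, 2),
        (2, 10, 1, 1, 1),
        (3, 10, 1, 1, 4);  -- game 3 falls outside the date range
""")

# Same shape as the GROUP BY query above: one row per group, with the max count.
rows = con.execute("""
    SELECT sp.game_id, sp.sportsbook_id, sp.spread_type, sp.spread_duration,
           MAX(sp.game_update_count) AS game_update_count
    FROM   schedule sch
    JOIN   spreads sp ON sp.game_id = sch.game_id
    WHERE  sch.date BETWEEN '2012-01-01' AND '2012-01-02'
    GROUP  BY sp.game_id, sp.sportsbook_id, sp.spread_type, sp.spread_duration
    ORDER  BY sp.spread_duration, sp.spread_type, sp.sportsbook_id, sp.game_id
""").fetchall()
print(rows)
```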

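You used game_update_count unqualified (!) both as an input column name and as an output column name. This is begging for trouble. Some clauses resolve a bare name to input columns first, others to output columns: in Postgres, an ambiguous name means the input column in GROUP BY, but the output column in ORDER BY. Avoid this confusion. See: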

Scope of a column alias in a SELECT with GROUP BY

I added table qualifications.

Plus, it makes no sense to add game_update_count to ORDER BY after you GROUP BY all other columns: only a single value (the maximum) remains per group, so it can never break a tie.

Upvoted based on code readability. Erwin, do you know if Microsoft offers DISTINCT ON? This is the first I've seen it and I'm having some trouble finding documentation on it. – Mark Iannucci, Commented Jan 21, 2015 at 19:13
@MarkIannucci: No, AFAIK, this is not available in SQL Server or MS Access. It's a Postgres extension to the standard SQL DISTINCT clause. – Erwin Brandstetter, Commented Jan 21, 2015 at 19:17
This example makes it look like the DISTINCT ON items need to be in the reverse order of the ORDER BY items, but that's not the case, is it? – Andy, Commented Apr 3, 2020 at 19:26
@Andy: No, items don't have to be in reverse order. I just kept the sort order of the original query. Leading ORDER BY expressions must match the set of expressions in DISTINCT ON, but we are free to rearrange order within that set. I clarified the explanation in my linked answer as that wasn't clear before. – Erwin Brandstetter, Commented Apr 3, 2020 at 20:04
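The rule from Erwin's reply to Andy can be written down as a toy check. Take as many leading ORDER BY items as there are DISTINCT ON items; they must be the same set as the DISTINCT ON items, in any order. `leading_order_matches` is a hypothetical helper, not Postgres's actual parser logic. It covers only the case in the answer, where ORDER BY lists every DISTINCT ON item, and it ignores edge cases such as a shorter ORDER BY or ordinal references.

```python
def leading_order_matches(distinct_on, order_by):
    """Toy model: True if the first len(distinct_on) ORDER BY items are exactly
    the DISTINCT ON items, in any order. Assumes ORDER BY is at least that long."""
    n = len(distinct_on)
    return sorted(order_by[:n]) == sorted(distinct_on)


keys = ["game_id", "sportsbook_id", "spread_type", "spread_duration"]

# The answer's query: reversed key order, then the tie-breaker. Accepted.
print(leading_order_matches(
    keys, ["spread_duration", "spread_type", "sportsbook_id", "game_id",
           "game_update_count"]))  # True

# Tie-breaker placed first: Postgres would reject this combination.
print(leading_order_matches(
    keys, ["game_update_count", "game_id", "sportsbook_id", "spread_type",
           "spread_duration"]))  # False
```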

Mark Iannucci · Accepted Answer · 2015-01-21 08:23:12Z

Instead of using a group by, have you tried using a Window Function?

Try this:

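SELECT *
FROM (
    SELECT
       spreads.game_id, sportsbook_id, spread_type, spread_duration,
       home_line, home_odds, away_line, away_odds,  -- the lines/odds you wanted
       row_number() OVER (PARTITION BY spreads.game_id, sportsbook_id, spread_type, spread_duration
                          ORDER BY game_update_count DESC) AS priority  -- 1 = latest row per group
    FROM spreads
    JOIN schedule ON schedule.game_id = spreads.game_id
    WHERE schedule.date >= '2012-01-01' AND schedule.date <= '2012-01-02'
     ) AS latest  -- Postgres before version 16 requires an alias for a subquery in FROM
WHERE priority = 1;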

The row_number() window function, which I've aliased as priority, is conceptually similar to a GROUP BY clause. The difference is that it doesn't collapse rows: you still see the data with row-level fidelity, and each row is numbered within its window, with 1 going to the row that sorts first in the window's ORDER BY (here, the most recent one). In this case, you'll notice that the columns you grouped by have been used to partition the data. The outer SELECT statement then eliminates the data that you don't want (all of the out-of-date sports-betting lines).
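To see the row_number() approach run end to end, here is a sketch using Python's sqlite3 module as a stand-in for Postgres (SQLite supports window functions since version 3.25). It uses a hypothetical, trimmed-down copy of the tables. Each window is ordered by game_update_count DESC, matching the question's definition of "last row". The query also returns a non-grouped column, home_line, which is exactly what the GROUP BY query could not do.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE schedule (game_id INTEGER PRIMARY KEY, date TEXT NOT NULL);
    CREATE TABLE spreads (
        game_id INTEGER NOT NULL, sportsbook_id INTEGER NOT NULL,
        spread_type INTEGER NOT NULL, spread_duration INTEGER NOT NULL,
        home_line REAL, game_update_count INTEGER NOT NULL);
    INSERT INTO schedule VALUES (1, '2012-01-01');
    INSERT INTO spreads VALUES
        (1, 10, 1, 1, -3.0, 1),
        (1, 10, 1, 1, -3.5, 2),  -- latest update for sportsbook 10
        (1, 20, 1, 1, -2.5, 1);  -- only update for sportsbook 20
""")

rows = con.execute("""
    SELECT game_id, sportsbook_id, home_line
    FROM (
        SELECT sp.game_id, sp.sportsbook_id, sp.home_line,
               row_number() OVER (
                   PARTITION BY sp.game_id, sp.sportsbook_id,
                                sp.spread_type, sp.spread_duration
                   ORDER BY sp.game_update_count DESC) AS priority
        FROM spreads sp
        JOIN schedule sch ON sch.game_id = sp.game_id
        WHERE sch.date BETWEEN '2012-01-01' AND '2012-01-02'
    ) AS latest
    WHERE priority = 1          -- keep only the newest row of each partition
    ORDER BY game_id, sportsbook_id
""").fetchall()
print(rows)
```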

I wish you the best as you implement Window Functions (and with your sport betting)!

(Word of warning: I'm typically a SQL Server guy, so my code may not be syntactically correct, but it should get you off to a good start.)

Thank you for this. I had looked into window functions but hit a wall. Your explanation makes sense, appreciate it! – Jeremy, Commented Jan 22, 2015 at 0:13
