Grouped LIMIT in PostgreSQL: show the first N rows for each group?

Question

I need to take the first N rows for each group, ordered by a custom column.

Given the following table:

db=# SELECT * FROM xxx;
 id | section_id | name
----+------------+------
  1 |          1 | A
  2 |          1 | B
  3 |          1 | C
  4 |          1 | D
  5 |          2 | E
  6 |          2 | F
  7 |          3 | G
  8 |          2 | H
(8 rows)

I need the first 2 rows (ordered by name) for each section_id, i.e. a result similar to:

 id | section_id | name
----+------------+------
  1 |          1 | A
  2 |          1 | B
  5 |          2 | E
  6 |          2 | F
  7 |          3 | G
(5 rows)

I am using PostgreSQL 8.3.5.
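
To try the answers below, the sample data can be recreated with something like the following sketch. The column types and the primary key are assumptions; the question only shows the psql output, not the table definition.

```sql
-- Assumed schema: the question does not state column types or constraints.
CREATE TABLE xxx (
    id         integer PRIMARY KEY,
    section_id integer NOT NULL,
    name       text    NOT NULL
);

-- The eight rows shown in the question.
INSERT INTO xxx (id, section_id, name) VALUES
    (1, 1, 'A'),
    (2, 1, 'B'),
    (3, 1, 'C'),
    (4, 1, 'D'),
    (5, 2, 'E'),
    (6, 2, 'F'),
    (7, 3, 'G'),
    (8, 2, 'H');
```

Some answers query a table named t instead of xxx; substitute the name accordingly when testing them.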

ngspkinga · Accepted Answer · 2016-08-11 09:08:48Z

401

New solution (PostgreSQL 8.4 and later):

SELECT
  * 
FROM (
  SELECT
    ROW_NUMBER() OVER (PARTITION BY section_id ORDER BY name) AS r,
    t.*
  FROM
    xxx t) x
WHERE
  x.r <= 2;
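
If name is not unique within a section, ROW_NUMBER() chooses arbitrarily among the tied rows. A possible variant, assuming id is an acceptable tie-breaker (the question does not say), that also leaves the helper column r out of the result:

```sql
SELECT x.id, x.section_id, x.name
FROM (
    SELECT t.*,
           -- id is added only to make the order deterministic (an assumption)
           ROW_NUMBER() OVER (PARTITION BY section_id ORDER BY name, id) AS r
    FROM xxx t
) x
WHERE x.r <= 2
ORDER BY x.section_id, x.r;
```

Using RANK() instead of ROW_NUMBER() would keep all tied rows, so a section could then return more than 2 rows.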

edited Aug 11, 2016 at 9:08

ngspkinga


answered May 19, 2011 at 19:55

Dave


4 Comments

Bruno Over a year ago

This works with PostgreSQL 8.4 too (window functions start with 8.4).

NurShomik Over a year ago

Awesome! It works flawlessly. I am curious though, is there a way to do this with group by?

Diligent Key Presser Over a year ago

For those who work with millions of rows and are looking for a really performant way to do this: poshest's answer is the way to go. Just don't forget to spice it up with proper indexing.

wistlo Over a year ago

This works in MySQL 8.0.24. It quickly condensed 6M rows of associated email addresses to five in each category (no sorting or criteria, I just needed five email addresses). A small number of categories (companies) had several thousand addresses.

poshest · Accepted Answer · 2016-06-16 14:25:29Z

99

Since v9.3 you can use a lateral join:

select distinct t_outer.section_id, t_top.id, t_top.name from t t_outer
join lateral (
    select * from t t_inner
    where t_inner.section_id = t_outer.section_id
    order by t_inner.name
    limit 2
) t_top on true
order by t_outer.section_id;

It might be faster but, of course, you should test performance specifically on your data and use case.
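
The comments on this answer report large speedups when a suitable index exists. A minimal sketch, assuming the table is named t as in the query above; the index name is hypothetical:

```sql
-- Lets each lateral subquery find one section's rows already ordered by name,
-- so it can stop after reading 2 index entries.
CREATE INDEX t_section_id_name_idx ON t (section_id, name);
```

Prefixing the query with EXPLAIN ANALYZE shows whether the planner actually uses the index on your data.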

answered Jun 16, 2016 at 14:25

poshest


9 Comments

villasv Over a year ago

Very cryptic solution IMO, especially with those names, but a good one.

Artur Rashitov Over a year ago

This solution with LATERAL JOIN might be significantly faster than the window function one above (in some cases) if you have an index on the t_inner.name column.

gillesB Over a year ago

The query is easier to understand if it does not contain the self-join. In that case distinct is not needed. An example is shown in the link poshest posted.

Diligent Key Presser Over a year ago

Dude, this is mind-blowing. 120 ms instead of the 9 s that the "ROW_NUMBER" solution took. Thank you!

Max Rosett Over a year ago

Running a variant of this lateral join on a table with 2 million rows put my DB's CPU at 100% for 12 hours. YMMV.


David Skinner · Accepted Answer · 2022-03-24 11:36:19Z

22

A lateral join is the way to go, but you should do a nested query first to improve performance on large tables.

SELECT t_limited.*
FROM (
        SELECT DISTINCT section_id
        FROM t
    ) t_groups
    JOIN LATERAL (
        SELECT *
        FROM t t_all
        WHERE t_all.section_id = t_groups.section_id
        ORDER BY t_all.name
        LIMIT 2
    ) t_limited ON true

Without the nested select distinct, the join lateral runs for every row in the table, even though the section_id is often duplicated. With the nested select distinct, the join lateral runs once and only once for each distinct section_id.

answered Mar 24, 2022 at 11:36

David Skinner


2 Comments

marcopeg Over a year ago

I tried this out, and it is much faster than the other solutions. I'm discarding my previous solution based on window functions in favour of this one.

JanKanis Dec 2, 2024 at 14:49

Note that Postgres is very bad at optimizing SELECT DISTINCT; it will usually result in a full table scan. So only do that if the table is small, otherwise try to get the list of section_ids in another way.
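
One way to get the list of section_ids without SELECT DISTINCT is to emulate a loose index scan with a recursive CTE, a pattern described on the PostgreSQL wiki. The following is a sketch rather than a tested drop-in; it assumes an index on t whose leading column is section_id, and it reuses the names from the answer above.

```sql
WITH RECURSIVE t_groups AS (
    -- Start with the smallest section_id.
    (SELECT section_id FROM t ORDER BY section_id LIMIT 1)
    UNION ALL
    -- Each step jumps to the next larger section_id via the index;
    -- the subquery yields NULL when none is left, which ends the recursion.
    SELECT (SELECT t.section_id
            FROM t
            WHERE t.section_id > g.section_id
            ORDER BY t.section_id
            LIMIT 1)
    FROM t_groups g
    WHERE g.section_id IS NOT NULL
)
SELECT t_limited.*
FROM t_groups
JOIN LATERAL (
    SELECT *
    FROM t t_all
    WHERE t_all.section_id = t_groups.section_id
    ORDER BY t_all.name
    LIMIT 2
) t_limited ON true;
```

The trailing NULL row in t_groups matches nothing in the lateral subquery, so it does not appear in the result. Whether this beats SELECT DISTINCT depends on how many distinct section_ids there are relative to the table size, so compare both with EXPLAIN ANALYZE.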

Kouber Saparev · Accepted Answer · 2016-02-15 11:26:01Z

12

Here's another solution (PostgreSQL <= 8.3).

SELECT
  *
FROM
  xxx a
WHERE (
  SELECT
    COUNT(*)
  FROM
    xxx
  WHERE
    section_id = a.section_id
  AND
    name <= a.name
) <= 2

edited Feb 15, 2016 at 11:26

answered Jul 17, 2009 at 14:41

Kouber Saparev


Quassnoi · Accepted Answer · 2009-07-14 15:35:15Z

2

SELECT  x.*
FROM    (
        SELECT  section_id,
                COALESCE
                (
                (
                SELECT  xi
                FROM    xxx xi
                WHERE   xi.section_id = xo.section_id
                ORDER BY
                        name, id
                OFFSET 1 LIMIT 1
                ),
                (
                SELECT  xi
                FROM    xxx xi
                WHERE   xi.section_id = xo.section_id
                ORDER BY 
                        name DESC, id DESC
                LIMIT 1
                )
                ) AS mlast
        FROM    (
                SELECT  DISTINCT section_id
                FROM    xxx
                ) xo
        ) xoo
JOIN    xxx x
ON      x.section_id = xoo.section_id
        AND (x.name, x.id) <= ((mlast).name, (mlast).id)

edited Jul 14, 2009 at 15:35

answered Jul 14, 2009 at 10:48

Quassnoi


4 Comments

Kouber Saparev Over a year ago

The query is very close to the one I need, except that it does not show sections with fewer than 2 rows, i.e. the row with ID=7 isn't returned. Otherwise I like your approach.

Kouber Saparev Over a year ago

Thank you, I just came to the same solution with COALESCE, but you were faster. :-)

Kouber Saparev Over a year ago

Actually the last JOIN sub-clause could be simplified to: ... AND x.id <= (mlast).id, as the IDs have already been chosen according to the name field, no?

Quassnoi Over a year ago

@Kouber: in your example the names and ids are sorted in the same order, so you won't see it. Make the names in reverse order and you will see that these queries yield different results.

wildplasser · Accepted Answer · 2012-12-07 20:53:01Z

2

-- Ranking without window functions
-- EXPLAIN ANALYZE
WITH rnk AS (
    SELECT x1.id,
           COUNT(x2.id) AS rnk
    FROM xxx x1
    LEFT JOIN xxx x2 ON x1.section_id = x2.section_id AND x2.name <= x1.name
    GROUP BY x1.id
)
SELECT this.*
FROM xxx this
JOIN rnk ON rnk.id = this.id
WHERE rnk.rnk <= 2
ORDER BY this.section_id, rnk.rnk;

-- The same without using a CTE
-- EXPLAIN ANALYZE
SELECT this.*
FROM xxx this
JOIN (
    SELECT x1.id,
           COUNT(x2.id) AS rnk
    FROM xxx x1
    LEFT JOIN xxx x2 ON x1.section_id = x2.section_id AND x2.name <= x1.name
    GROUP BY x1.id
) rnk ON rnk.id = this.id
WHERE rnk.rnk <= 2
ORDER BY this.section_id, rnk.rnk;

answered Dec 7, 2012 at 20:53

wildplasser


6 Comments

user330315 Over a year ago

CTEs and Window functions were introduced with the same version, so I don't see the benefit of the first solution.

wildplasser Over a year ago

The post is three years old. Besides, there may still be implementations that lack them (nudge nudge say no more). It could also be considered an exercise in old-fashioned query building. (though CTEs are not very old-fashioned)

user330315 Over a year ago

The post is tagged "postgresql" and the PostgreSQL version that introduced CTEs also introduced windowing functions. Hence my comment (I did see it's that old, and PG 8.3 had neither).

wildplasser Over a year ago

The post mentions 8.3.5, and I believe they were introduced in 8.4. Besides: it is also good to know about alternative scenarios, IMHO.

user330315 Over a year ago

That's exactly what I mean: 8.3 had neither CTEs nor windowing functions. So the first solution won't work on 8.3.
