Query every nth row in PostgreSQL

Question

I have a simple table in PostgreSQL, say:

id	fname
abc	bert
def	jaap
ghi	kees
jkl	jan
etc	piet

...etc...

The primary key, id, is a string.

My table has millions of rows.

I want to get a list of every 10,000th (give or take) row.

Basically:

SELECT id 
FROM (
  SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rownum 
  FROM mytable
) as t 
WHERE ((t.rownum - 1) % 10000) = 0;

But that seems to be very slow. Is there an efficient alternative?

TABLESAMPLE comes to mind. Whether it's useful to you depends on how exact the every-10,000th spacing has to be.
– sticky bit, Commented Feb 7, 2021 at 18:59
Please use numbers. How many millions? How slow is it? How fast do you need it to be? What is the output of EXPLAIN (ANALYZE, BUFFERS) <query>, preferably after turning track_io_timing on?
– jjanes, Commented Feb 7, 2021 at 19:48
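
For reference, a sketch of how the diagnostics requested in the comment above could be collected, using the query from the question. Two things to keep in mind: EXPLAIN with ANALYZE actually executes the query, so it takes as long as the query itself, and changing track_io_timing normally requires superuser rights (that the session has them is an assumption here).

```sql
-- Record time spent on block reads/writes so it appears in the plan output.
-- Assumption: the current role is allowed to change this setting.
SET track_io_timing = on;

-- ANALYZE runs the query for real and reports actual timings;
-- BUFFERS adds shared/local buffer hit and read counts per plan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM (
  SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rownum
  FROM mytable
) AS t
WHERE ((t.rownum - 1) % 10000) = 0;
```

The plan shows whether the ordering comes from an index scan on the primary key or from an explicit sort, which is usually the first thing to check for a query like this.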
@Maarten: Yes, you need to know how many rows, or what percentage of all rows, corresponds to picking roughly every 10,000th row. And of course the gaps can vary greatly. In an extreme case it's even possible to pick two consecutive rows.
– sticky bit, Commented Feb 7, 2021 at 20:14
@Maarten: I'm sorry, but I don't know whether it's going to be faster. I think there's a good chance it will be, but you have to test that yourself to be sure.
– sticky bit, Commented Feb 7, 2021 at 20:34
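
A minimal sketch of the TABLESAMPLE idea from the comments above, assuming the table is called mytable as in the question. One row in 10,000 corresponds to a sampling percentage of 0.01. This returns a random sample, not exactly every 10,000th row, so the gaps between picked ids vary as described above. The seed 42 is an arbitrary illustrative value.

```sql
-- BERNOULLI reads the whole table and keeps each row independently
-- with probability 0.01 percent (about 1 row in 10,000).
SELECT id
FROM mytable TABLESAMPLE BERNOULLI (0.01)
ORDER BY id;

-- SYSTEM samples whole pages instead of single rows: usually faster,
-- but the selected rows come in clumps from the chosen pages.
-- REPEATABLE (seed) returns the same sample on repeated runs
-- as long as the table has not changed in between.
SELECT id
FROM mytable TABLESAMPLE SYSTEM (0.01) REPEATABLE (42)
ORDER BY id;
```

Whether either variant beats the ROW_NUMBER() query on your data is something only a test on the real table can show.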

Kazi Mohammad Ali Nur Romel · Accepted Answer · 2021-02-07 20:27:39Z

3

I am afraid your query might already be the best possible solution. I ran the query below in SQL Server on a table with almost 65 million rows and got the result in 18 seconds. Since id is the primary key column, it is already indexed, which speeds up the ordering. If you run the regular maintenance jobs, this might be the best you can ask for.

SELECT id 
FROM (
  SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rownum 
  FROM mytable
) as t 
WHERE ((t.rownum - 1) % 10000) = 0;

Please let me know the exact row count and your execution time, and run the query again after reindexing.
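
As a rough sketch of what "reindexing" and "maintenance" could look like in PostgreSQL (the answer above was tested on SQL Server, so mapping it to these commands is an assumption), using the table name from the question:

```sql
-- Exact row count, as asked for above (this itself scans the table or an index).
SELECT count(*) FROM mytable;

-- Rebuild all indexes on the table, including the primary key index.
-- Note: a plain REINDEX blocks writes to the table while it runs.
REINDEX TABLE mytable;

-- Reclaim dead rows and refresh planner statistics. VACUUM also updates
-- the visibility map, which index-only scans rely on.
VACUUM (ANALYZE) mytable;
```

After that, rerunning the query under EXPLAIN (ANALYZE, BUFFERS) gives comparable before/after numbers.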

Sergey · Accepted Answer · 2021-02-07 18:59:50Z

2

You could try the NTILE() function:

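-- Sample data standing in for mytable
WITH cte (id, fname) AS (
    SELECT 'ABC', 'BERT'
    UNION ALL
    SELECT 'DEF', 'JAAP'
    UNION ALL
    SELECT 'GHI', 'KEES'
    UNION ALL
    SELECT 'JKL', 'JAN'
    UNION ALL
    SELECT 'ETC', 'PIET'
)
-- NTILE(3) splits the rows, ordered by id, into 3 buckets of near-equal size
SELECT c.id, c.fname,
       NTILE(3) OVER (ORDER BY c.id ASC) AS xcol
FROM cte AS c;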
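
The query above only labels each row with its bucket. A possible follow-up, sketched here on the same five sample rows, is to keep one id per bucket, for example the smallest. The names bucketed, bucket and first_id are made up for this illustration. On the real table the bucket count would be roughly the row count divided by 10,000, which you would have to work out beforehand.

```sql
WITH cte (id, fname) AS (
    VALUES ('ABC', 'BERT'), ('DEF', 'JAAP'), ('GHI', 'KEES'),
           ('JKL', 'JAN'), ('ETC', 'PIET')
),
bucketed AS (
    -- Ordered by id the rows are ABC, DEF, ETC, GHI, JKL;
    -- NTILE(3) puts them into buckets of sizes 2, 2 and 1.
    SELECT id, NTILE(3) OVER (ORDER BY id) AS bucket
    FROM cte
)
-- Keep the smallest id of each bucket as its representative.
SELECT bucket, MIN(id) AS first_id
FROM bucketed
GROUP BY bucket
ORDER BY bucket;
-- Returns (1, 'ABC'), (2, 'ETC'), (3, 'JKL')
```

Like the ROW_NUMBER() version, this still has to read every row in id order, so it is not necessarily faster. It is mainly an alternative way of expressing "one row per group of n".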
