
We have a table that may contain 1M-5M or more rows. It has an assortment of string, numeric and date columns.

The UI presents a table view, with "infinite scroll" paging, and various filters and sort options on many of the columns.

In order to support consistent result ordering for the paging functionality, the UI always adds a "unique ID" column as the last sort (or the default sort if the user doesn't select a sort).

We are struggling to define the indexes so that the various filtering and sorting combinations perform well in UI terms. Sometimes queries take 90s or more. We need sub-second or worst case 2-3s queries.

An example query might look like:

select _columns_
from my_table
where numeric_col >= 80
order by other_col desc, unique_id_col desc
limit 100 offset 0

This will perform poorly, with EXPLAIN saying something like

   ->  Index Scan Backward using idx_on_other_col on my_table  (cost=0.43..676590.40 rows=575273 width=33)
         Filter: (numeric_col >= 80)

If, however, the sort column is the same as the filter column, the query behaves much better, with the EXPLAIN showing Index Cond in place of the Filter. I am not sure whether that is what explains the difference in performance.
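
As a sketch, the two shapes of index involved look roughly like this; the table, column, and index names follow the example above, but the exact column lists are assumptions, not the actual definitions:

-- An index matching the ORDER BY (sketched here as covering both sort keys): the
-- planner can walk it backward in the requested order, but has to re-check
-- numeric_col on every row it visits, which shows up as "Filter".
CREATE INDEX idx_on_other_col ON my_table (other_col, unique_id_col);

-- An index whose leading column matches the WHERE clause: the range test becomes
-- an "Index Cond", so only qualifying rows are read, but their order no longer
-- matches the ORDER BY, so a Sort node is needed.
CREATE INDEX idx_on_numeric_col ON my_table (numeric_col);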

I also don't know how much the "range" (>=) filter affects performance vs. equality filters.

But if we need to support filters (range and equality) on perhaps 10-15 columns (one or more at a time), and a single sort on either just the unique ID (for consistent result order) or on one of roughly 6 fields plus the unique ID, what can we do in terms of indexes to make these queries faster? We can also consider adding more terms to the range conditions: even though they naturally have only a single bound, in some cases it might not be hard to add the other side of the range, if that would help.
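
To make the requirement concrete, the two query shapes these indexes need to serve look roughly like this (filter_col_1, filter_col_2 and sort_col are placeholders, not the real column names):

-- Default ordering: one or more filters, paging by the unique ID only.
SELECT *
FROM my_table
WHERE filter_col_1 >= 80 AND filter_col_2 = 'x'
ORDER BY unique_id_col DESC
LIMIT 100 OFFSET 0;

-- User-selected sort: same filters, one of ~6 sortable columns plus the unique ID
-- as a tie-breaker.
SELECT *
FROM my_table
WHERE filter_col_1 >= 80 AND filter_col_2 = 'x'
ORDER BY sort_col DESC, unique_id_col DESC
LIMIT 100 OFFSET 0;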

EDIT based on comments

Here is full EXPLAIN output:

clustering.a1ab960a-7596-4519-9a56-52723ac35500=> explain (analyze, buffers) select * from "appsessions" where whole_risk > 80 order by app_session_id desc limit 100;
                                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..192.46 rows=100 width=1557) (actual time=21376.632..21376.633 rows=0 loops=1)
   Buffers: shared hit=1324199 read=422192
   I/O Timings: read=18567.241
   ->  Index Scan Backward using pk_appsessions on appsessions  (cost=0.43..1104707.17 rows=575273 width=1557) (actual time=21376.630..21376.631 rows=0 loops=1)
         Filter: (whole_risk > 80)
         Rows Removed by Filter: 1725820
         Buffers: shared hit=1324199 read=422192
         I/O Timings: read=18567.241
 Planning Time: 0.197 ms
 Execution Time: 21376.668 ms
(10 rows)

clustering.a1ab960a-7596-4519-9a56-52723ac35500=> explain (analyze, buffers) select * from "appsessions" where whole_risk > 80 order by app_session_id desc limit 100;
                                                                          QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..192.46 rows=100 width=1557) (actual time=2238.998..2238.999 rows=0 loops=1)
   Buffers: shared hit=1745176 read=1215
   I/O Timings: read=5.251
   ->  Index Scan Backward using pk_appsessions on appsessions  (cost=0.43..1104707.17 rows=575273 width=1557) (actual time=2238.997..2238.997 rows=0 loops=1)
         Filter: (whole_risk > 80)
         Rows Removed by Filter: 1725820
         Buffers: shared hit=1745176 read=1215
         I/O Timings: read=5.251
 Planning Time: 0.195 ms
 Execution Time: 2239.039 ms
(10 rows)

2 seconds, while not great, is doable, but the 21s can be 50s or more. The problem is that the table will not be in the buffer cache most of the time - we need acceptable performance in the first (cold-cache) case as well.

EDIT again (Nov 30)

Based on helpful follow-ups from @jjanes in the comments, I looked into ANALYZE. pg_stat_user_tables.last_autoanalyze appeared to show that the table had been analyzed after it was last updated. However, I ran ANALYZE myself, and now the EXPLAIN output is quite different:

clustering.a1ab960a-7596-4519-9a56-52723ac35500=> explain (analyze, buffers) select * from "appsessions" where whole_risk > 80 order by app_session_id desc limit 100;
                                                                     QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=61.25..61.32 rows=29 width=1629) (actual time=0.022..0.023 rows=0 loops=1)
   Buffers: shared hit=3
   ->  Sort  (cost=61.25..61.32 rows=29 width=1629) (actual time=0.022..0.022 rows=0 loops=1)
         Sort Key: app_session_id DESC
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=3
         ->  Index Scan using sessions_by_whole_risk on appsessions  (cost=0.43..60.55 rows=29 width=1629) (actual time=0.016..0.016 rows=0 loops=1)
               Index Cond: (whole_risk > 80)
               Buffers: shared hit=3
 Planning Time: 0.181 ms
 Execution Time: 0.053 ms
(11 rows)

So, is the autoanalyze not doing its job? Do we need to run ANALYZE ourselves here?
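
For reference, the check and the manual step above amount to roughly the following (a sketch; n_mod_since_analyze is just an additional column worth inspecting):

-- When was the table last analyzed, and how many rows have changed since then?
SELECT relname, last_analyze, last_autoanalyze, n_mod_since_analyze
FROM pg_stat_user_tables
WHERE relname = 'appsessions';

-- Refresh the planner's statistics manually.
ANALYZE appsessions;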

  • The misestimate of 575273 expected rows vs 0 actual rows is surely not helping. Estimates are rarely spot on, but it shouldn't be hard to do better than that. Has the table been ANALYZEd lately? Commented Nov 27, 2023 at 14:03
  • @jjanes we don't ANALYZE the table - isn't there some auto vacuum or analyze job that takes care of this? This is RDS. It looked to me like the last_autoanalyze was after the table was last updated. However, running it myself does seem to have changed things - I'm updating my question Commented Nov 30, 2023 at 7:25
  • There is an autoanalyze, but it isn't always aggressive enough. For example if most of the updates fall on rows with whole_risk > 80, then that subpopulation of rows might have changed a lot even though the table as a whole hasn't changed enough to trigger autoanalyze. You might need to change some of the table-specific analyze settings. It is hard to know when the last update to a table was (it isn't automatically recorded anywhere unless you created a system to do that), so maybe you were just mistaken about that. Commented Nov 30, 2023 at 16:33
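
As an illustration of the table-specific analyze settings mentioned in the comment above, lowering the autoanalyze trigger for this one table might look like this (the values are arbitrary examples, not recommendations):

-- By default autoanalyze fires after roughly 10% of rows change; make it more
-- aggressive for this table only.
ALTER TABLE appsessions SET (
    autovacuum_analyze_scale_factor = 0.01,
    autovacuum_analyze_threshold    = 500
);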

2 Answers


5M rows is not very many. With adequate hardware you should get sub-second performance even with no indexes, just seq-scanning the table and sorting the rows; at least I see that when all the table data can stay in memory. This is fortunate, as the flexibility you describe would need an impractical number of indexes to cover all your bases (even if that were possible, which it probably isn't, due to the combination of range filters and ORDER BY).

I also don't know how much the "range" (>=) filter affects performance vs. equality filters.

Well, obviously >= is less selective than = on the same test value (except in the special case where the test value is the greatest value present in the table, or greater), and dealing with more rows takes more time. Perhaps less obviously, for cola = 5 ORDER BY colb LIMIT 100 a btree index on (cola, colb) can deliver the selected rows already in colb order and stop early once the LIMIT is met, while for cola >= 5 ORDER BY colb LIMIT 100 it cannot, because the possibility of more than one qualifying value for cola breaks the ordering on colb. So a Sort node needs to be injected into the plan, and the scan doesn't get to stop early. Even if 5 happens to be the maximum value of cola, PostgreSQL has no way of knowing that for certain until it runs the query, so the planner still injects the Sort node.
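
A minimal sketch of that difference, on a hypothetical table t with columns cola and colb:

CREATE INDEX t_cola_colb ON t (cola, colb);

-- Equality on the leading column: rows come out of the index already ordered by
-- colb, so the scan can stop as soon as 100 rows have been produced (no Sort).
EXPLAIN SELECT * FROM t WHERE cola = 5 ORDER BY colb LIMIT 100;

-- Range on the leading column: the index order is (cola, colb), not colb, so the
-- planner adds a Sort node and the scan cannot stop early.
EXPLAIN SELECT * FROM t WHERE cola >= 5 ORDER BY colb LIMIT 100;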

  • The table data cannot stay in memory. Indeed, with our problematic queries, we see that the first page load can be 90s or more, then subsequent page loads are quite fast. But the nature of our database usage is such that we cannot expect any given table to stay in memory - there are too many other tables that are frequently used, so the cache (buffers) cannot be relied on here. Commented Nov 27, 2023 at 8:27

CREATE UNLOGGED TABLE foo (a NUMERIC(9,2), b NUMERIC(8,2), c NUMERIC(8,2), d NUMERIC(8,2), e NUMERIC(8,2), f NUMERIC(8,2) );
INSERT INTO foo SELECT a,random()*99999,random()*99999,random()*99999,random()*99999,random()*99999 from generate_series(1,5000000) a;
VACUUM ANALYZE foo;
EXPLAIN ANALYZE SELECT * FROM foo WHERE b<=90 ORDER BY c LIMIT 100;
--> parallel seq scan, filter, heapsort: 250ms

EXPLAIN ANALYZE SELECT * FROM foo ORDER BY c LIMIT 100;
--> parallel seq scan, filter, heapsort: 300ms

EXPLAIN ANALYZE SELECT * FROM foo ORDER BY c LIMIT 10000;
--> parallel seq scan, filter, heapsort: 533ms

EXPLAIN ANALYZE SELECT * FROM foo ORDER BY c;
--> seq scan, disk merge sort: 4300ms

EXPLAIN ANALYZE SELECT * FROM foo WHERE b<=10000 AND c<=10000 ORDER BY d LIMIT 100;
--> parallel seq scan, filter, heapsort: 200ms

CREATE INDEX foo_b ON foo(b);
CREATE INDEX foo_c ON foo(c);

EXPLAIN ANALYZE SELECT * FROM foo WHERE b<=10000 AND c<=10000 ORDER BY d LIMIT 100;
--> parallel bitmap index scan, filter, heapsort: 140ms

EXPLAIN ANALYZE SELECT * FROM foo WHERE b<=1000 AND c<=1000 ORDER BY d LIMIT 100;
--> parallel bitmap index scan, filter, heapsort: 14ms

If your search UI allows filtering on any column, then indices will only help if search criteria are selective enough. As shown in the example above, if the bitmap index scan finds 10% matching rows and they're randomly distributed in the table, it's going to have to read most of it anyway. However when the search criteria only matches 1% of the rows then the index really speeds things up.

First problem: as shown in the example above, a table with 5M rows shouldn't take more than a second to scan, so if it takes more than a minute, something's wrong and you should investigate this. Maybe the server is swapping, or misconfigured, or someone is mining bitcoin with it, who knows. Even with spinning disks it should be fast, because the table should fit in cache.

Second problem: The UI does pagination like it's 1999.

Quite often, if a query involves sorting, that will be the slowest step: the database has to scan and filter the whole result set, sort it, then throw away most of it to return just the 100 rows the LIMIT allows. Postgres' top-N heapsort mitigates this by making it much cheaper to discard rows that would be cut off by the LIMIT, which is very nice, but the rows are still scanned and thrown away. Then it has to do it all over again for the next page.
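
Concretely, the one-query-per-page pattern on the test table above looks something like this, and each page repeats the scan/filter/sort work:

-- Page 1 and page 2: same scan, filter and sort every time, different 100-row slice.
SELECT * FROM foo WHERE b <= 1000 ORDER BY d LIMIT 100 OFFSET 0;
SELECT * FROM foo WHERE b <= 1000 ORDER BY d LIMIT 100 OFFSET 100;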

The solution is to fetch more results: increase the LIMIT, then either display the results or put them in a cache, but don't issue another query for every page. Doing one query per page only works well when the sorting can be skipped because there's an index for it - typically a forum where posts are always in the same order, so a "post index in topic" column can be added to the table to find the posts directly.

This also solves your "sort by unique id" problem, because if you set the limit high enough, no-one's ever going to scroll that far, so there is no second page, and no consistent order to maintain.

EXPLAIN ANALYZE SELECT * FROM foo WHERE b<=1000 AND c<=1000 ORDER BY a LIMIT 100;
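
Building on the example above, a sketch of the fetch-more-and-cache approach (5000 is an arbitrary batch size, not a tuned value):

-- One larger batch instead of many LIMIT 100 OFFSET n round-trips; the application
-- caches it and pages through it in memory, so there is no second query and no need
-- for the unique-ID tie-breaker to keep pages consistent.
SELECT * FROM foo WHERE b <= 1000 AND c <= 1000 ORDER BY d LIMIT 5000;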
  • As I mentioned in the comment on @jjanes's answer, the table will not fit (or at least will not remain) in cache, since many other similar tables are in use and will fill the cache themselves. Commented Nov 27, 2023 at 8:28
  • Regarding paging - I don't understand. I need the sorting to be dynamic - not always in the same order. I don't understand how this helps me. Commented Nov 27, 2023 at 8:29
  • Regarding paging: instead of fetching one page, increase the LIMIT to fetch several, preferably a lot, then cache the result. So you need to do the query less often. Also since LIMIT will be much higher, postgres will be less likely to use the index scan backwards that gives so much trouble with random IO, and it will favor seqscan. Is the table stored on a SSD? Commented Nov 27, 2023 at 10:31
  • If the table is small you can also put it in a separate tablespace and tell linux to keep this tablecache in RAM disk cache with a tool like vmprobe or vmtouch serverfault.com/questions/43383/… Commented Nov 27, 2023 at 10:41
