Create several partial indexes, scanning only once (PostgreSQL)?

Question

I've got a table with a bunch of statistics from counties in the US.

Because it's so large, I want to index it with a comprehensive set of partial indexes.

CREATE INDEX county_stats_34_idx on stats_county (stateid, countyid, site, yeargroup, foo, bar)
WHERE stateid = 1;
CREATE INDEX county_stats_25_idx on stats_county (stateid, countyid, site, yeargroup, foo, bar)
WHERE stateid = 2;
...
CREATE INDEX county_stats_32_idx on stats_county (stateid, countyid, site, yeargroup, foo, bar)
WHERE stateid = 53;

This is going to scan each row of the table 53 times, checking stateid and adding to the index where appropriate. I wonder--is there a more efficient way to create these indices? Logically, it only needs to scan the table once, with a 53-item switch...

Just curious, since it seems like I'll be needing to do this sort of thing with some frequency...

Thanks!

In this case it would be several gigabytes, too large to load into memory on this machine, so query performance will be poor.
– Stew, Commented Jun 8, 2011 at 14:20
As far as I'm aware, multiple partial indexes won't make much of a difference. The entire index need not be in memory to be used; only parts of it. Partial or not, the parts that are frequently used (if any) will stay in memory.
– Denis de Bernardy, Commented Jun 8, 2011 at 14:22
I'm quite new to this, but when my indexes are too large, the planner won't use them. When I make them small enough, it usually does. I'm not sure why this is, but I was under the impression that the whole index needed to be loaded into memory in order to be used. I'd love to know whether this is true or not, and where I can look to find out. Still working my way through the docs...
– Stew, Commented Jul 7, 2011 at 14:17

aib · Accepted Answer · 2011-06-07 15:22:54Z

1

If you add an index on stateid, PG will not have to scan the entire table. Of course, building that one will have to scan the entire table, and the creation of your actual indices will need to scan that index.

Also, word on the street is that you could just start them concurrently, from within different sessions. It makes sense because optimally you'd just get one disk hit per row, and cache hits from then on. Though in your case no two indices created actually need to read the same row - they each cover a non-intersecting subset.
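A minimal sketch of the separate-sessions idea, reusing the first two statements from the question. It assumes each statement is issued from its own connection (for example, two psql windows); the "Session A/B" labels are illustrative only. The PostgreSQL documentation notes that regular (non-CONCURRENTLY) index builds on the same table may run simultaneously, which is what this relies on.

```sql
-- Session A (its own connection): build the partial index for stateid = 1.
CREATE INDEX county_stats_34_idx
    ON stats_county (stateid, countyid, site, yeargroup, foo, bar)
    WHERE stateid = 1;

-- Session B (a second connection, started while A is still running):
-- build the partial index for stateid = 2.
CREATE INDEX county_stats_25_idx
    ON stats_county (stateid, countyid, site, yeargroup, foo, bar)
    WHERE stateid = 2;
```

Each build still reads the table itself, but builds that run at the same time can share pages already in cache. Writes to stats_county are blocked while any of the builds is in progress.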

I think you should try creating a simple index on stateid.
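For reference, the suggested index could look like the sketch below. The index name is hypothetical; the table and column names come from the question. Whether the later partial-index builds actually benefit from it is questioned in the comments on this answer, so treat it as something to measure rather than a guaranteed speed-up.

```sql
-- Plain (non-partial) index on the column used in every WHERE clause.
-- "stats_county_stateid_idx" is an illustrative name, not from the question.
CREATE INDEX stats_county_stateid_idx ON stats_county (stateid);

-- Refresh planner statistics so queries filtering on stateid
-- are planned with up-to-date information.
ANALYZE stats_county;
```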


2 Comments

Stew Over a year ago

ah, that makes sense--index on stateid and then start each index build from within its own session. I'll give that a go. thanks!

Alex R Over a year ago

the statement "the creation of your actual indices will need to scan that index" might be incorrect... see stackoverflow.com/questions/53531513/…
