
I've got a table with a bunch of statistics from counties in the US.

Because it's so large, I want to index it with a comprehensive set of partial indexes.

CREATE INDEX county_stats_34_idx on stats_county (stateid, countyid, site, yeargroup, foo, bar)
WHERE stateid = 1;
CREATE INDEX county_stats_25_idx on stats_county (stateid, countyid, site, yeargroup, foo, bar)
WHERE stateid = 2;
...
CREATE INDEX county_stats_32_idx on stats_county (stateid, countyid, site, yeargroup, foo, bar)
WHERE stateid = 53;

Creating these is going to scan every row of the table 53 times, checking stateid each time and adding the row to the index where appropriate. Is there a more efficient way to create these indexes? Logically, it only needs to scan the table once, with a 53-way switch on stateid.

Just curious, since it seems like I'll be needing to do this sort of thing with some frequency...

Thanks!

  • What's wrong with a single index? Commented Jun 7, 2011 at 15:01
  • In this case it would be several gigabytes, too large to load into memory on this machine, and so query performance will be poor. Commented Jun 8, 2011 at 14:20
  • As far as I'm aware, multiple partial indexes won't make much of a difference. The entire index need not be in memory to be used; only parts of it. Partial or not, the parts that are frequently used (if any) will stay in memory. Commented Jun 8, 2011 at 14:22
  • I'm quite new to this, but when my indexes are too large, the planner won't use them. When I make them small enough, it usually does. I'm not sure why this is, but I was under the impression that the whole index needed to be loaded into memory in order to be used. I'd love to know whether this is true or not--and where I can look to find out? Still working my way through the doc... Commented Jul 7, 2011 at 14:17

1 Answer

If you add an index on stateid, PostgreSQL will not have to scan the entire table. Of course, building that index will itself have to scan the entire table, and the creation of your actual indices will need to scan that index.

Also, word on the street is that you could just start them concurrently, from within different sessions. It makes sense because optimally you'd just get one disk hit per row, and cache hits from then on. Though in your case no two indices created actually need to read the same row - they each cover a non-intersecting subset.

I think you should try creating a simple index on stateid.
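As a rough illustration of starting the builds from separate sessions, here is a minimal Python sketch that only generates the statements and deals them out into one batch per session. It does not connect to a database. It assumes stateid runs contiguously from 1 to 53, and the index naming scheme, the helper names and the session count of 4 are all hypothetical choices, not something from the question.

```python
# Columns taken from the CREATE INDEX statements in the question.
COLUMNS = ("stateid", "countyid", "site", "yeargroup", "foo", "bar")


def partial_index_sql(stateid, table="stats_county"):
    """Return the CREATE INDEX statement for one state's partial index."""
    cols = ", ".join(COLUMNS)
    # Hypothetical naming scheme: the index is named after the stateid it covers.
    return (
        f"CREATE INDEX county_stats_{stateid}_idx ON {table} ({cols}) "
        f"WHERE stateid = {stateid};"
    )


def split_across_sessions(stateids, sessions):
    """Deal the statements round-robin into one batch per session."""
    batches = [[] for _ in range(sessions)]
    for position, stateid in enumerate(stateids):
        batches[position % sessions].append(partial_index_sql(stateid))
    return batches


if __name__ == "__main__":
    # Assumption: stateid values are 1..53; 4 sessions is an arbitrary choice.
    for number, batch in enumerate(split_across_sessions(range(1, 54), 4), start=1):
        print(f"-- session {number}")
        print("\n".join(batch))
```

Each printed batch could be saved to its own file and run from its own connection. Whether this actually beats running the 53 statements one after another depends on disk and memory, so it is worth timing on a copy of the table first.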


2 Comments

ah, that makes sense--index on stateid and then start each index build from within its own session. I'll give that a go. thanks!
the statement "the creation of your actual indices will need to scan that index" might be incorrect... see stackoverflow.com/questions/53531513/…
