
I have a very large table with 100M+ rows. I am trying to find if there is a faster way to execute the following.

Query:

SELECT *
FROM "public".example
WHERE a = 'foo' and b = 'bar'
order by c /* could be any of fields c to z */
limit 100;

Here is the table and the indexes I have set up now.

Table:

  • id
  • a (string)
  • b (string)
  • c ... z (all integers)

Indexes:

"example_multi_idx" btree (a, b)
"c_idx" btree (c)

Thoughts:

  • If I were only sorting by c, then an index "example_multi_idx_with_c" btree (a, b, c) performs wonderfully (see the sketch after this list). However, if I throw in a variety of ORDER BY columns, I would need to create n multi-key indexes, which seems wasteful.
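
Roughly what I mean by the single-sort-column case, as a sketch (using the index named above):

CREATE INDEX example_multi_idx_with_c ON "public".example (a, b, c);

-- With this index the planner can satisfy both the filter and the ORDER BY
-- from the index and stop after the first 100 rows:
SELECT *
FROM "public".example
WHERE a = 'foo' and b = 'bar'
order by c
limit 100;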

3 Answers


For this query:

SELECT *
FROM "public".example
WHERE a = 'foo' and b = 'bar'
order by c /* could be any of fields c to z */
limit 100;

The optimal index is example(a, b, c). Postgres should be able to use the index for sorting.

If you want to have multiple possible columns for the order by, you need a separate index for each one.
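
If you do go down that route, it would look roughly like the following, with one composite index per possible sort column (a sketch; the index names are only illustrative):

CREATE INDEX example_a_b_c_idx ON "public".example (a, b, c);
CREATE INDEX example_a_b_d_idx ON "public".example (a, b, d);
-- ... and so on, one index per ORDER BY column up to (a, b, z)

Each of these indexes repeats a and b, so total index size grows with the number of sort columns; that is the trade-off being weighed here.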


2 Comments

I agree; however, what about when there are N possible ORDER BY options, as I noted in my last bullet point? It seems wasteful to create a multi-key index for every possibility. But that may just be the way of it, unless there is a subquery or other fancy footwork that can be done?
@nakkor . . . If you want the index to be used for each condition, that is about the only choice you have. Sorting a relatively small amount of data should not take too long. So if the other conditions are highly selective, you can let Postgres do the sorting.

How large are the groups once you've filtered by a and b? While including c in the index will certainly help improve performance, if your groups are not particularly large then the sorting at the end of the operation shouldn't have a big cost.

Are you having performance issues with your current indexing?
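
One way to answer both questions is to measure directly, for example (a sketch, reusing the literals from the question):

-- How large are the biggest (a, b) groups?
SELECT a, b, count(*) AS rows_in_group
FROM "public".example
GROUP BY a, b
ORDER BY count(*) DESC
LIMIT 10;

-- And what the sort actually costs with the existing (a, b) index:
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM "public".example
WHERE a = 'foo' and b = 'bar'
ORDER BY c
LIMIT 100;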

1 Comment

The resulting sets are fewer than 10k rows. The sort cost from EXPLAIN ANALYZE seems to be around 200 ms, while with a full (a, b, c) index it is about 1 ms. I am hoping to find a good solution in between that doesn't cause huge index bloat from every multi-column index repeating a and b over and over.

Having an index directly on the ORDER BY column will work in most cases, because Postgres can then walk that column's index in order, check each row against the filters you provide, and stop after the first 100 matches.
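
As a sketch of that approach, it relies only on single-column indexes like the c_idx from the question (plus one per additional sort column, if needed):

-- The question already has: "c_idx" btree (c)
-- CREATE INDEX d_idx ON "public".example (d);  -- and so on per sort column

-- Postgres can walk c_idx in sorted order, check a and b on each heap row,
-- and stop once 100 matching rows are found. This tends to be cheap only
-- when rows matching the filter are common; if they are rare, the scan may
-- visit a large fraction of the table before it finds 100 matches.
SELECT *
FROM "public".example
WHERE a = 'foo' and b = 'bar'
ORDER BY c
LIMIT 100;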

