12

I have the following structure.

CREATE TABLE join_table (
  id integer NOT NULL,
  col_a integer NOT NULL,
  col_b integer NOT NULL
)

CREATE INDEX index_on_col_a ON join_table USING btree (col_a);
CREATE INDEX index_on_col_b ON join_table USING btree (col_b);
CREATE UNIQUE INDEX index_on_col_a_and_col_b ON join_table USING btree (col_a, col_b);

There are also foreign keys on col_a and col_b.

Clearly index_on_col_a is no longer needed, but is there a cost or benefit to keeping or deleting it?

My guess is;

  • keeping it will slow down inserts
  • selects using just col_a may be faster if I keep it
5
  • Seems like you already know the answer? Commented Mar 21, 2012 at 9:26
  • 1
    hmm... should I avoid guessing in questions? maybe someone has something more firm than a guess. Commented Mar 21, 2012 at 9:29
  • 1
    It depends on the case, Better write performance or query perfor But from my personal opinions, we need drop index index_on_col_a Commented Mar 21, 2012 at 9:38
  • thanks @francs. I usually would. I just wanted to get some verification that I'm right. I guess I'll just remove it. Commented Mar 21, 2012 at 10:05
  • We have discussed this case in great detail at dba.SE recently. Commented Mar 21, 2012 at 11:16

2 Answers 2

11

You can drop the index on col_a. PostgreSQL is able to use the combined index if you query on col_a and is also able to use the index if you query on col_a and col_b. These query types can use the combined index:

WHERE col_a = 'val'
WHERE col_a = 'val' AND col_b = 'val'

The combined index cannot be used to query only col_b or an OR junction of col_a and col_b. So the additional index over col_b can make sense if you frequently have queries querying only col_b.

Edit: So: you don't have an advantage creating index_on_col_a, but you have a slower write speed. Drop it.

Sign up to request clarification or add additional context in comments.

Comments

0

Even though I agree with the other answer on dropping the index on col_a, sometimes index combinations could be so large that, the index on (col_a, col_b) takes more disk pages, compared to col_a index, which could lead to more I/O on disk. Please use EXPLAIN ANALYZE and EXPLAIN FORMAT=JSON to find the actual rows read, and total cost (represented with equivalent of I/O ops).

If there are more col_b per col_a (per 1 col_a, there are >100 col_b or so), then having the col_a will be helpful. if you are doing range queries, this will be more useful in that case. All these make sense if you really care about very low latency during reads.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.