Is there any harm to having a duplicate index in PostgreSQL?

Question

I have the following structure.

CREATE TABLE join_table (
  id integer NOT NULL,
  col_a integer NOT NULL,
  col_b integer NOT NULL
);

CREATE INDEX index_on_col_a ON join_table USING btree (col_a);
CREATE INDEX index_on_col_b ON join_table USING btree (col_b);
CREATE UNIQUE INDEX index_on_col_a_and_col_b ON join_table USING btree (col_a, col_b);

There are also foreign keys on col_a and col_b.

Clearly index_on_col_a is no longer needed, but is there a cost or benefit to keeping or deleting it?

My guess is:

keeping it will slow down inserts
selects using just col_a may be faster if I keep it

hmm... should I avoid guessing in questions? Maybe someone has something more firm than a guess.
– Matthew Rudy, Commented Mar 21, 2012 at 9:29
It depends on the case: better write performance or better query performance. But in my personal opinion, we should drop index_on_col_a.
– francs, Commented Mar 21, 2012 at 9:38
Thanks @francs. I usually would. I just wanted to get some verification that I'm right. I guess I'll just remove it.
– Matthew Rudy, Commented Mar 21, 2012 at 10:05
We have discussed this case in great detail at dba.SE recently.
– Erwin Brandstetter, Commented Mar 21, 2012 at 11:16

Andriy M · Accepted Answer · 2012-03-21 12:59:37Z

11

You can drop the index on col_a. PostgreSQL is able to use the combined index if you query on col_a and is also able to use the index if you query on col_a and col_b. These query types can use the combined index:

WHERE col_a = 'val'
WHERE col_a = 'val' AND col_b = 'val'

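The combined index cannot efficiently serve queries that filter only on col_b, or an OR combination of col_a and col_b, because col_b is not its leading column (PostgreSQL would have to scan the whole index). So the additional index on col_b makes sense if you frequently query on col_b alone.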
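As a rough way to check this on your own data, you can ask the planner which index it picks for each query shape. This is only a sketch: the literal values 42 and 7 are made up for illustration, and on a small table the planner may choose a sequential scan no matter which indexes exist.

```sql
-- Filter on the leading column only: index_on_col_a_and_col_b is a candidate.
EXPLAIN SELECT * FROM join_table WHERE col_a = 42;

-- Filter on both columns: index_on_col_a_and_col_b is a candidate.
EXPLAIN SELECT * FROM join_table WHERE col_a = 42 AND col_b = 7;

-- Filter on col_b only: index_on_col_b is the natural candidate here.
EXPLAIN SELECT * FROM join_table WHERE col_b = 7;

-- OR across both columns: the planner may combine two index scans.
EXPLAIN SELECT * FROM join_table WHERE col_a = 42 OR col_b = 7;
```

The index name shown in each plan tells you which index the planner actually chose for that query.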

Edit: In short, index_on_col_a gives you no advantage but slows down writes. Drop it.
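If you want some evidence before dropping it, one possible approach is to look at the index usage counters first. The sketch below assumes the statistics collector has been running long enough for the counters to be meaningful.

```sql
-- How often each index on join_table has been scanned since the
-- statistics were last reset.
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'join_table';

-- Remove the redundant single-column index.
-- Note: a plain DROP INDEX takes an exclusive lock on the table while it runs.
DROP INDEX index_on_col_a;
```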

edited Mar 21, 2012 at 12:59

Andriy M

answered Mar 21, 2012 at 9:59

ckruse


yugandhar · Answer · 2023-01-20 11:33:47Z

0

Even though I agree with the other answer on dropping the index on col_a, the index on (col_a, col_b) can sometimes be large enough that it takes more disk pages than the col_a index, which could lead to more disk I/O. Please use EXPLAIN ANALYZE and EXPLAIN (FORMAT JSON) to find the actual rows read and the total cost (expressed in arbitrary planner units, roughly relative to the cost of a sequential page read).

If there are many col_b values per col_a value (say, more than 100 col_b values for one col_a value), then keeping the col_a index will be helpful. This matters even more if you are doing range queries. All of this only makes sense if you really care about very low read latency.
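A minimal sketch of how that measurement might look in PostgreSQL: compare the on-disk size of the indexes, then run the same query with and without index_on_col_a and compare the reported buffers and timings. The values 10, 20 and 42 are placeholders, not taken from the question.

```sql
-- On-disk size of every index on join_table.
SELECT indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'join_table';

-- Actual row counts, timings and page (buffer) accesses for a range query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM join_table WHERE col_a BETWEEN 10 AND 20;

-- The same kind of plan as machine-readable JSON (estimates only, not executed).
EXPLAIN (FORMAT JSON)
SELECT * FROM join_table WHERE col_a = 42;
```

Keep in mind that EXPLAIN with ANALYZE really executes the statement, so run it against data and at a time where that is acceptable.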

answered Jan 20, 2023 at 11:33

yugandhar
