Most efficient way to index an array column to allow for selecting rows containing a value

Question

Say I have a table like this:

create table mytable (
   mycol text[]
)

And I want to select all rows where mycol contains "hello". I can think of two ways to do this:

SELECT * FROM mytable WHERE 'hello'=any(mycol);
--or
SELECT * FROM mytable WHERE mycol && '{hello}';

I know for the second option I can use GIN indexes (which allow for array options), and I'm pretty sure for the first I would use a BTREE (or maybe a HASH?).

So my question is this: If I only need to check the membership of a single item, which method with what index is most efficient for a table with millions of rows?

Run the queries with explain and it will tell you.

Schwern
– Schwern

2022-03-22 18:01:35 +00:00
Commented Mar 22, 2022 at 18:01 — Schwern
– Schwern, Commented Mar 22, 2022 at 18:01

jjanes · Accepted Answer · 2022-03-22 20:34:51Z

1

The 2nd, with a GIN index.

The first one can't use either a btree or a hash index in an efficient way. It can use a btree index, but only as a skinny copy of the table.

You might be confusing this with the reverse situation, where the column is before the ANY (and is a scalar) and the literal is inside the ANY. This one can use the btree.

SELECT * FROM mytable_scalar WHERE mycol =any('{hello,goodbye}');

A good way to see how efficient something will be is often just to try it with fake data but of a vaguely realistic size:

insert into mytable select ARRAY[md5(random()::text),md5(random()::text)] from generate_series(1,1500000);

answered Mar 22, 2022 at 20:34

jjanes

44.9k5 gold badges39 silver badges48 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Most efficient way to index an array column to allow for selecting rows containing a value

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related