Say I have a table like this:

create table mytable (
   mycol text[]
);

And I want to select all rows where mycol contains "hello". I can think of two ways to do this:

SELECT * FROM mytable WHERE 'hello'=any(mycol);
--or
SELECT * FROM mytable WHERE mycol && '{hello}';

I know the second option can use a GIN index (which supports array operators), and I'm pretty sure the first would use a BTREE (or maybe a HASH?).

So my question is this: If I only need to check the membership of a single item, which method with what index is most efficient for a table with millions of rows?

  • Run the queries with explain and it will tell you. Commented Mar 22, 2022 at 18:01

1 Answer

The second query (mycol && '{hello}'), with a GIN index on mycol.

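The first one can't use either a btree or a hash index efficiently. It can use a btree index, but only as a skinny copy of the table: the whole index is scanned and the condition is checked against every entry, rather than looking up 'hello' directly.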
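
As a minimal sketch of the GIN setup, assuming the default array operator class is acceptable (the index name mytable_mycol_gin is made up for illustration):

```sql
-- Hypothetical index name; gin on a text[] column uses the default array operator class.
CREATE INDEX mytable_mycol_gin ON mytable USING gin (mycol);

-- Overlap (&&): true if the arrays share at least one element.
SELECT * FROM mytable WHERE mycol && '{hello}';

-- Containment (@>): true if mycol contains every element on the right.
-- With a single-element array this selects the same rows as the overlap form.
SELECT * FROM mytable WHERE mycol @> '{hello}';
```

Both operators are indexable by GIN, so for a single-item membership test either form should be able to use the index.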

You might be confusing this with the reverse situation, where the column is before the ANY (and is a scalar) and the literal is inside the ANY. This one can use the btree.

SELECT * FROM mytable_scalar WHERE mycol =any('{hello,goodbye}');
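
For contrast, here is a sketch of that scalar case. The definition of mytable_scalar and the index name are assumptions, since the table is only named in the query above:

```sql
-- Assumed schema: same column name, but a plain text column instead of text[].
CREATE TABLE mytable_scalar (
   mycol text
);

-- Hypothetical index name; btree is the default index type.
CREATE INDEX mytable_scalar_mycol_idx ON mytable_scalar (mycol);

-- The column is on the left and the literal array is inside ANY,
-- so each array element can be looked up in the btree.
SELECT * FROM mytable_scalar WHERE mycol = ANY ('{hello,goodbye}');

-- Equivalent spelling with IN.
SELECT * FROM mytable_scalar WHERE mycol IN ('hello', 'goodbye');
```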

A good way to see how efficient something will be is often just to try it with fake data but of a vaguely realistic size:

insert into mytable select ARRAY[md5(random()::text),md5(random()::text)] from generate_series(1,1500000);
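
Once the fake data is loaded, the two candidate queries can be compared directly. This sketch assumes a GIN index on mycol has already been created. The plans and timings will depend on your data and settings, so no output is shown:

```sql
-- Refresh planner statistics after the bulk insert.
ANALYZE mytable;

-- First form: expect a scan that filters every row (or every index entry).
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM mytable WHERE 'hello' = ANY (mycol);

-- Second form: can use the GIN index if one exists on mycol.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM mytable WHERE mycol && '{hello}';
```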