2

Given a string column with a value similar to /123/12/34/56/5/, what is the optimal way of querying for all the records that include the given number (12 for example)?

The solution off the top of my head is:

SELECT id FROM things WHERE things.path LIKE '%/12/%'

But AFAIK this query can't use a regular (B-tree) index on the column because of the leading %.

There must be something better. What is it?

I'm using PostgreSQL, but would prefer a solution that works across other databases too.

2 Comments
  • It would be better not to have a multivalued field :) Commented Apr 16, 2012 at 3:49
  • I could go with an additional table 'thing_paths' with the path in it, then join and query it, roughly: select DISTINCT things.id from things inner join thing_paths on thing_paths.thing_id = things.id WHERE thing_paths.path LIKE '/12/%'. But at this stage that is more than I ideally want to do. Commented Apr 16, 2012 at 3:56

2 Answers

4

If you're happy turning that column into an array of integers, like:

'/123/12/34/56/5/' becomes ARRAY[123,12,34,56,5]

If path_arr is then a column of type INTEGER[], you can create a GIN index on that column:

CREATE INDEX ON things USING gin(path_arr);

A query for all items containing 12 then becomes:

SELECT * FROM things WHERE ARRAY[12] <@ path_arr;

This query will use the index. In my test (with a million rows), I get plans like:

EXPLAIN SELECT * FROM things WHERE ARRAY[12]  <@ path_arr;
                                      QUERY PLAN
----------------------------------------------------------------------------------------
 Bitmap Heap Scan on things  (cost=5915.75..9216.99 rows=1000 width=92)
   Recheck Cond: ('{12}'::integer[] <@ path_arr)
   ->  Bitmap Index Scan on things_path_arr_idx  (cost=0.00..5915.50 rows=1000 width=0)
         Index Cond: ('{12}'::integer[] <@ path_arr)
(4 rows)
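
For an existing table, path_arr could be populated from the string column along these lines. This is only a sketch: it assumes every path value has exactly the '/123/12/34/56/5/' shape (integers separated by slashes, with a leading and trailing slash), and it should run before the index is created so the index is built in one pass.

```sql
-- Assumption: things.path always looks like '/123/12/34/56/5/'.
ALTER TABLE things ADD COLUMN path_arr integer[];

-- Strip the leading and trailing '/', split on '/',
-- then cast the resulting text[] to integer[].
UPDATE things
SET path_arr = string_to_array(trim(both '/' from path), '/')::integer[];
```

Any row whose path contains a non-numeric element will make the cast fail, so it is worth checking the data first.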

4 Comments

Thanks. That's another option. But AFAIK it is technically the same as LIKE + GIN index as Erwin suggested, just expressed differently.
It is another GIN index. But it's one that more closely corresponds to the actual data. In principle the index will be more efficient because it won't be recording items containing "27/" and "1/2", etc.
Edmund, I accepted Erwin's answer simply because it was simpler (didn't have to change any code yet). But I wish I could accept 2 answers since your solution is more elegant.
There is often a tradeoff between pragmatism and elegance. And the syntax for indexes over lists doesn't help, so I might have picked Erwin's answer too (tbh). Thanks for your comment.
3

In PostgreSQL 9.1 or later you can use the pg_trgm module and build a GIN index with it.

CREATE EXTENSION pg_trgm; -- once per database

CREATE INDEX things_path_trgm_gin_idx ON things USING gin (path gin_trgm_ops);

Your LIKE expression can use this index even if it is not left-anchored.

See a detailed demo by depesz here.

Normalize it if you can, though.
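
A normalized layout might look roughly like the following. The table and column names (thing_path_nodes, seq, node) are made up for illustration, and the sketch assumes things.id is the primary key. Since it uses only a plain table and an ordinary index, it should also port to other databases with little change.

```sql
-- Hypothetical child table: one row per element of a thing's path.
CREATE TABLE thing_path_nodes (
    thing_id integer NOT NULL REFERENCES things (id),
    seq      integer NOT NULL,  -- position of the element within the path
    node     integer NOT NULL,  -- the number itself, e.g. 12
    PRIMARY KEY (thing_id, seq)
);

-- An ordinary B-tree index is enough for equality lookups on node.
CREATE INDEX thing_path_nodes_node_idx ON thing_path_nodes (node);

-- All things whose path includes 12; DISTINCT guards against a
-- number appearing more than once in the same path.
SELECT DISTINCT thing_id
FROM thing_path_nodes
WHERE node = 12;
```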

3 Comments

Thanks. That's good to know. The write performance is dramatically slower though.
RE normalisation. This column is used in pretty much one or two places only, not sure whether the normalisation is worth it yet. Have to think more about it.
RE normalization: The normalized form will probably be dramatically faster.
