Effectively query on column that includes a substring

Question

Given a string column with a value similar to /123/12/34/56/5/, what is the optimal way of querying for all the records that include the given number (12 for example)?

The solution off the top of my head is:

SELECT id FROM things WHERE things.path LIKE '%/12/%'

But AFAIK this query can't use an ordinary (B-tree) index on the column because of the leading %.
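The slashes in the pattern are what make it match whole numbers only. The following minimal sketch uses Python's built-in sqlite3 module (not PostgreSQL) and a hypothetical `things` table to show which rows the delimited pattern matches compared with a naive one. It says nothing about index usage.

```python
import sqlite3

# Hypothetical in-memory table mirroring the question's things(id, path).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE things (id INTEGER PRIMARY KEY, path TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO things (id, path) VALUES (?, ?)",
    [
        (1, "/123/12/34/56/5/"),  # contains 12 as a whole segment
        (2, "/123/34/"),          # "12" only appears inside 123
        (3, "/7/12/"),            # 12 is the last segment
        (4, "/112/120/"),         # "12" only appears inside 112 and 120
    ],
)


def ids_containing(conn, number):
    """Return ids whose path has `number` as a whole /-delimited segment.

    The pattern is '%/<number>/%'. The surrounding slashes stop 12 from
    matching inside 123 or 112. In PostgreSQL, the leading % is what rules
    out an ordinary B-tree index lookup. int() keeps % and _ out of the pattern.
    """
    pattern = f"%/{int(number)}/%"
    rows = conn.execute(
        "SELECT id FROM things WHERE path LIKE ? ORDER BY id", (pattern,)
    )
    return [row[0] for row in rows]


# A pattern without the delimiters over-matches.
naive = [r[0] for r in conn.execute(
    "SELECT id FROM things WHERE path LIKE '%12%' ORDER BY id")]

print(ids_containing(conn, 12))  # [1, 3]
print(naive)                     # [1, 2, 3, 4]
```

This also relies on every stored path starting and ending with a slash. A path written without the trailing slash, for example, would silently stop matching its last number.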

There must be something better. What is it?

I'm using PostgreSQL, but would prefer a solution that works across other DBs too.

I could go with an additional table 'thing_paths' with the path in it, then join it and query it, roughly: SELECT DISTINCT things.id FROM things INNER JOIN thing_paths ON thing_paths.thing_id = things.id WHERE thing_paths.path LIKE '/12/%'. But at this stage it is more than I ideally want to do.
– Dmytrii Nagirniak, Commented Apr 16, 2012 at 3:56

Edmund · Answer · 2012-04-16 05:06:39Z

4

If you're happy turning that column into an array of integers, like:

'/123/12/34/56/5/' becomes ARRAY[123,12,34,56,5]
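The conversion itself is just "split on '/', drop the empty pieces, cast to integers". Here is a small Python sketch of that rule. The function name `path_to_array` is hypothetical. For a one-off migration inside PostgreSQL, an expression such as `string_to_array(trim(both '/' from path), '/')::int[]` should do the same thing.

```python
from typing import List


def path_to_array(path: str) -> List[int]:
    """Convert a '/'-delimited path such as '/123/12/34/56/5/' to a list of ints.

    The leading and trailing slashes produce empty pieces, which are skipped.
    """
    return [int(part) for part in path.split("/") if part]


print(path_to_array("/123/12/34/56/5/"))  # [123, 12, 34, 56, 5]
```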

Once path_arr is a column of type INTEGER[], you can create a GIN index on it:

CREATE INDEX ON things USING gin(path_arr);

A query for all items containing 12 then becomes:

SELECT * FROM things WHERE ARRAY[12] <@ path_arr;
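`ARRAY[12] <@ path_arr` reads "ARRAY[12] is contained by path_arr", which is equivalent to `path_arr @> ARRAY[12]`. A GIN (Generalized Inverted Index) is, roughly, an inverted index: for each distinct element it keeps the rows containing that element.

The following Python sketch is a simplified conceptual model of that idea, not PostgreSQL's actual structure. It shows how such a query can be answered from per-element row sets, and how a multi-element query intersects them.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_inverted_index(rows: Dict[int, List[int]]) -> Dict[int, Set[int]]:
    """Map each array element to the ids of the rows whose array contains it.

    Conceptual model only: one posting set per distinct element.
    """
    index: Dict[int, Set[int]] = defaultdict(set)
    for row_id, arr in rows.items():
        for element in arr:
            index[element].add(row_id)
    return dict(index)


def contained_by(index: Dict[int, Set[int]], query: Iterable[int]) -> Set[int]:
    """Rows satisfying `query <@ path_arr`: every query element is present.

    Looks up each element's posting set and intersects the sets.
    """
    elements = set(query)
    if not elements:
        # An empty array is contained in every array; not modelled here.
        raise ValueError("query must contain at least one element")
    result = None
    for element in elements:
        posting = index.get(element, set())
        result = set(posting) if result is None else result & posting
    return result


rows = {1: [123, 12, 34, 56, 5], 2: [123, 34], 3: [7, 12]}
index = build_inverted_index(rows)
print(sorted(contained_by(index, [12])))      # [1, 3]
print(sorted(contained_by(index, [12, 34])))  # [1]
```

Because 12 and 123 are separate keys here, a lookup for 12 never touches rows that only contain 123.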

This query can use the index. In my test (with a million rows), I get plans like:

EXPLAIN SELECT * FROM things WHERE ARRAY[12]  <@ path_arr;
                                      QUERY PLAN
----------------------------------------------------------------------------------------
 Bitmap Heap Scan on things  (cost=5915.75..9216.99 rows=1000 width=92)
   Recheck Cond: (path_arr <@ '{12}'::integer[])
   ->  Bitmap Index Scan on things_path_arr_idx  (cost=0.00..5915.50 rows=1000 width=0)
         Index Cond: ('{12}'::integer[] <@ path_arr)
(4 rows)

Answered Apr 16, 2012 at 4:59 by Edmund; edited Apr 16, 2012 at 5:06.


4 Comments

Dmytrii Nagirniak Over a year ago

Thanks. That's another option. But AFAIK it is technically the same as LIKE + GIN index as Erwin suggested, just expressed differently.

Edmund Over a year ago

It is another GIN index. But it's one that more closely corresponds to the actual data. In principle the index will be more efficient because it won't be recording items containing "27/" and "1/2", etc.

Dmytrii Nagirniak Over a year ago

Edmund, I accepted Erwin's answer simply because it was simpler (didn't have to change any code yet). But I wish I could accept 2 answers since your solution is more elegant.

Edmund Over a year ago

There is often a tradeoff between pragmatism and elegance. And the syntax for indexes over lists doesn't help, so I might have picked Erwin's answer too (tbh). Thanks for your comment.

Erwin Brandstetter · Accepted Answer · 2013-05-04 18:04:56Z

3

In PostgreSQL 9.1 or later, you can use the pg_trgm module to build a GIN index.

CREATE EXTENSION pg_trgm; -- once per database

CREATE INDEX things_path_trgm_gin_idx ON things USING gin (path gin_trgm_ops);

Your LIKE expression can use this index even if it is not left-anchored.
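pg_trgm indexes the three-character sequences ("trigrams") of each value. A pattern with a leading % can therefore be answered in two steps: look up the rows containing all of the pattern's trigrams, then recheck those candidates against the real condition.

The following Python sketch is a simplified model of that process, not pg_trgm's implementation. It makes two assumptions:
- It uses the word rules described in the pg_trgm documentation: words are alphanumeric runs, lower-cased, and padded with two spaces in front and one behind.
- It assumes the trigrams extracted from '%/12/%' are simply those of the word 12.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def trigrams(text: str) -> Set[str]:
    """Approximation of pg_trgm's trigram extraction (an assumption, not its C code).

    Words are runs of letters/digits; '/' and other characters only separate
    words. Each lower-cased word is padded with two leading spaces and one
    trailing space before being cut into 3-character pieces.
    """
    result: Set[str] = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def build_trigram_index(paths: Dict[int, str]) -> Dict[str, Set[int]]:
    """Inverted index: trigram -> ids of rows whose path contains it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for row_id, path in paths.items():
        for tri in trigrams(path):
            index[tri].add(row_id)
    return dict(index)


def search(paths: Dict[int, str], index: Dict[str, Set[int]],
           number: int) -> Tuple[List[int], List[int]]:
    """Emulate `path LIKE '%/<number>/%'` with an index filter plus a recheck."""
    needle = f"/{int(number)}/"
    candidates = set(paths)
    for tri in trigrams(needle):  # a candidate must contain every trigram
        candidates &= index.get(tri, set())
    candidate_ids = sorted(candidates)
    # Recheck: trigrams ignore the slashes, so verify the real condition.
    matches = [rid for rid in candidate_ids if needle in paths[rid]]
    return candidate_ids, matches


paths = {1: "/123/12/34/56/5/", 2: "/123/34/", 3: "/7/12/", 4: "/112/120/"}
index = build_trigram_index(paths)
print(search(paths, index, 12))  # ([1, 3, 4], [1, 3])
```

Row 4 (/112/120/) contains all three trigrams of 12 (via 112 and 120), so it survives the index step. Only the recheck discards it.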

See a detailed demo by depesz here.

Normalize it if you can, though.
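Normalizing here would mean storing one row per path element instead of a single delimited string. The lookup then becomes a plain equality on an ordinary B-tree index, which also works across databases.

The sketch below uses Python's sqlite3 for a self-contained demo. The table `thing_path_elements`, its columns and the index name are hypothetical. The DDL and query should carry over to PostgreSQL with at most minor changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE things (
        id   INTEGER PRIMARY KEY,
        path TEXT NOT NULL
    );
    -- Hypothetical normalized table: one row per element of a path.
    CREATE TABLE thing_path_elements (
        thing_id INTEGER NOT NULL REFERENCES things (id),
        ordinal  INTEGER NOT NULL,
        element  INTEGER NOT NULL,
        PRIMARY KEY (thing_id, ordinal)
    );
    -- Ordinary B-tree index that turns "which things contain X" into a lookup.
    CREATE INDEX thing_path_elements_element_idx
        ON thing_path_elements (element, thing_id);
""")


def insert_thing(conn, thing_id, path):
    """Store the original path plus one row per element (ordinal keeps the order)."""
    conn.execute("INSERT INTO things (id, path) VALUES (?, ?)", (thing_id, path))
    elements = [int(p) for p in path.split("/") if p]
    conn.executemany(
        "INSERT INTO thing_path_elements (thing_id, ordinal, element) "
        "VALUES (?, ?, ?)",
        [(thing_id, pos, el) for pos, el in enumerate(elements)],
    )


for tid, path in [(1, "/123/12/34/56/5/"), (2, "/123/34/"), (3, "/7/12/")]:
    insert_thing(conn, tid, path)

QUERY = """
    SELECT DISTINCT t.id
    FROM things AS t
    JOIN thing_path_elements AS e ON e.thing_id = t.id
    WHERE e.element = ?
    ORDER BY t.id
"""

ids = [row[0] for row in conn.execute(QUERY, (12,))]
# Column 3 of EXPLAIN QUERY PLAN output is the human-readable detail.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, (12,))]
print(ids)  # [1, 3]
print(plan)
```

DISTINCT only matters if a number can appear more than once in the same path. The cost of this design is keeping the element rows in sync with `things.path` on every write.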

Answered Apr 16, 2012 at 3:57 by Erwin Brandstetter; edited May 4, 2013 at 18:04.


3 Comments

Dmytrii Nagirniak Over a year ago

Thanks. That's good to know. The write performance is dramatically slower though.

Dmytrii Nagirniak Over a year ago

RE normalisation. This column is used in pretty much one or two places only, not sure whether the normalisation is worth it yet. Have to think more about it.

kgrittn Over a year ago

RE normalization: The normalized form will probably be dramatically faster.
