
I have a table in Postgres with ~2 million records. I need an index on it that gives good performance for LIKE '%text%' queries.

I read that GIN indexes are good for %text% searches, so I tried both GIN and GiST indexes. Performance did not improve, though: the planner still runs a sequential scan instead of using the index.

Here's my GIN index:

CREATE INDEX city_gin_idx_name
  ON city
  USING gin
  (to_tsvector('english'::regconfig, lower(name::text)));

Query performance:

Sort  (cost=117553.00..118496.71 rows=377482 width=50) (actual time=1719.660..1745.702 rows=35185 loops=1)
  Sort Key: (concat(name, ', ', state_name, ', ', country_name))
  Sort Method: external merge  Disk: 2200kB
  ->  Seq Scan on city  (cost=0.00..56777.75 rows=377482 width=50) (actual time=0.392..1474.559 rows=35185 loops=1)
        Filter: ((lower((name)::text) ~~ '%ed%'::text) OR ((city_metaphone)::text = 'K'::text))
        Rows Removed by Filter: 1851806
Total runtime: 1764.036 ms

Which index would be suitable for this requirement?

Can you please format the execution plan properly, preserving indentation (e.g. using <pre> tags)? As it stands, it is not really readable. Alternatively, upload it to explain.depesz.com. Commented Feb 13, 2014 at 9:23

1 Answer

You need two indexes for that query, and you need to use the exact same expressions in your query to use them:

create index … on city using GIN (to_tsvector('english', name));
create index … on city (city_metaphone);

Note that lowercasing the name is useless in the first index, since to_tsvector will ignore the case anyway when computing vectors.
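
One way to check the point about case is to compare the two vectors directly. This is a minimal sketch, assuming the 'english' configuration is installed. The sample value is arbitrary and the column alias same_vector is only illustrative:

```sql
-- The text search parser and dictionaries fold case, so both
-- expressions should yield the same tsvector.
SELECT to_tsvector('english', 'Katmandu')
     = to_tsvector('english', lower('Katmandu')) AS same_vector;
```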

The query then needs to look like this, and you should get a plan that uses bitmap index scans:

select *
from city
where city_metaphone = 'K'
   or to_tsvector('english', name) @@ to_tsquery('english', 'Katmandu');
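
To see whether the planner actually picks up both indexes, the query above can be wrapped in EXPLAIN. This sketch assumes the two indexes have been created and the table has been analyzed. The planner may still choose a sequential scan if it estimates that the predicates match a large share of the table:

```sql
-- Look for a BitmapOr node with two Bitmap Index Scan children,
-- one per index, feeding a Bitmap Heap Scan on city.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM city
WHERE city_metaphone = 'K'
   OR to_tsvector('english', name) @@ to_tsquery('english', 'Katmandu');
```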

That being said, I think your use of full text here is erroneous. Your '%ed%', in particular, indicates that you're hoping that full text will let you run some kind of LIKE comparison.

That is not how it works out of the box, but trigrams will make it work that way:

http://www.postgresql.org/docs/current/static/pgtrgm.html
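
A minimal sketch of the trigram approach follows, assuming the pg_trgm contrib module is available on the server. The index name city_name_trgm_idx and the search pattern are hypothetical. The indexed expression matches the lower(name::text) expression from the question's filter, so a LIKE on that same expression can use the index:

```sql
-- Assumption: pg_trgm can be installed in this database.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical index name; the expression must match the one used in the query.
CREATE INDEX city_name_trgm_idx
  ON city
  USING gin ((lower(name::text)) gin_trgm_ops);

-- A pattern with at least three consecutive characters lets the
-- index narrow the candidate rows.
SELECT name
FROM city
WHERE lower(name::text) LIKE '%edf%';
```

Note that a trigram index works on three-character sequences. According to the pg_trgm documentation, a pattern with no extractable trigrams degenerates to a full-index scan, so a two-character pattern such as '%ed%' is unlikely to benefit much.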


3 Comments

Thanks for the response. I tried trigrams and performance improved considerably: query execution time dropped from ~1500 ms to ~300 ms. I still need to improve it, though. Is this the best we can do for this scenario, or can it be optimised further? This is the index I used: CREATE INDEX trgm_idx ON city USING gin (name gin_trgm_ops); And my query: SELECT * FROM "city" WHERE name % 'ed' OR city_metaphone = 'K'
Have you tried "F.30.4. Text Search Integration" on the pgtrgm page?
From what I understood of Text Search Integration, it is meant for full text searches and for tolerating minor spelling mistakes. I need the index for auto-complete, where I will have only 2-3 characters and need to find the words that match them. I am not sure whether it will be useful for this scenario. Please correct me if I am wrong.
