Indexes for a large database in Postgres

Question

I have a table in Postgres with ~2 million records. I need an index on it that gives good performance for LIKE '%text%' queries.

I read somewhere that GIN indexes are good for %text% searches, so I tried GIN and GiST indexes, but there was no performance improvement: the query still uses a sequential scan instead of a bitmap heap scan.

Here's my GIN index:

CREATE INDEX city_gin_idx_name
  ON city
  USING gin
  (to_tsvector('english'::regconfig, lower(name::text)));

Query performance:

"Sort (cost=117553.00..118496.71 rows=377482 width=50) (actual time=1719.660..1745.702 rows=35185 loops=1)" " Sort Key: (concat(name, ', ', state_name, ', ', country_name))" " Sort Method: external merge Disk: 2200kB" " -> Seq Scan on city (cost=0.00..56777.75 rows=377482 width=50) (actual time=0.392..1474.559 rows=35185 loops=1)" " Filter: ((lower((name)::text) ~~ '%ed%'::text) OR ((city_metaphone)::text = 'K'::text))" " Rows Removed by Filter: 1851806" "Total runtime: 1764.036 ms"

Please suggest a suitable index for this requirement.

Can you please format the execution plan properly (preserving indentation), e.g. using <pre> tags? As it stands, it is not really readable. Or upload it to explain.depesz.com
– user330315, Commented Feb 13, 2014 at 9:23

Denis de Bernardy · Accepted Answer · 2014-02-13 09:34:12Z

1

You need two indexes for that query, and you need to use the exact same expressions in your query to use them:

create index … on city using GIN (to_tsvector('english', name));
create index … on city (city_metaphone);

Note that lowercasing the name is useless in the first index, since to_tsvector will ignore the case anyway when computing vectors.

The query then needs to look like this, and you should get a plan that uses bitmap index scans:

select *
from city
where city_metaphone = 'K'
   or to_tsvector('english', name) @@ to_tsquery('english', 'Katmandu');
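
One way to confirm that the planner really combines both indexes is to run the query under EXPLAIN ANALYZE and look for Bitmap Index Scan nodes in the output. This is only a sketch: it assumes both indexes above have been created and refreshes the planner statistics first.

```sql
-- Sketch: refresh planner statistics for the table, then inspect the plan.
ANALYZE city;

-- EXPLAIN ANALYZE executes the query and prints the plan with actual timings.
EXPLAIN ANALYZE
SELECT *
FROM city
WHERE city_metaphone = 'K'
   OR to_tsvector('english', name) @@ to_tsquery('english', 'Katmandu');
```

If the plan still shows a Seq Scan, it is worth checking that the expression in the WHERE clause is written exactly as in the index definition.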

That being said, I think your use of full text here is erroneous. Your '%ed%', in particular, indicates that you're hoping that full text will let you run some kind of LIKE comparison.

That is not how it works out of the box, but trigrams will make it work that way:

http://www.postgresql.org/docs/current/static/pgtrgm.html
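
A minimal sketch of the trigram approach, assuming the pg_trgm extension can be installed on the server. The index name city_name_trgm_idx is hypothetical, and the indexed expression is lower(name::text) so that it matches the filter shown in the question's plan:

```sql
-- Assumption: pg_trgm (a contrib module) is available and you may create extensions.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical index name; the indexed expression must match the one used in the query.
CREATE INDEX city_name_trgm_idx
  ON city
  USING gin (lower(name::text) gin_trgm_ops);

-- A LIKE pattern with at least three consecutive literal characters
-- gives the index trigrams to narrow the candidate rows with.
SELECT *
FROM city
WHERE lower(name::text) LIKE '%edm%'
   OR city_metaphone = 'K';
```

Keep in mind that, according to the pg_trgm documentation, a pattern with no extractable trigrams degenerates to a full-index scan. A two-character pattern such as '%ed%' falls into that case, so very short patterns may see little benefit. The OR branch on city_metaphone still needs its own index, as described above.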



3 Comments

user3273881 Over a year ago

Thanks for the response. I tried trigrams and performance improved considerably: query execution time dropped from ~1500 ms to ~300 ms. I still need to improve it, though. Is this the best we can do for this scenario, or can it be optimised further? This is the index I used: CREATE INDEX trgm_idx ON city USING gin (name gin_trgm_ops); And my query: SELECT * FROM "city" WHERE name % 'ed' OR city_metaphone = 'K'
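
Note that the % operator in that query is pg_trgm's similarity test, not a substring match, so it may return a different set of rows than the original LIKE '%ed%' filter. A small sketch for comparing the two, using only pg_trgm functions and the trgm_idx index from the comment (the sample value 'Edmonton' is just an illustration, not data from the table):

```sql
-- Trigrams that pg_trgm extracts from the short search term.
SELECT show_trgm('ed');

-- Similarity score and the result of the % operator for one sample value.
SELECT similarity('Edmonton', 'ed') AS score,
       'Edmonton' % 'ed'            AS passes_threshold;

-- Case-insensitive substring match, which a gin_trgm_ops index on name can also serve.
SELECT *
FROM city
WHERE name ILIKE '%ed%'
   OR city_metaphone = 'K';
```

Running both forms under EXPLAIN ANALYZE should show whether they return the same rows and how each one uses the index.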

Denis de Bernardy Over a year ago

Tried "F.30.4. Text Search Integration" on the pgtrgm page?

user3273881 Over a year ago

As I understand it, Text Search Integration is meant for full-text searches or for tolerating minor spelling mistakes. I need the index for auto-complete, where I will have only 2-3 characters and need to find the words that match them. I am not sure whether it will be useful for this scenario. Please correct me if I am wrong.
