I have a table in a Postgres 8.1 database with information about approximately 370,000 customers. This table includes the fields sn (surname) and gn (given name). I would like to enable users to search for customers' full names using the form "sn gn" or "gn sn" or simply . My first attempt to construct a query was like this:

SELECT sn || ', ' || gn AS name FROM users
WHERE sn || ' ' || gn LIKE '%Johnson David%'
   OR gn || ' ' || sn LIKE '%Johnson David%'

This worked fine, but was quite slow, clocking in at 600/623 ms. In order to optimise, I created an index on the sn field only, as I guessed that the gn field would contain so much duplication as to be useless for indexing. Unfortunately, indexing surname didn't improve performance at all as the query didn't use the index.

Seq Scan on users (cost=0.00..18296.06 rows=1 width=64) (actual time=57.935..588.755 rows=8 loops=1)

My guess is that the reason for this is the one described in this thread. I considered using a multicolumn index, but I guessed that it would mean that I could only search in one of the two styles I mention above, i.e. "sn gn" or "gn sn", but not both.

I have also considered creating a full text index, but it seems unsuitable for name values, as I would get a lot of stemming and so on that isn't relevant. Does anyone have any suggestions for indexing strategies? It seems like it should be quite a common use case.

2 Answers

It won't use an index because your pattern starts with a wildcard (%...); an ordinary index cannot help with a LIKE that has a leading wildcard. Consider using trigrams. Alternatively, you can use the full text search features. Both of these methods will require a newer version of Postgres. You should upgrade anyway: 8.1 is stone-age old and unsupported, and newer versions will not only be faster but also give you more features to do what you want.
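
As a rough sketch of the trigram suggestion, assuming an upgrade to PostgreSQL 9.1 or later (where the pg_trgm extension's GIN operator class supports LIKE searches), something along these lines might work. The index names are made up for illustration, and whether the planner actually picks the indexes should be checked with EXPLAIN ANALYZE on your own data.

```sql
-- Assumes PostgreSQL 9.1+; pg_trgm ships as a contrib extension.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- One trigram index per name order (index names are hypothetical).
-- Note the extra parentheses around each indexed expression.
CREATE INDEX users_sn_gn_trgm_idx
    ON users USING gin ((sn || ' ' || gn) gin_trgm_ops);
CREATE INDEX users_gn_sn_trgm_idx
    ON users USING gin ((gn || ' ' || sn) gin_trgm_ops);

-- The WHERE expressions must match the indexed expressions exactly
-- for the planner to consider the indexes.
SELECT sn || ', ' || gn AS name
FROM users
WHERE (sn || ' ' || gn) LIKE '%Johnson David%'
   OR (gn || ' ' || sn) LIKE '%Johnson David%';
```

Trigram indexes work on three-character fragments, so very short search strings (one or two characters) are unlikely to benefit.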

Create an index on the full computed expression. The query will still have to scan the whole index, but the expression is precomputed and the index is much smaller than the whole table.

4 Comments

Can you give a syntax example? I'm not clear on what you mean by the full computed expression.
Sorry for being too vague. Create an index on computed columns for (sn || ' ' || gn, gn || ' ' || sn). That index will still not allow for seeks, but it can be used for faster scans. If the performance is good enough, this is a simple solution.
Hm, for some reason that gives me a syntax error: ERROR: syntax error at or near "||". Perhaps Postgres 8.1 doesn't accept concatenation in CREATE INDEX? Would it work to do a CREATE INDEX on users(sn, gn)?
I'm not that much into Postgres, but I know it has the ability to index computed columns. Maybe you need to literally create a column first? The docs will probably tell. Or create two normal columns and keep them updated manually.
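
A likely cause of the syntax error in the comment above: Postgres requires an indexed expression to be wrapped in its own pair of parentheses, in addition to the parentheses around the column list. A minimal sketch follows, with hypothetical index names. It only shows the syntax; whether the planner would use such an index for a leading-wildcard LIKE is a separate matter and is best checked with EXPLAIN ANALYZE.

```sql
-- Expression indexes need double parentheses: the outer pair delimits
-- the index column list, the inner pair delimits the expression.
-- Index names are hypothetical.
CREATE INDEX users_sn_gn_idx ON users ((sn || ' ' || gn));
CREATE INDEX users_gn_sn_idx ON users ((gn || ' ' || sn));

-- For comparison, a plain multicolumn index takes bare column names:
CREATE INDEX users_sn_gn_cols_idx ON users (sn, gn);
```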
