
I have a forum written in PHP using MySQL, and I'd like to add forum search. It will let users search for particular strings, as well as filter on metadata such as post date and subject. The metadata can be searched efficiently because most of these fields are indexed, but I expect the primary use case to be plain text search without any metadata filters that could narrow the results.

After some testing I have found that, contrary to what most people report for their setups, SQL_CALC_FOUND_ROWS is significantly faster (approx. 1.5x) than running the query twice to get the number of results, so the best query I have is:

SELECT SQL_CALC_FOUND_ROWS * FROM blahblah WHERE content LIKE '%term%' LIMIT whatever, whatever;
SELECT FOUND_ROWS();

Unsurprisingly, this is really slow because it has to text-match every single forum post in the database. Is there anything I can do to improve on this? Would putting an index on the content (TEXT) field even help when using the LIKE operator? How does one normally do this?

An index on the column can help, even with the LIKE operator, but not when the pattern starts with a wildcard. So for LIKE 'term%' an index is beneficial, but for LIKE '%term%' it is not, because every value still has to be examined.
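The effect of a leading wildcard can be seen in a toy Python model. This is an illustration under simplifying assumptions, not how MySQL's storage engines are implemented: a sorted list stands in for the B-tree index, and the word list and the helpers `prefix_search` and `substring_search` are hypothetical.

```python
import bisect

# Hypothetical column values; the sorted list stands in for a B-tree index.
content_index = sorted(
    ["determine", "forum", "midterm", "term", "terminal", "terms", "thermal"]
)


def prefix_search(sorted_keys, prefix):
    """Model of LIKE 'prefix%': jump to the first key >= prefix with a binary
    search, then walk forward only while keys still start with the prefix.

    Returns (matches, keys_visited_after_the_binary_search)."""
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    visited = 0
    for key in sorted_keys[start:]:
        visited += 1
        if not key.startswith(prefix):
            break  # sorted order guarantees no later key can match
        matches.append(key)
    return matches, visited


def substring_search(sorted_keys, needle):
    """Model of LIKE '%needle%': sort order gives no starting point, so every
    key has to be checked.

    Returns (matches, keys_visited)."""
    matches = [key for key in sorted_keys if needle in key]
    return matches, len(sorted_keys)


print(prefix_search(content_index, "term"))     # (['term', 'terminal', 'terms'], 4)
print(substring_search(content_index, "term"))  # (['determine', 'midterm', 'term', 'terminal', 'terms'], 7)
```

With seven keys the difference is tiny, but the substring search touches every key no matter how large the table grows, whereas the prefix search only touches the matching range plus one extra key.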

Instead, have a look at FULLTEXT indexes. If you add such an index to a TEXT field, MySQL indexes the individual words and allows all kinds of search-engine-like searches. To search, you use MATCH() ... AGAINST() instead of LIKE.
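Conceptually, a FULLTEXT index is an inverted index: a map from each word to the rows that contain it. The sketch below is a simplified Python model of that idea, assuming a naive tokenizer; the sample posts and the names `build_index` and `search_all_words` are hypothetical, and MySQL's real parser, ranking and storage differ.

```python
import re
from collections import defaultdict

# Hypothetical forum posts: post id -> content.
posts = {
    1: "How do I speed up search in MySQL?",
    2: "MySQL FULLTEXT search with MATCH AGAINST",
    3: "Forum search is slow with LIKE",
}


def tokenize(text):
    """Split text into lowercase words (a much simpler parser than MySQL's)."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Build an inverted index mapping each word to the ids of posts using it."""
    inverted = defaultdict(set)
    for post_id, text in docs.items():
        for word in tokenize(text):
            inverted[word].add(post_id)
    return inverted


def search_all_words(inverted, query):
    """Return ids of posts containing every query word.

    Each word costs one dictionary lookup instead of a scan over all posts.
    Requiring every word is closer to BOOLEAN MODE with '+word +word' than to
    MySQL's default natural-language mode, which ranks by relevance."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(inverted.get(words[0], set()))
    for word in words[1:]:
        result &= inverted.get(word, set())
    return result


index = build_index(posts)
print(sorted(search_all_words(index, "mysql search")))  # [1, 2]
print(sorted(search_all_words(index, "slow")))          # [3]
```

Because a lookup goes straight to the posts listed under a word, the cost roughly tracks how many posts contain the query words rather than the total size of the table, which is the advantage over LIKE '%term%'.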

See the docs: https://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html

Disclaimer: I suggest you read the documentation carefully after your first experiments. FULLTEXT indexes are powerful but still have their limits.

FULLTEXT indexes take up a fair amount of space, and how they are built depends on server-wide MySQL settings, so they may behave differently on a local setup than on a server.

For instance, they index complete words but leave out very short words and certain stopwords. Also, because they index whole words, you cannot search for parts of words: looking for 'term' will not find 'determine' out of the box.
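The word-level behavior described above can be modeled by filtering words before they are indexed. This is a hedged Python sketch: the minimum length of 4 mirrors MyISAM's default `ft_min_word_len` (InnoDB uses `innodb_ft_min_token_size`, default 3), while the stopword set and the helpers `indexed_words` and `matches` are hypothetical stand-ins for MySQL's built-in lists.

```python
import re

# Illustrative settings, not read from a real server.
MIN_WORD_LEN = 4  # like MyISAM's default ft_min_word_len (InnoDB's default minimum is 3)
STOPWORDS = {"about", "this", "that", "with"}  # tiny hypothetical stopword list


def indexed_words(text):
    """Return the words a FULLTEXT-style index would keep for this text."""
    words = re.findall(r"\w+", text.lower())
    return {w for w in words if len(w) >= MIN_WORD_LEN and w not in STOPWORDS}


def matches(text, query_word):
    """Whole-word lookup: true only if the query word itself was indexed."""
    return query_word.lower() in indexed_words(text)


post = "We need to determine why the SQL search is slow with this table"
print(sorted(indexed_words(post)))  # ['determine', 'need', 'search', 'slow', 'table']
print(matches(post, "determine"))   # True
print(matches(post, "term"))        # False: 'term' is only part of a word
print(matches(post, "sql"))         # False: shorter than MIN_WORD_LEN
```

Note that 'sql' is dropped as too short, which matters on a programming forum. In MySQL these thresholds are server settings, and changing them requires rebuilding the FULLTEXT indexes.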

So make sure those indexes can do what you want, and if you are on shared hosting, make sure they can be configured and tuned the way you need before you do a large implementation.

Comment

Ah thanks, I had not seen this before :) I just figured there had to be something better than LIKE '%term%'.
