Current Situation:

I am currently running a keyword search using multiple keywords in PHP and SQL. The field I'm applying the search to is the title field, which is a 250 VARCHAR field.

A user can input a single keyword, e.g. "apple" or also multiple, e.g. "apple banana yellow". The first option is trivial. For the second option, my current algorithm works like this:

  1. Try and find items that match the exact entire string "apple banana yellow" in the title. Order the results by index id.
  2. If no more results matching the exact entire string are found, or if none are found in the first place, search for all titles containing either "apple", "banana", or "yellow". Order the results by index id.
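
Roughly, the two steps above could be sketched as the queries below. The table name `items` and the columns `id` and `title` are placeholders for illustration, and excluding the exact matches in the second query is just one assumed way to avoid returning the same rows twice:

```sql
-- Step 1: titles containing the entire search string, ordered by id.
SELECT id, title
FROM items
WHERE title LIKE '%apple banana yellow%'
ORDER BY id;

-- Step 2: titles containing at least one of the keywords, ordered by id.
-- Rows already returned by step 1 are excluded (an assumption).
SELECT id, title
FROM items
WHERE (title LIKE '%apple%' OR title LIKE '%banana%' OR title LIKE '%yellow%')
  AND title NOT LIKE '%apple banana yellow%'
ORDER BY id;
```

In real code the search terms would be bound as query parameters from PHP rather than concatenated into the SQL string.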

The algorithm is very basic but, funnily enough, works pretty well.


What I'm looking for:

However, I am now looking to implement a smarter search algorithm without relying on external paid services such as Amazon's. I'd like to implement the following:

  • fuzzy search (I've read that SOUNDEX or Levenshtein distance might make this possible)
  • smarter keyword search (don't return only items that match ALL words or JUST A SINGLE WORD, but maybe also show items that match 2 or 3 of the words first)
  • order by relevance/likeness (order by how closely the title matches the search, not just by index id)
  • (Bonus: maybe even support searching for exact strings, like using " " on Google to find exactly the words between the quotation marks)

What is the best way to get started with such a search? I am using InnoDB for MySQL.

  • What SQL system are you hitting? Are you using MySQL, Oracle? Commented Jan 25, 2017 at 21:26
  • @Ray sorry, I am using InnoDB for MySQL. I'll edit my post. Commented Jan 25, 2017 at 21:30
  • You might be interested in sphinxsearch.com, which can sit on top of MySQL. Commented Jan 25, 2017 at 21:41
  • @Mihai second that. If you need to grow past some basic searches or find yourself adding many fulltext indexes, a more targeted search technology like Sphinx, Solr, or Elasticsearch may be the right fit for your needs. Commented Jan 25, 2017 at 21:59

1 Answer

Assuming MySQL, you can add a FULLTEXT index. MySQL then provides full-text search functions that let you do basic searches covering most of the needs you list (relevance ranking and exact phrases are supported, but there is no typo-tolerant fuzzy matching): https://dev.mysql.com/doc/refman/5.7/en/fulltext-search.html
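
A minimal sketch of adding the index, assuming a hypothetical table `items` whose `title` column is the one being searched (the index name `ft_title` is also made up). InnoDB supports FULLTEXT indexes from MySQL 5.6 onward:

```sql
-- Add a FULLTEXT index on the column to be searched.
-- `items`, `title` and `ft_title` are placeholder names.
ALTER TABLE items
  ADD FULLTEXT INDEX ft_title (title);

-- Check that the index now exists.
SHOW INDEX FROM items WHERE Key_name = 'ft_title';
```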

You end up using syntax like:

 SELECT * FROM table_name WHERE MATCH(column_with_fulltext_index_on_it)
      AGAINST('apple banana yellow' IN NATURAL LANGUAGE MODE)

To see the match score:

 SELECT column_with_fulltext_index_on_it,
        MATCH(column_with_fulltext_index_on_it)
        AGAINST('apple banana yellow' IN NATURAL LANGUAGE MODE) AS score
 FROM table_name
 WHERE MATCH(column_with_fulltext_index_on_it)
       AGAINST('apple banana yellow' IN NATURAL LANGUAGE MODE)
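
To order by relevance rather than by index id, the score can also be used in ORDER BY. A sketch, assuming a hypothetical `items` table with `id` and `title` columns and a FULLTEXT index on `title`; the LIMIT value is arbitrary:

```sql
-- Rank matches by relevance, falling back to id for equal scores.
SELECT id, title,
       MATCH(title) AGAINST('apple banana yellow' IN NATURAL LANGUAGE MODE) AS score
FROM items
WHERE MATCH(title) AGAINST('apple banana yellow' IN NATURAL LANGUAGE MODE)
ORDER BY score DESC, id
LIMIT 20;
```

Titles that contain more of the search words generally get a higher score, so this should also cover at least part of the "match 2 or 3 words" case.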

There is a bit of a learning curve in tuning the MATCH clause to your needs, but your examples seem pretty basic (except the smarter search).
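
As one example of tweaking the match clause, boolean mode lets the search string carry operators. The sketch below uses hypothetical `items`/`title` names; `+` marks a required word, `-` an excluded word, `*` a prefix wildcard, and double quotes an exact phrase:

```sql
-- Every word must be present.
SELECT id, title FROM items
WHERE MATCH(title) AGAINST('+apple +banana +yellow' IN BOOLEAN MODE);

-- "apple" is required, "banana" is optional (rows containing it rank higher),
-- and rows containing "green" are excluded.
SELECT id, title FROM items
WHERE MATCH(title) AGAINST('+apple banana -green' IN BOOLEAN MODE);

-- Exact phrase, similar to quoting a search on Google.
SELECT id, title FROM items
WHERE MATCH(title) AGAINST('"apple banana yellow"' IN BOOLEAN MODE);

-- Prefix match: words that start with "appl".
SELECT id, title FROM items
WHERE MATCH(title) AGAINST('appl*' IN BOOLEAN MODE);
```

On the PHP side this would mean turning the user's input into such a search string and binding it as a parameter. Boolean mode does not tolerate typos, so fuzzy matching would still need a separate step.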

Also good to note: there are system configuration settings that control the minimum/maximum length of the words/tokens that get indexed. You can read https://dev.mysql.com/doc/refman/5.7/en/fulltext-fine-tuning.html to get a deeper understanding of the indexing options. Percona is a good resource as well: https://www.percona.com/blog/2013/02/26/myisam-vs-innodb-full-text-search-in-mysql-5-6-part-1/ (typically more digestible than the MySQL docs).
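
The current values can be checked from SQL. A short sketch: `innodb_ft_min_token_size` and `innodb_ft_max_token_size` apply to InnoDB, while `ft_min_word_len` is the MyISAM counterpart for the minimum length. By default InnoDB does not index words shorter than 3 characters, and changing these settings requires a server restart plus a rebuild of the FULLTEXT index:

```sql
-- Token length limits for InnoDB full-text indexes.
SHOW VARIABLES LIKE 'innodb_ft_%token_size';

-- MyISAM counterpart of the minimum word length.
SHOW VARIABLES LIKE 'ft_min_word_len';

-- Default InnoDB stopword list (words that are never indexed).
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;
```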

If you need to do more complex searches, you can look at adding other technologies like Solr, but my recommendation has always been: get the basics working with what you've got, and only adopt a new technology if you hit a brick wall, or if you have good metrics on the existing solution and know the new technology will improve them (speed, storage space, quality of results, etc.). If you can't quantify the benefit, stick to the basics until you can.

Here's a good tutorial: http://www.w3resource.com/mysql/mysql-full-text-search-functions.php

5 Comments

Hi, thank you, this looks very promising! What would you say are the main advantages of using Solr instead of FULLTEXT indexes?
There are probably too many advantages in certain scenarios to list here. Solr is a datastore incorporating Lucene, which is built around searching. MySQL is a relational DB with some search features, but if they meet your needs, they save you from having to adopt a second technology.
Amazon Elasticsearch is also built on Lucene, so you could use that as well if you don't want to maintain your own Solr system.
Thank you for your excellent answer, I think I will try to get it working using MySQL full-text search first!
@JonasKaufmann cool, I added a blurb about some config settings you might need to deal with for your index.
