
Can someone suggest a full-text search engine that works well with Python?

Right now we have a MySQL database in place, and I'd like to add a full-text search engine that indexes some of the text in some of its tables. A web application would search that index to find the corresponding records in the database. For instance, we would index the customer names in our customer table, and the web application would run a full-text search against that index to find the matching customer's MySQL record.

I've looked (briefly) at Lucene, Swish-E, MongoDB and a few others, but I'm not sure what would be a good choice for me, considering a few things:

  • I'm not a Java guy (though I've been programming for a long time),
  • we only want to search a relatively small set of data,
  • we're looking to index text in a MySQL database,
  • and would like that index to be updated in semi-realtime.

Any hints, tips or pointers would be greatly appreciated!

  • Have you looked at dev.mysql.com/doc/refman/5.0/en/fulltext-search.html, by the way? Commented Feb 8, 2012 at 9:17
  • I have looked at MySQL full-text search, but that won't work for us because all our tables use InnoDB, which doesn't support full-text search. Commented Feb 8, 2012 at 17:37
  • How about storing the text in a separate MyISAM table? Commented Feb 8, 2012 at 23:08
  • Could do that, but I've used MySQL's full-text search before and found its performance to be only so-so. Plus, I'm inclined to move this functionality off the database server (which is already plenty busy) and onto another server. Commented Mar 5, 2012 at 20:59

3 Answers


Have a look at Whoosh. I've heard it doesn't scale up terribly well (maybe that's fixed now) but for small collections, it might be useful.

For a scalable solution, consider using Lucene with PyLucene or Jython.


2 Comments

Agree with starting out with Whoosh. I use it for my bookmark app (Bookie), and it indexes the full content of about 20k web pages without an issue. From what I can gather, Whoosh is good until you hit hundreds of thousands of documents, and after that it's time to check out Solr and Lucene.
I haven't heard of Whoosh, but will take a look at it, thanks!

Building PyLucene a few months ago was one of the most painful experiences I've had. The project won't get any traction, IMHO, if it's this hard to build.

With a few other folks who had the same itch to scratch, we started https://code.google.com/a/apache-extras.org/p/pylucene-extra/ to gather prebuilt PyLucene and JCC eggs for several combinations of operating system, Python version and Java runtime. It has not been very active lately, though.

Whoosh might be a good fit, or you may want to have a look at Sphinx, ElasticSearch or HaystackSearch (CAVEAT: I have not worked with any of these).

Or maybe try accessing Solr from Python (there are a few client APIs), which might be much easier than using PyLucene. Keep in mind that Lucene will still need a JVM to run, of course.

Since you don't have huge scalability needs, I would focus on simple usage and community support rather than performance and scale. Hope it helps.

1 Comment

That's what I'm concerned about with PyLucene: I'm not a Java guy, so I don't know anything about the build process. I'll take a look at the other ones you mention, thanks!

Solr is a great wrapper around Lucene that greatly simplifies things. For most tasks it doesn't require any Java tinkering; you just need to configure some XML files. It does run as a separate process, though, so it may complicate your deployment.

I have had great results with pysolr, but you could also write your own Python client library: Solr exposes a REST-style HTTP interface, so it is simple to send and retrieve data in either XML or JSON.
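
To give an idea of how little a hand-rolled client needs, here is a minimal sketch using only the Python 3 standard library. The host, port, the core name `customers` and the `id`/`name` fields are assumptions for illustration, and the `/select` and `/update` paths and JSON update support vary between Solr versions, so check them against your own installation.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed location and core name; adjust to your own Solr setup.
SOLR_URL = "http://localhost:8983/solr/customers"


def build_select_url(text, field="name", rows=10):
    """Return the URL for a JSON search of `field` for `text`.

    `text` is used as-is, so Solr query syntax characters in user
    input would need escaping before being passed in.
    """
    params = {"q": "{}:{}".format(field, text), "rows": rows, "wt": "json"}
    return SOLR_URL + "/select?" + urlencode(params)


def build_update_request(docs):
    """Return a POST request that sends `docs` as JSON and asks for a commit."""
    body = json.dumps(docs).encode("utf-8")
    return Request(
        SOLR_URL + "/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
    )


def extract_ids(response_text):
    """Pull the stored ids out of a Solr JSON search response."""
    payload = json.loads(response_text)
    return [doc["id"] for doc in payload["response"]["docs"]]


def search_customer_ids(text):
    """Query Solr and return the matching ids (needs a running server)."""
    with urlopen(build_select_url(text)) as resp:
        return extract_ids(resp.read().decode("utf-8"))
```

The ids returned by `search_customer_ids("acme")` would then be used to fetch the full rows from MySQL, and `urlopen(build_update_request([...]))` would be called whenever a customer row changes to keep the index roughly current.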
