I am trying to add a fuzzy search function to a Django project without using Elasticsearch.
1. I am using PostgreSQL, so I first tried levenshtein, but it did not work for my purpose.
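    from django.db.models import F, Func

    class Levenshtein(Func):
        # Renders levenshtein(<expression>, '<search_term>') in the SQL query.
        # Note: search_term is interpolated into the SQL text as-is, so
        # untrusted input should be escaped or passed as a parameter instead.
        template = "%(function)s(%(expressions)s, '%(search_term)s')"
        function = "levenshtein"

        def __init__(self, expression, search_term, **extras):
            super().__init__(expression, search_term=search_term, **extras)

    # Keep only products whose sort_name is within 2 edits of the search term.
    items = Product.objects.annotate(
        lev_dist=Levenshtein(F('sort_name'), searchterm)
    ).filter(lev_dist__lte=2)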
Searching for "glyoxl" did not pick up "4-Methylphenylglyoxal hydrate", because levenshtein treated "Methylphenylglyoxal" as a single word and compared it with my search term "glyoxl".
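To make the mismatch concrete, here is a minimal pure-Python sketch of edit distance. It is not PostgreSQL's implementation, and the helper name `edit_distance` is made up for illustration. It compares the search term against the whole product name, and then against short windows of the name to show what a substring-aware comparison would see.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance with unit costs."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


term = "glyoxl"
name = "4-Methylphenylglyoxal hydrate"

# Distance to the whole value: at least the length difference (29 - 6 = 23),
# so a filter such as lev_dist <= 2 can never match.
whole = edit_distance(term, name)

# Distance to the best window of len(term) + 1 characters: the window
# "glyoxal" is a single edit away from "glyoxl".
best_window = min(
    edit_distance(term, name[i:i + len(term) + 1])
    for i in range(len(name))
)

print(whole, best_window)
```

The gap between the two numbers suggests that a whole-string distance threshold is a poor fit for matching a short term inside a long name.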
2. trigram_similar gave odd results and was slow:
    items = Product.objects.filter(sort_name__trigram_similar=searchterm)
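As background, trigram_similar is backed by the pg_trgm extension, which by default treats two strings as similar when their similarity score is at least 0.3. The sketch below is only a rough Python approximation of that score, written to show how a long name dilutes it. The helper names `trigrams` and `similarity` are hypothetical, and the real extension may differ in details.

```python
import re


def trigrams(text: str) -> set:
    """Approximate pg_trgm: lowercase, split on non-alphanumerics, and pad
    each word with two leading spaces and one trailing space."""
    result = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


name = "4-Methylphenylglyoxal hydrate"
score = similarity("glyoxl", name)

# Only a few trigrams are shared, while the union is dominated by the
# trigrams of the long name, so the score stays below the 0.3 default.
print(round(score, 3), score >= 0.3)
```

Because the denominator counts every distinct trigram of both strings, a short search term is penalised for all the parts of the name it does not mention.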
"phnylglyoxal" did not pick up "4-Methylphenylglyoxal hydrate", but picked up some other similar terms: "4-Hydroxyphenylglyoxal hydrate", "2,4,6-Trimethylphenylglyoxal hydrate"
"glyoxl" did not pick up any of the above terms
3. The Python package fuzzywuzzy seems able to solve my problem, but I was not able to incorporate it into a query.
    from fuzzywuzzy import fuzz

    ratio = fuzz.partial_ratio('glyoxl', '4-Methylphenylglyoxal hydrate')
    # ratio = 83
I tried to use the fuzz.partial_ratio function in annotate, but it did not work.
    items = Product.objects.annotate(
        ratio=fuzz.partial_ratio(searchterm, 'full_name')
    ).filter(ratio__gte=75)
Here is the error message:
QuerySet.annotate() received non-expression(s): 12.
According to this Stack Overflow post (1), annotate does not accept regular Python functions. The post also mentions that, from Django 2.1 on, one can subclass Func to create a custom function. But it seems that Func can only wrap database functions such as levenshtein.
Is there any way to solve these problems? Thanks!
partial, since the annotations do not run at the Django/Python level. What the Django ORM does is construct a database query. A function like Levenshtein is thus only a mechanism to write levenshtein(sort_name), etc. in the query, not to evaluate it at the Django/Python level.
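Since the scoring cannot happen inside annotate, one possible workaround (an assumption on my part, not something stated above) is to fetch candidate names and score them in Python afterwards. The sketch below uses difflib from the standard library as a stand-in for fuzz.partial_ratio, so the scores may not match fuzzywuzzy exactly. The names `partial_score` and `fuzzy_filter` are hypothetical.

```python
from difflib import SequenceMatcher


def partial_score(term: str, text: str) -> int:
    """Best 0-100 score of the shorter string against any window of the
    longer string that has the same length as the shorter one."""
    term, text = term.lower(), text.lower()
    if not term or not text:
        return 0
    if len(term) > len(text):
        term, text = text, term
    width = len(term)
    best = max(
        SequenceMatcher(None, term, text[i:i + width], autojunk=False).ratio()
        for i in range(len(text) - width + 1)
    )
    return round(best * 100)


def fuzzy_filter(term, names, threshold=75):
    """Return the names scoring at least `threshold`, best matches first."""
    scored = [(partial_score(term, n), n) for n in names]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [n for score, n in scored if score >= threshold]


names = ["Sodium chloride", "4-Methylphenylglyoxal hydrate"]
print(fuzzy_filter("glyoxl", names))
```

In a Django view the names could come from something like Product.objects.values_list('full_name', flat=True). This evaluates every candidate row in Python, so it is probably only practical for small tables or after narrowing the queryset with a cheaper database filter.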