I have been using the similarity function from PostgreSQL's pg_trgm module, and I am now looking for a comparable word-similarity function in Python. I have found several options in Python, e.g. difflib and nltk, but none of them produces results similar to those of PostgreSQL's similarity function.
I have been using the code below for word matching, but its results are very different from those of PostgreSQL's similarity function. Are these results better than PostgreSQL's? Is there a method or library I can use to get results similar to those of PostgreSQL's similarity function?
from difflib import SequenceMatcher
import nltk
from fuzzywuzzy import fuzz
def similar(a, b):
    # Ratio of matching characters found by difflib, in the range 0..1
    return SequenceMatcher(None, a, b).ratio()
def longest_common_substring(s1, s2):
    # m[x][y] = length of the common substring ending at s1[x-1] and s2[y-1]
    m = [[0] * (1 + len(s2)) for i in range(1 + len(s1))]
    longest, x_longest = 0, 0
    for x in range(1, 1 + len(s1)):
        for y in range(1, 1 + len(s2)):
            if s1[x - 1] == s2[y - 1]:
                m[x][y] = m[x - 1][y - 1] + 1
                if m[x][y] > longest:
                    longest = m[x][y]
                    x_longest = x
            else:
                m[x][y] = 0
    return s1[x_longest - longest: x_longest]
def similarity(s1, s2):
    # Percentage based on the longest common substring relative to both lengths
    return 2. * len(longest_common_substring(s1, s2)) / (len(s1) + len(s2)) * 100
print(similarity("New Highway Classic Academy Lahore", "Old Highway Classic Academy"))
print(nltk.edit_distance("This is Your Shop", "This"))
print(fuzz.ratio("ISE-Tower", "UfTowerong,"))
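For comparison, here is a minimal sketch of what I understand pg_trgm's similarity to compute, based on the description in the PostgreSQL documentation: the text is lowercased, split into words on non-alphanumeric characters, each word is padded with two spaces in front and one behind, and the score is the number of shared trigrams divided by the number of distinct trigrams in both strings. The names `trigrams` and `trgm_similarity` are my own, and the handling of non-ASCII characters is an assumption, so the result may not match PostgreSQL in every case.

```python
import re


def trigrams(text):
    """Return the set of trigrams of `text`, padded the way the pg_trgm docs describe."""
    result = set()
    # Assumption: a "word" is a run of letters and digits (underscore excluded).
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "  # two spaces before, one space after
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def trgm_similarity(a, b):
    """Shared trigrams divided by all distinct trigrams, in the range 0..1."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return float(len(ta & tb)) / len(ta | tb)


# 'word' and 'two words' share 4 trigrams out of 11 distinct ones
print(trgm_similarity("word", "two words"))
```

This is the kind of result I would like a library to give me, rather than the character-based ratios returned by difflib or fuzzywuzzy.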