0

I've read about LSH hashing and am wondering what is the best implementation to match strings within 1 character?

test = {'dog':1, 'cat': 2, 'eagle': 3} 

test['dog']
>> 1

I would want to also return 1 if I lookup test['dogs'] or test['dogg']. I realize that it would also return 1 if I were to look up "log" or "cog", but I can write a method to exclude those results.

Also how can I further this method for general strings to return a match within X characters?

string1 = "brown dogs"
string2 = "brown doggie"

Assuming only string1 is stored in my dictionary, a lookup for string2 would return string1.

Thanks

8
  • In short, you can't. Hash tables are the wrong tool for this. Commented Feb 13, 2013 at 15:32
  • That won't work, because what you're describing is not an equivalence relation. Commented Feb 13, 2013 at 15:32
  • So are you trying to get the value for a key which is the most similar to a given key? Is that correct? Commented Feb 13, 2013 at 15:41
  • @SLaks I don't know what equivalence relation has to do with that. Commented Feb 13, 2013 at 15:42
  • @freakish: The key comparison (hash function) for a hash tables must be an equivalence relation. Commented Feb 13, 2013 at 15:45

3 Answers 3

1

Well, you can define the similarity between 2 strings by the length of the start they share in common (3 for doga and dogs, for instance). This is simplistic, but that could fit your needs.

With this assumption, you can define this:

>>> test = {'dog':1, 'cat': 2, 'eagle': 3}
>>> def same_start(s1, s2):
    ret = 0
    for i in range(min(len(s1), len(s2))):
        if s1[i] != s2[i]:
            break
        ret += 1
    return ret

>>> def closest_match(s):
    return max(((k, v, same_start(k, s)) for k, v in test.iteritems()), key=lambda x: x[2])[1]

>>> closest_match('dogs')  # matches dog
1
>>> closest_match('cogs')  # matches cat
2
>>> closest_match('eaogs') # matches eagle
3
>>> 
Sign up to request clarification or add additional context in comments.

Comments

0

Maybe you could try using a Soundex function as your dictionary key?

Comments

0

Since your relation is not 1:1, maybe you could define your own dict type with redefined __getitem__ which could return a list of possible items. Here's what I mean:

class MyDict(dict):
  def __getitem__(self, key):
    l = []
    for k, v in self.items():
      if key.startswith(k): # or some other comparation method
        l.append(v)
    return l

This is just an idea, probably other dict methods should be redefined too in order to avoid possible errors or infinite loops. Also, @Emmanuel's answer could be very useful here if you want only one item returned instead of the list, and that way you wouldn't have to redefine everything.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.