Thanks to BlackBear for pointing out that my timings were skewed by the re-computation of loop invariants. After moving them out of the loop, the results change drastically.
There are two ways of doing this. The sane way, and the regex way. First, the setup.
string = "My name is Andrew, I am pretty awesome"
choices = [['andrew', 'name', 'awesome'], ['andrew', 'designation', 'awesome']]
Option 1
This one performs a substring check with the in operator inside a list comprehension. In CPython, the in check on strings is implemented in C (its search is based on a mix of the Boyer-Moore and Horspool algorithms), and is very fast.
>>> [c for c in choices if all(y in string.lower() for y in c)]
[['andrew', 'name', 'awesome']]
And now, for the timings. But first, a minor performance nitpick: you can cache the value of string.lower() outside the loop, since it's an invariant and doesn't need to be re-computed each time:
v = string.lower()
%timeit [c for c in choices if all(y in v for y in c)]
1000000 loops, best of 3: 2.05 µs per loop
Option 2
This one uses re.split + set.issuperset:
>>> import re
>>> [c for c in choices if set(re.split(r'\W', string.lower())).issuperset(c)]
[['andrew', 'name', 'awesome']]
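If you want to perform set checks, something like re.split is needed because of the punctuation in your sentences: a plain str.split() would leave it attached to the words (for example, 'andrew,' instead of 'andrew').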
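To make the punctuation point concrete, here is a small sketch comparing the two ways of splitting the sample sentence. The variable names (sentence, plain_words, regex_words) are just for illustration and are not part of the original setup.

```python
import re

sentence = "My name is Andrew, I am pretty awesome".lower()

# Plain whitespace splitting keeps the comma attached to "andrew".
plain_words = set(sentence.split())

# Splitting on non-word characters drops the punctuation. It also
# produces an empty string between the comma and the following space,
# which is harmless for a superset check.
regex_words = set(re.split(r'\W', sentence))

print('andrew' in plain_words)   # False
print('andrew,' in plain_words)  # True
print('andrew' in regex_words)   # True
```

Note that the set-based check matches whole words only, whereas the substring check in Option 1 would also match a choice that appears inside a longer word, so the two options are not strictly equivalent on every input.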
Again, the set computation is a loop invariant, and can be moved out. Here is how it performs:
v = set(re.split(r'\W', string.lower()))
%timeit [c for c in choices if v.issuperset(c)]
1000000 loops, best of 3: 1.13 µs per loop
This is an exceptional case where I find regular expressions performing marginally faster. However, these timings are not conclusive, because they vary greatly with the size and structure of the data. I'd recommend trying things out with your own data before drawing any conclusions, although my gut feeling is that the regex solution would scale poorly.
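If you want to try this on your own data outside IPython, a minimal sketch using the standard timeit module might look like the following. The function names by_substring and by_word_set are my own, made up for this example. Unlike the %timeit runs above, each call here also pays for computing the invariant once, so the absolute numbers will not match the ones quoted.

```python
import re
import timeit

text = "My name is Andrew, I am pretty awesome"
choices = [['andrew', 'name', 'awesome'], ['andrew', 'designation', 'awesome']]

def by_substring(text, choices):
    # Option 1: substring checks against the lowercased text.
    lowered = text.lower()  # invariant, computed once per call
    return [c for c in choices if all(word in lowered for word in c)]

def by_word_set(text, choices):
    # Option 2: superset check against the set of words.
    words = set(re.split(r'\W', text.lower()))  # invariant, computed once per call
    return [c for c in choices if words.issuperset(c)]

# Make sure both options agree on this input before timing them.
assert by_substring(text, choices) == by_word_set(text, choices)

runs = 10000
for fn in (by_substring, by_word_set):
    seconds = timeit.timeit(lambda: fn(text, choices), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e6:.2f} us per call")
```

Swap in your own text and choices (and ideally much larger ones) to see how the two options compare for your case.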