Here's the problem: given a list of strings and a document, find the shortest substring of the document that contains all the strings in the list.
Thus for:
document = "many google employees can program because google is a technology company that can program"
searchTerms = ['google', 'program', 'can']
the output should be:
"can program because google" # 26 chars
and not:
"google employees can program" # 28 chars
"google is a technology company that can program" # 47 chars
Here's my approach: split the document into words, build a suffix array (every suffix of the word list), check each suffix for all the search terms, and return the shortest suffix that contains them.
Here's my code:
import sys


def snippetSearch(document, searchTerms):
    doc = document.split()
    suffix_array = create_suffix_array(doc)
    current = None
    current_len = sys.maxsize
    # Keep the shortest suffix (measured in words) that contains every term.
    for suffix in suffix_array:
        if check_for_terms_in_array(suffix, searchTerms):
            if len(suffix) < current_len:
                current_len = len(suffix)
                current = suffix
    return ' '.join(map(str, current))


def create_suffix_array(document):
    # Every suffix of the word list: document[0:], document[1:], ...
    suffix_array = []
    for i in range(len(document)):
        sub = document[i:]
        suffix_array.append(sub)
    return suffix_array


def check_for_terms_in_array(arr, terms):
    # Whole-word membership test against the list of words.
    for term in terms:
        if term not in arr:
            return False
    return True
This is an online submission, and it fails one test case. I have no idea what that test case is, though. My question is: is there anything logically incorrect with the code? Also, is there a more efficient way of doing this?
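One way to probe the logic is to compare a suffix-only search against a brute-force search over every contiguous run of words. The sketch below is only a diagnostic under stated assumptions: `shortest_suffix` and `shortest_window_bruteforce` are hypothetical helper names, terms are matched as whole words (as in the code above), and the brute-force version compares candidates by character length.

```python
def shortest_suffix(words, terms):
    # Mirrors the suffix-based approach: only the START of the window moves,
    # so every candidate runs to the end of the document.
    best = None
    for i in range(len(words)):
        suffix = words[i:]
        if all(t in suffix for t in terms):
            if best is None or len(suffix) < len(best):
                best = suffix
    return ' '.join(best) if best else ''


def shortest_window_bruteforce(words, terms):
    # Tries every start, extends the end until all terms are present,
    # and compares candidates by character length.
    best = None
    for i in range(len(words)):
        for j in range(i + 1, len(words) + 1):
            window = words[i:j]
            if all(t in window for t in terms):
                text = ' '.join(window)
                if best is None or len(text) < len(best):
                    best = text
                break  # extending further from this start only adds characters
    return best or ''


document = ("many google employees can program because google "
            "is a technology company that can program")
terms = ['google', 'program', 'can']
print(shortest_suffix(document.split(), terms))
print(shortest_window_bruteforce(document.split(), terms))
```

If the two functions disagree on an input, the suffix-only search is the one that cannot trim trailing words, since a suffix always extends to the last word of the document.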
Does "uncanned ravioli" contain "can"?
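On the efficiency question, a common technique for this kind of problem is a sliding window with two pointers over the word list, which visits each word at most twice. The following is a minimal sketch, not a verified solution to the online judge: `snippet_search_window` is a hypothetical name, terms are assumed to match whole words only (so "uncanned" would not count as containing "can"), and the result is assumed to be the window with the fewest characters when joined by single spaces.

```python
from collections import Counter


def snippet_search_window(document, search_terms):
    words = document.split()
    needed = set(search_terms)
    if not needed:
        return ''
    counts = Counter()  # occurrences of each needed term inside the window
    covered = 0         # distinct needed terms currently inside the window
    chars = 0           # letters of the words in the window, spaces excluded
    best = None         # (character length, start index, end index)
    left = 0
    for right, word in enumerate(words):
        chars += len(word)
        if word in needed:
            counts[word] += 1
            if counts[word] == 1:
                covered += 1
        # Shrink from the left while the window still covers every term.
        while covered == len(needed):
            length = chars + (right - left)  # letters plus separating spaces
            if best is None or length < best[0]:
                best = (length, left, right)
            dropped = words[left]
            chars -= len(dropped)
            if dropped in needed:
                counts[dropped] -= 1
                if counts[dropped] == 0:
                    covered -= 1
            left += 1
    if best is None:
        return ''
    _, start, end = best
    return ' '.join(words[start:end + 1])


document = ("many google employees can program because google "
            "is a technology company that can program")
print(snippet_search_window(document, ['google', 'program', 'can']))
```

If the hidden test expects substring matching rather than whole-word matching, the membership check `word in needed` would need to change, and that variant is not covered by this sketch.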