My friend came across this problem in an interview. We are given a file containing some sentences. The sentences contain only the characters 0-9, a-z, A-Z and the full stop (.). We need to read the file and store it in a way that makes querying fast; the time taken by this preprocessing phase is not a concern. A query consists of some words, and we need to return the smallest substring of the file that contains all of these words. The order of the words is not important. (Note: assume the whole file fits in main memory.)
For example, if the file is: "Ram was doing a computer science degree. Ram has a computer at home. Ram is now at home."
Query 1: "Ram computer a" Output: "Ram has a computer"
Query 2: "Ram home" Output: "home. Ram"
I thought of storing the file as a linked list where each node holds one word. If a word is the last word of a sentence, the node stores the word plus the full stop. At query time, we traverse the linked list and find the minimum substring containing all the query words.
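To make this baseline concrete, here is a minimal sketch in Python of the traversal I have in mind. It makes several assumptions that the interviewer did not state: a plain Python list stands in for the linked list, "smallest" is measured in number of words, matching is case-sensitive, and a trailing full stop is ignored when comparing a stored word with a query word. The names `tokenize` and `smallest_window` are only illustrative.

```python
def tokenize(text):
    # One entry per word; a sentence-final word keeps its full stop,
    # like the "word + full stop" node described above.
    return text.split()


def smallest_window(tokens, query):
    """Return the shortest run of consecutive words that contains every
    query word, or None if some query word never occurs."""
    wanted = set(query.split())
    if not wanted:
        return None

    # Compare without the trailing full stop (assumption, see above).
    keys = [t.rstrip(".") for t in tokens]

    counts = {}            # occurrences of each wanted word inside the window
    missing = len(wanted)  # wanted words not yet inside the window
    best = None            # (left, right) of the best window so far, inclusive
    left = 0

    for right, key in enumerate(keys):
        if key in wanted:
            counts[key] = counts.get(key, 0) + 1
            if counts[key] == 1:
                missing -= 1

        # While the window covers every wanted word, record it and
        # try to shrink it from the left.
        while missing == 0:
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
            left_key = keys[left]
            if left_key in wanted:
                counts[left_key] -= 1
                if counts[left_key] == 0:
                    missing += 1
            left += 1

    if best is None:
        return None
    return " ".join(tokens[best[0]:best[1] + 1])


text = ("Ram was doing a computer science degree. "
        "Ram has a computer at home. Ram is now at home.")
tokens = tokenize(text)
print(smallest_window(tokens, "Ram computer a"))  # Ram has a computer
print(smallest_window(tokens, "Ram home"))        # home. Ram
```

Each query in this sketch makes a single pass over every word in the file, so the query time grows linearly with the file size no matter how few words the query contains.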
How can we optimize this further? Can we store the file in a better way?