I am writing a solver for a crossword puzzle that reads in a dictionary file and given a pattern returns a list of all words that fit that pattern. I have the functionality working but I need this to work faster. I create a HashMap with the length of words being the key and an ArrayList of the words as the value. Is there someway I can read through the ArrayList faster or is there some better data structure to use?
import java.util.*;
public class CWSolution {
//create the data structure that will store the dictionary
private HashMap<Integer,ArrayList<String>> db;
public CWSolution(List<String> allWords)
{
//construct the background structure
//Create hashmap
db = new HashMap<Integer,ArrayList<String>>();
//go through each word
for(String item : allWords ){
//if the db does not contain a listing for this word length, create one
if(!db.containsKey(item.length())){
ArrayList<String> temp = new ArrayList<String>();
temp.add(item);
db.put(item.length(), temp);
}
//if it does contain a listing for this word length, add this word to it
else{
ArrayList<String> temp = db.get(item.length());
temp.add(item);
db.put(item.length(), temp);
}
}
}
public List<String> solutions(String pattern, int maxRequired)
{
//actually look for each pattern
//create the structures we need
List<String> answer = new ArrayList<String>();
//get the relevant array list
ArrayList<String> temp = db.get(pattern.length());
//go through the array list word by word
for(String item : temp ){
//see if the pattern matches the word, if it does add it to the list, otherwise skip it
if(matchPattern(pattern, item)){
answer.add(item);
}
//if we reach the required size return it, otherwise keep going
if(answer.size() == maxRequired){
return answer;
}
}
return answer;
}
private boolean matchPattern(String pattern, String word){
//TODO implement this function
//check the word against the pattern
char star = "*".charAt(0);
for(int i=0;i<pattern.length();i++){
if(pattern.charAt(i) != star){
if(pattern.charAt(i) != word.charAt(i)){
return false;
}
}
}
return true;
}
}
EDIT: Adding some more information to make this more clear.
Some of the comments were debating this so figured I'd clarify, I am a student in a Data Structures course so there is only so much that I know about, but we are nearing the end of the semester so I have a good idea of basic data structures.
Furthermore I am not as concerned about optimizing the CWSolution() method a I am with optimizing the solutions() method. The speed is being tested as follows and what I am really concerned with is Time2. That is how long it takes to find matching words rather than how long it takes to create the structure.
import java.util.Date;
import java.util.List;
public class CWSpeedTest {
public static void main(String[] args){
try{
FileParser fp = new FileParser("TWL06.txt");
List<String> solutions = null;
//Change this to change the pattern
String pattern = "*S**";
//Change this to change the max solutions
int maxSolns = 2000;
List<String> dict = fp.getAllWords();
Date d1 = new Date();
CWSolution c = new CWSolution(dict);
Date d2 = new Date();
for (int i = 0; i < 1000; i++)
solutions = c.solutions(pattern,maxSolns);
Date d3 = new Date();
System.out.println("Time 1: " + (d2.getTime() - d1.getTime()));
System.out.println("Time 2: " + (d3.getTime() - d2.getTime()));
System.out.println("For the pattern: " + pattern);
System.out.println("With max solutions: " + maxSolns);
System.out.println(solutions);
}catch (Exception e){
e.printStackTrace();
}
}
}
Map<Integer, Map<Character, Set<String>>>- theSetis all the words that have the givenCharacterat the position given by theInteger. You then go over characters in the pattern, get theSets from this map for all the specified characters, and successively do intersections of these sets.Map<Integer, Map<Integer, Map<Character, Set<String>>>>because that amount of angle brackets is painful. What you're really looking for is a multidimensional matrix. Alternatively, you could create a separate classDictionaryKeyfor the map key with the fieldslength,position, andcharacter. Have your IDE generate ahashCode()using those fields, and then use aMap<DictionaryKey, Set<String>>to make the code more readable. This way you could microoptimise a bit by makingDictionaryKeyimmutable and memoising the hash code.length -> position -> character -> wordsis calleddb. You would store the word "puppy" under the length5, and theposition, characterkeys0, 'p',1, 'u',2, 'p'etc. (Each word would be indexed by each character in the word, not just the starting word. That's precisely why the key isposition, character.) Then if you're looking for words that match**ppy, you get the setsdb[2]['p'],db[3]['p'], anddb[4]['y']. I'd code it up but I'm busy ATM, and if this is homework doing so would be a great exercise for the OP ;)total_words/10words. (Assuming that most of the words have a length between3and13. The exact length distribution doesn't actually matter since my algorithm filters by word length too.) Then it performsword_lengthcharacter comparisons for each of those words. This is possibly a large number if the dictionary is big.Nis the number of words in the dictionary, andmis the length of the current word, the OP's approach isO((N * m) / 10). Mine isO(N / 260)and some change. Seeing howmis likely to be modest it's the same complexity class, but still at least an order of magnitude faster.