I don't want any code. I really want to learn the logic myself, but I need pointing in the right direction. Pseudocode is fine. I basically need to create a spell checker using hash tables as my primary data structure. I know it may not be the best data structure for the job, but that is what I was tasked to do. The correctly spelled words will come from a text file. Please guide me on how to approach the problem.

The way I'm thinking of doing it:

  1. I'm guessing I need to create an ADT class that takes the string words.

  2. I need a main class that reads the dictionary text file and takes a sentence entered by the user. This class then scans that string and places each word into an ArrayList by splitting on the spaces between the words. A boolean method will then pass each word in the ArrayList to the class that handles misspellings and return whether the word is valid.

  3. I believe I need to create a class that generates the misspellings from the word list and stores them in the hash table? There will be a boolean method that takes a String parameter, checks the table to see if the word is valid, and returns true or false.

When generating the misspellings, the key concepts I will have to look out for are (taking the word "Hello" as an example):

  1. Missing characters. E.g. "Ello", "Helo"
  2. Jumbled version of the word. E.g. "ehllo", "helol"
  3. Phonetic misspelling. E.g. "fello" ('f' for 'h')

How can I improve on this thinking?

EDIT: This is what I came up with using a HashSet:

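import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Scanner;

/**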
 * The main program that loads the correct dictionary spellings 
 * and takes input to be analyzed from user.
 * @author Catherine Austria
 */
public class SpellChecker {
    private static String stringInput; // input to check;
    private static String[] checkThis; // the stringInput turned array of words to check.
    public static HashSet<String> dictionary; // the dictionary used

    /**
     * Main method.
     * @param args Argh!
     */
    public static void main(String[] args) {
        setup();
    }//end of main
    /**
     * This method loads the dictionary and initiates the checks for errors in a scanned input.
     */
    public static void setup(){
        int tableSIZE=59000;
        dictionary = new HashSet<String>(tableSIZE);
        try {
            //System.out.print(System.getProperty("user.dir"));//just to find user's working directory;
            // I combined FileReader into the BufferReader statement
            //the file is located in edu.frostburg.cosc310
            BufferedReader bufferedReader = new BufferedReader(new FileReader("./dictionary.txt"));
            String line = null; // notes one line at a time
            while((line = bufferedReader.readLine()) != null) {
                dictionary.add(line);//add dictionary word in
            }
            bufferedReader.close(); //close the file before prompting the user
            prompt();
        }
        catch(FileNotFoundException ex) {
            ex.printStackTrace();//print error             
        }
        catch(IOException ex) {
            ex.printStackTrace();//print error
        }
    }//end of setUp
    /**
     * Just a prompt for auto generated tests or manual input test.
     */
    public static void prompt(){
        System.out.println("Type a number from below: ");
        System.out.println("1. Auto Generate Test\t2.Manual Input\t3.Exit");
        Scanner theLine = new Scanner(System.in);
        int choice = theLine.nextInt(); // for manual input
        if(choice==1) autoTest();
        else if(choice==2) startwInput();
        else if (choice==3) System.exit(0);
        else System.out.println("Invalid Input. Exiting.");
    }
    /**
     * Manual input of sentence or words.
     */
    public static void startwInput(){
        //printDictionary(bufferedReader); // print dictionary
        System.out.println("Spell Checker by C. Austria\nPlease enter text to check: ");
        Scanner theLine = new Scanner(System.in);
        stringInput = theLine.nextLine(); // for manual input
        System.out.print("\nYou have entered this text: "+stringInput+"\nInitiating Check..."); 
        /*------------------------------------------------------------------------------------------------------------*/
        //final long startTime = System.currentTimeMillis(); //speed test
        WordFinder grammarNazi = new WordFinder(); //instance of WordFinder
        splitString(removePunctuation(stringInput));//turn String line to String[]
        grammarNazi.initialCheck(checkThis);
        //final long endTime = System.currentTimeMillis();
        //System.out.println("Total execution time: " + (endTime - startTime) );
    }//end of startwInput
    /**
     * Generates a testing case.
     */
    public static void autoTest(){
        System.out.println("Spell Checker by C. Austria\nThis sentence is being tested:\nThe dog foud my hom. And m ct hisse xdgfchv!@# ");
        WordFinder grammarNazi = new WordFinder(); //instance of WordFinder
        splitString(removePunctuation("The dog foud my hom. And m ct hisse xdgfchv!@# "));//turn String line to String[]
        grammarNazi.initialCheck(checkThis);
    }//end of autoTest

    /**
     * This method prints the entire dictionary. 
     * Was used in testing.
     * @param bufferedReader the dictionary file
     */
    public static void printDictionary(BufferedReader bufferedReader){
        String line = null; // notes one line at a time
        try{
            while((line = bufferedReader.readLine()) != null) {
                System.out.println(line);
            }
        }catch(FileNotFoundException ex) {
            ex.printStackTrace();//print error             
        }
        catch(IOException ex) {
            ex.printStackTrace();//print error
        }
    }//end of printDictionary

    /**
     * This method splits the passed String into words and stores them in a String[]
     * @param sentence The sentence that needs editing.
     */
    public static void splitString(String sentence){
        // split the sentence on spaces; trim first so a leading space does not produce an empty word
        checkThis = sentence.trim().split(" ");
    }//end of splitString

    /**
     * This method removes the punctuation and capitalization from a string.
     * @param sentence The sentence that needs editing.
     * @return the edited sentence.
     */
    public static String removePunctuation(String sentence){
        String newSentence; // the new sentence
        //remove evil punctuation and convert the whole line to lowercase
        newSentence = sentence.toLowerCase().replaceAll("[^a-zA-Z\\s]", "").replaceAll("\\s+", " ");
        return newSentence;
    }//end of removePunctuation
}

This class checks for misspellings:

import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class WordFinder extends SpellChecker{
    private int wordsLength;//length of String[] to check
    private List<String> wrongWords = new ArrayList<String>();//stores incorrect words

    /**
     * This method checks the String[] for spelling errors. 
     * Looks up each element of the String[] to see if it is in the dictionary HashSet
     * @param words array of words to check
     */
    public void initialCheck(String[] words){
        wordsLength=words.length;

        System.out.println();
        for(int i=0;i<wordsLength;i++){
            //System.out.println("What I'm checking: "+words[i]); //test only
            if(!dictionary.contains(words[i])) wrongWords.add(words[i]);
        } //end for
        //manualWordLookup(); //for testing dictionary only
        if (!wrongWords.isEmpty()) {
            System.out.println("Mistakes have been made!");
            printIncorrect();
        } //end if
        if (wrongWords.isEmpty()) {
            System.out.println("\n\nMove along. End of Program.");
        } //end if
    }//end of initialCheck

    /**
     * This method prints the incorrect words in the String[] being checked and generates suggestions.
     */
    public void printIncorrect(){//delete this guy
        System.out.print("These words [ ");
        for (String wrongWord : wrongWords) {
            System.out.print(wrongWord + " ");
        }//end of for
        System.out.println("] seem incorrect.\n");
        suggest();
    }//end of printIncorrect

    /**
     * This method gives suggestions to the user based on the wrong words she/he misspelled.
     */
    public void suggest(){
        MisSpell test = new MisSpell();
        while(!wrongWords.isEmpty()&&test.possibilities.size()<=5){
            String wordCheck=wrongWords.remove(0);
            test.generateMispellings(wordCheck);
            //print the suggestions; print() itself reports when none were found
            test.print(test.possibilities);
        }//end of while
    }//end of suggest

    /*ENTERING TEST ZONE*/
    /**
     * This allows a tester to look through the dictionary to see if words are valid; for testing only.
     */
    public void manualWordLookup(){
        System.out.print("Enter 'ext' to exit.\n\n");
        Scanner line = new Scanner(System.in);
        String look=line.nextLine();
        do{
        if(dictionary.contains(look)) System.out.print(look+" is valid\n");
        else System.out.print(look+" is invalid\n");
        look=line.nextLine();
        }while (!look.equals("ext"));
    }//end of manualWordLookup
}
import java.util.ArrayList;
import java.util.List;

/**
 * This is the main class responsible for generating misspellings.
 * @author Catherine Austria
 */
public class MisSpell extends SpellChecker{
    public List<String> possibilities = new ArrayList<String>();//stores possible suggestions
    private List<String> tempHolder = new ArrayList<String>(); //helps the transposition method
    private int Ldistance=0; // the distance related to the two words
    private String wrongWord;// the original wrong word.

    /**
     * Execute methods that make misspellings.
     * @param wordCheck the word being checked.
     */
    public void generateMispellings(String wordCheck){
        wrongWord=wordCheck;
        try{
            concatFL(wordCheck);
            concatLL(wordCheck);
            replaceFL(wordCheck);
            replaceLL(wordCheck);
            deleteFL(wordCheck);
            deleteLL(wordCheck);
            pluralize(wordCheck);
            transposition(wordCheck);
        }catch(StringIndexOutOfBoundsException e){ 
            System.out.println();
        }catch(ArrayIndexOutOfBoundsException e){
            System.out.println();
        }


    }

    /**
     * This method prepends each letter of the alphabet to the word and checks if the result is in the dictionary. 
     * FL for first letter
     * @param word the word being manipulated.
     */
    public void concatFL(String word){
        char cur; // current character
        String tempWord=""; // stores temp made up word
        for(int i=97;i<123;i++){
            cur=(char)i;//assign ASCII from index i value
            tempWord+=cur;
            //if the word is in the dictionary then add it to the possibilities list
            tempWord=tempWord.concat(word); //add passed String to end of tempWord
            checkDict(tempWord); //check to see if in dictionary
            tempWord="";//reset temp word to contain nothing
        }//end of for
    }//end of concatFL

    /**
     * This method appends each letter of the alphabet to the word and checks if the result is in the dictionary. LL for last letter.
     * @param word the word being manipulated.
     */
    public void concatLL(String word){
        char cur; // current character
        String tempWord=""; // stores temp made up word
        for(int i=97;i<123;i++){ // 'a' (97) through 'z' (122); the old bounds skipped 'a' and tried '{'
            cur=(char)i;//assign ASCII from index i value
            tempWord=tempWord.concat(word); //add passed String to end of tempWord
            tempWord+=cur;
            //if the word is in the dictionary then add it to the possibilities list
            checkDict(tempWord);
            tempWord="";//reset temp word to contain nothing
        }//end of for
    }//end of concatLL

    /**
     * This method replaces the first letter (FL) of a word with alphabet letters.
     * @param word the word being manipulated.
     */
    public void replaceFL(String word){
        char cur; // current character
        String tempWord=""; // stores temp made up word
        for(int i=97;i<123;i++){
            cur=(char)i;//assign ASCII from index i value
            tempWord=cur+word.substring(1,word.length()); //the letter for i followed by the word from index 1 to its end
            checkDict(tempWord);
            tempWord="";//reset temp word to contain nothing
        }//end of for
    }//end of replaceFL

    /**
     * This method replaces the last letter (LL) of a word with alphabet letters
     * @param word the word being manipulated.
     */
    public void replaceLL(String word){
        char cur; // current character
        String tempWord=""; // stores temp made up word
        for(int i=97;i<123;i++){
            cur=(char)i;//assign ASCII from index i value
            tempWord=word.substring(0,word.length()-1)+cur; //the word without its last letter, followed by the letter for i
            checkDict(tempWord);
            tempWord="";//reset temp word to contain nothing
        }//end of for
    }//end of replaceLL

    /**
     * This deletes first letter and sees if it is in dictionary
     * @param word the word being manipulated.
     */
    public void deleteFL(String word){
        String tempWord=word.substring(1); // drop only the first letter (the old call also dropped the last one)
        checkDict(tempWord);
        //print(possibilities);
    }//end of deleteFL

    /**
     * This deletes last letter and sees if it is in dictionary
     * @param word the word being manipulated.
     */
    public void deleteLL(String word){
        String tempWord=word.substring(0,word.length()-1); // stores temp made up word
        checkDict(tempWord);
        //print(possibilities);
    }//end of deleteLL

    /**
     * This method pluralizes a word input
     * @param word the word being manipulated.
     */
    public void pluralize(String word){
        String tempWord=word+"s";
        checkDict(tempWord);
    }//end of pluralize

    /**
     * Its purpose is to check whether a word is in the dictionary. 
     * If it is, then add it to the possibilities list.
     * @param word the word being checked.
     */
    public void checkDict(String word){
        if(dictionary.contains(word)){//check to see if tempWord is in dictionary
            //if the tempWord IS in the dictionary, then check if it is in the possibilities list 
            //then if tempWord IS NOT in the list, then add tempWord to list
            if(!possibilities.contains(word)) possibilities.add(word);
        }
    }//end of checkDict

    /**
     * This method transposes letters of a word into different places.
     * Not the best implementation. This guy was my last minute addition.
     * @param word the word being manipulated.
     */
    public void transposition(String word){
        wrongWord=word;
        int wordLen=word.length();
        String[] mixer = new String[wordLen]; //String[] length of the passed word
        //make word into String[]
        for(int i=0;i<wordLen;i++){
            mixer [i]=word.substring(i,i+1);
        }
        shift(mixer);
    }//end of transposition

    /**
     * This method takes a string[] list then shifts the value in between 
     * the elements in the list and checks if in dictionary, adds if so. 
     * I agree that this is probably the brute force implementation.
     * @param mixer the String array being shifted around.
     */
    public void shift(String[] mixer){
        System.out.println();
        String wordValue="";
        for(int i=0;i<mixer.length;i++){ // one pass per letter; also safe for an empty word
            resetHelper(tempHolder);//reset the helper
            transposeHelper(mixer);//fill tempHolder
            String wordFirstValue=tempHolder.remove(i);//remove value at index in tempHolder
            for(int j=0;j<tempHolder.size();j++){
                int inttemp=0;
                String temp;
                while(inttemp<j){
                    temp=tempHolder.remove(inttemp);
                    tempHolder.add(temp);
                    wordValue+=wordFirstValue+printWord(tempHolder);
                    inttemp++;
                    if(dictionary.contains(wordValue)) if(!possibilities.contains(wordValue)) possibilities.add(wordValue);
                    wordValue="";
                }//end of while
            }//end of for
        }//end for
    }//end of shift

    /**
     * This method fills a list tempHolder with contents from String[]
     * @param wordMix the String array being shifted around.
     */
    public void transposeHelper(String[] wordMix){
        for(int i=0;i<wordMix.length;i++){
            tempHolder.add(wordMix[i]);
        }
    }//end of transposeHelper

    /**
     * This resets a list
     * @param thisList removes the content of a list
     */
    public void resetHelper(List<String> thisList){
        while(!thisList.isEmpty()) thisList.remove(0); //while list is not empty, remove first value
    }//end of resetHelper

    /**
     * This method prints out a list
     * @param listPrint the list to print out.
     */
    public void print(List<String> listPrint){
        if (possibilities.isEmpty()) {
            System.out.print("Can't seem to find any related words for "+wrongWord);
            return;
        }
        System.out.println("Maybe you meant these for "+wrongWord+": ");
        System.out.printf("%s", listPrint);
        resetHelper(possibilities);
    }//end of print

    /**
     * This returns a String word version of a list
     * @param listPrint the list to make into a word.
     * @return the generated word version of a list.
     */
    public String printWord(List<String> listPrint){
        Object[] suggests = listPrint.toArray();
        String theWord="";
        for(Object word: suggests){//form listPrint elements into a word
            theWord+=word;
        }
        return theWord;
    }//end of printWord
}
  • What would be the keys and values of the hashtable? Commented Oct 27, 2015 at 17:19
  • What do you intend to do with your generated misspellings? Commented Oct 27, 2015 at 17:28
  • This seems helpful: norvig.com/spell-correct.html Commented Oct 27, 2015 at 17:38
  • I wanted the keys to be the correct spelling of a word and the values to be the various generated misspellings. The problem with this is that searching through the misspellings is linear, so it'll end up as O(n). That's not too efficient, is it? :( Commented Oct 27, 2015 at 18:39
  • @michaelsnowden That seems like a very helpful site to read from. I'll start reading it and see if I can get back to you guys with an answer of my own. :D Commented Oct 27, 2015 at 19:10

2 Answers

Instead of generating all possible misspellings of the words in your dictionary and adding them to the hash table, consider performing all possible changes (that you already suggested) to the user-entered words, and checking to see if those words are in the dictionary file.
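
A minimal sketch of that idea, assuming a lowercase a-z alphabet and a dictionary already loaded into a set. The class name `EditCandidates` and its methods are made up for illustration; the four edit types (delete, swap neighbours, replace, insert) follow the approach described in the norvig.com article linked in the comments.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class EditCandidates {
    /** Returns every string that is one edit away from word (lowercase a-z assumed). */
    public static Set<String> edits(String word) {
        Set<String> out = new LinkedHashSet<>();
        for (int i = 0; i <= word.length(); i++) {
            String head = word.substring(0, i);
            String tail = word.substring(i);
            if (!tail.isEmpty()) {
                out.add(head + tail.substring(1)); // delete the character at i
            }
            if (tail.length() > 1) {
                // swap the characters at i and i + 1
                out.add(head + tail.charAt(1) + tail.charAt(0) + tail.substring(2));
            }
            for (char c = 'a'; c <= 'z'; c++) {
                if (!tail.isEmpty()) {
                    out.add(head + c + tail.substring(1)); // replace the character at i
                }
                out.add(head + c + tail); // insert c before position i
            }
        }
        return out;
    }

    /** Keeps only the candidates that are real dictionary words. */
    public static Set<String> suggestions(String word, Set<String> dictionary) {
        Set<String> found = new LinkedHashSet<>();
        for (String candidate : edits(word)) {
            if (dictionary.contains(candidate)) {
                found.add(candidate);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList("hello", "help", "held", "hell"));
        System.out.println(suggestions("helo", dictionary));
    }
}
```

The work per misspelled word grows with the word length times the alphabet size, and nothing has to be precomputed for the dictionary itself.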

2 Comments

How would the hash tables fit in that approach though?
Add all (correctly spelled) dictionary words to the hash table. Then you can quickly (in near constant time) check to see if the user-entered word is in the table. If it is, then it is spelled correctly. If it isn't, then you can check to see if any of its variations are in the hash table. If any of these variations are in the hash table, then you have found a suggestion (or several) on how to fix your misspelled word.
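
Since the asker mentions further down that the assignment calls for a hand-written hash table, here is one rough way such a table could look: a fixed-size array of buckets with separate chaining. The class `SimpleStringHashSet`, the multiplier 31, and the bucket count of 101 are illustrative choices, not part of the original post.

```java
import java.util.LinkedList;

public class SimpleStringHashSet {
    private final LinkedList<String>[] buckets; // each bucket chains the keys that collide
    private int size;

    @SuppressWarnings("unchecked")
    public SimpleStringHashSet(int capacity) {
        buckets = new LinkedList[capacity]; // capacity must be greater than zero
    }

    /** Polynomial hash over the characters, reduced to a non-negative bucket index. */
    private int indexFor(String key) {
        int h = 0;
        for (int i = 0; i < key.length(); i++) {
            h = 31 * h + key.charAt(i);
        }
        return Math.floorMod(h, buckets.length);
    }

    /** Adds key; returns false if it was already present. */
    public boolean add(String key) {
        int i = indexFor(key);
        if (buckets[i] == null) {
            buckets[i] = new LinkedList<>();
        }
        if (buckets[i].contains(key)) {
            return false;
        }
        buckets[i].add(key);
        size++;
        return true;
    }

    /** Only the one bucket the key hashes to is scanned. */
    public boolean contains(String key) {
        LinkedList<String> bucket = buckets[indexFor(key)];
        return bucket != null && bucket.contains(key);
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        SimpleStringHashSet dictionary = new SimpleStringHashSet(101);
        dictionary.add("hello");
        dictionary.add("world");
        System.out.println(dictionary.contains("hello")); // prints true
        System.out.println(dictionary.contains("helo"));  // prints false
    }
}
```

Lookups stay close to constant time only while the chains are short, so the bucket count should be in the same range as the number of dictionary words.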

It sounds like what you want is a quick way to verify that a word is spelled correctly, or to find the correct spelling. If this is what you're trying to do, you can use a HashMap<String,String> (i.e. a hash table with String keys and String values). Whenever you find a word in your dictionary, you enter a key for it with a null value, indicating that the word is not to be changed (i.e. it is a correct spelling). You can then compute and add keys for possible misspellings and give the correct word as the value.

You'd have to devise a way to do this very carefully, because if your dictionary has two similar words, "clot" and "colt", computed misspellings of one may replace the correct spelling (or misspellings) of the other. Once you're done, you can look up a word to see if it is in the dictionary, if it is a misspelling of a dictionary word (and which word), or if it is not found at all.
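
One possible way around the overwriting problem, sketched here under the assumption that only single-letter deletions are precomputed, is to map each misspelling to a set of correct words instead of a single one. `MisspellingIndex` and `build` are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class MisspellingIndex {
    /** Maps each one-letter-deleted form of a dictionary word to all words that produce it. */
    public static Map<String, Set<String>> build(Iterable<String> words) {
        Map<String, Set<String>> index = new HashMap<>();
        for (String word : words) {
            for (int i = 0; i < word.length(); i++) {
                String misspelling = word.substring(0, i) + word.substring(i + 1);
                // A set per key means "clot" and "colt" can share a misspelling without clobbering each other.
                index.computeIfAbsent(misspelling, k -> new TreeSet<>()).add(word);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = build(Arrays.asList("clot", "colt"));
        System.out.println(index.get("clt")); // prints [clot, colt]
    }
}
```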

I believe this is a bad design, though, because your table has to be many times larger than your (I assume already quite large) dictionary, and because you spend a lot of time calculating many misspellings for every word in the dictionary (a very big overhead if you only check a few lines that contain a few of these words). Given only a little liberty, I would opt for a HashSet<String> (which is a hash table without values) filled only with dictionary words. This allows you to check quickly whether a word is in the dictionary or not.

You can dynamically compute other ways to spell words when you encounter ones not in your dictionary. If you're doing this for only a line or two, it should not be slow at all (certainly faster than computing alternatives for everything in your dictionary). But if you wanted to do this for a whole file, you may want to keep a much smaller HashMap<String,String>, separate from your dictionary, to store any corrections you find, since the author may misspell the word the same way in the future. Checking this before computing alternatives keeps you from duplicating your efforts several times over.
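
As a rough illustration of that cache, the sketch below checks the dictionary first, then the map of earlier corrections, and only then searches for a fix. `CachedChecker` is a made-up class, and its `firstCorrection` method is a deliberately simple placeholder (it only tries deleting one letter) standing in for whatever suggestion logic you end up with.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class CachedChecker {
    private final Set<String> dictionary;
    private final Map<String, String> corrections = new HashMap<>(); // misspelling -> fix found earlier
    private int searches = 0; // counts how often the expensive search actually ran

    public CachedChecker(Set<String> dictionary) {
        this.dictionary = dictionary;
    }

    /** Returns the word itself if valid, otherwise a correction, or null if none is found. */
    public String check(String word) {
        if (dictionary.contains(word)) {
            return word;
        }
        if (corrections.containsKey(word)) {
            return corrections.get(word); // seen before: skip the search (the stored value may be null)
        }
        String fix = firstCorrection(word);
        corrections.put(word, fix);
        return fix;
    }

    // Placeholder search (an assumption for this sketch): try deleting each letter in turn.
    private String firstCorrection(String word) {
        searches++;
        for (int i = 0; i < word.length(); i++) {
            String candidate = word.substring(0, i) + word.substring(i + 1);
            if (dictionary.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public int searches() {
        return searches;
    }
}
```

Storing a null for words with no known fix means hopeless words are not searched again either; `containsKey` is what tells "never seen" apart from "seen, nothing found".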

4 Comments

Well, the idea of the task is to implement my own hash table (with my own hash function, of course) to find out whether a word is valid and to give suggestions to the user.
I read up on BK-trees, and I wonder if there is a way to apply that logic to hash tables.
@CatherineAustria I merely refer to HashMap as a succinct way to describe hash tables, keys and values. You can use your own hash tables in its place (and easily reuse your hash function for a hash set as well).
@CatherineAustria I'm not sure about BK Trees. You might want to look into locality sensitive hash functions, and use the Damerau–Levenshtein distance to keep suggestions while you probe (all of your examples were a distance of 1 from "hello"). Not sure this is a good idea, but perhaps worth a look.
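
For reference, a distance of that kind could be computed as below. Note that this is the restricted "optimal string alignment" variant of Damerau–Levenshtein, which counts insertions, deletions, substitutions and swaps of adjacent letters but never edits the same substring twice; the class name `EditDistance` is just for this sketch.

```java
public class EditDistance {
    /** Optimal string alignment distance between a and b. */
    public static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i; // delete all i characters of a
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j; // insert all j characters of b
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1,   // deletion
                                 d[i][j - 1] + 1),  // insertion
                        d[i - 1][j - 1] + cost);    // substitution or match
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1); // adjacent transposition
                }
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("hello", "helol")); // prints 1
    }
}
```

A distance like this could be used to rank the suggestions, for example by listing candidates at distance 1 before those at distance 2.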
