1

Is it a good idea to store words of a dictionary with 100.000 words in a static array of string. I'm working on spellchecker and I thought that way would be faster.

0

6 Answers 6

5

You should generally prefer a Java Collections Framework class to a native Java array for anything non-trivial. In this particular case, what you have is a Set<String> (since no words should appear more than once in the dictionary).

A HashSet<String> offers constant time performance for the basic operations add, remove, and contains, and should work very well with String hashcode formula.

For larger dictionaries, you'd want to use more sophisticated data structures specialized for storing a set of strings (e.g. a trie), but for 100K words, a HashSet should suffice.

See also

Sign up to request clarification or add additional context in comments.

2 Comments

+1 for mentioning Trie, since it's an efficient data structure to use within the mentioned problem field, i.e. spell checking.
For trie implementation in Java: stackoverflow.com/questions/623892/…
5

Definitely its not a good idea to store so many strings as an array especially if you are using it for spell check which means you will have to search for and compare strings. It would make it inefficient to search or compare a string through the array as it would always be a linear search

3 Comments

Almost anyone would alphabetize and binary search on an array.
+1: Big O is no always relevant, but when n = 100k, O(n) equals "slow as hell".
100k isn't all that large an N; the problem isn't a linear search through 100k entries, but that each entry is going to require a string comparison. +1 to using a HashSet, at this scale. Consider using either a Trie, a B+Tree, a ISAM-Tree, or a String-BTree once you start looking at an order or two of magnitude larger than this.
1

How about an approach with in memory database technology like for example sqlite inmemory This allows you to use efficient querying without disk overhead

Comments

0

I think 100 000 is not so large amount that search wolud be inefficent. Of course it depends ... It would work nice if you are checking if a word exists in array - it's a linear complexity algorithm. You can keep table ordered so you can use quicksort search algoritm and make it more efficent.

On the other hand - if you wold like to find, 5 most likely words (using N-gram method or something) you should consider using Lucene or other text database.

Comments

0

Perhaps using an SQLite database would be more efficient ? I think that's what firefox/thunderbird does for spell checking but I'm not entirely sure.

Comments

0

You won't be able to store that amount of strings in a static variable. Java has a size limit for static code and even method bodies. Simply use a flatfile and read it upon class instanciation - Java is faster than most people think with those things.

See Enum exeeding the 65535 bytes limit of static initializer... what's best to do?.

1 Comment

dear downvoter: although I agree that this is not a very elegant answer, I do think that a downvote to a serious (non-troll) answer deserves a comment.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.