6
string[] words = System.IO.File.ReadAllLines("word.txt");
var query = from word in words
            where word.Length > "abe".Length && word.StartsWith("abe")
            select word;
foreach (var w in query.AsParallel())
{
    Console.WriteLine(w);
}

Basically, word.txt contains 170,000 English words. Is there a collection class in C# that is faster than a string array for the query above? There will be no inserts or deletes, just searches for strings that start with a prefix such as "abe" or "abdi".

Each word in the file is unique.

EDIT 1: This search will potentially be performed millions of times in my application. I also want to stick with LINQ for querying the collection, because I might need to use aggregate functions.

EDIT 2: The words in the file are already sorted, and the file will not change.

1
  • 2
    What's the use case scenario? Alexai brings up a good point, if this is a one-off search, then an array is fine. If this is going to be a scenario where you repeat the search any number of times, then the answer is different. Commented May 1, 2011 at 5:15

3 Answers

4

Personally, I'd create a Dictionary<char, List<string>> that groups the words by their first letter. This substantially reduces the number of words each lookup has to scan.
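
A minimal sketch of that grouping, assuming the words are already loaded into memory. The class name `FirstLetterIndex` and its two methods are hypothetical names chosen for illustration, not part of any library.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FirstLetterIndex
{
    // Group the words by their first character; empty lines are skipped.
    public static Dictionary<char, List<string>> Build(IEnumerable<string> words)
    {
        return words
            .Where(w => w.Length > 0)
            .GroupBy(w => w[0])
            .ToDictionary(g => g.Key, g => g.ToList());
    }

    // Return only the bucket for the prefix's first letter, so a LINQ query
    // scans one bucket instead of the whole word list.
    public static IEnumerable<string> Candidates(
        Dictionary<char, List<string>> index, string prefix)
    {
        if (prefix.Length == 0)
            return index.Values.SelectMany(list => list);

        List<string> bucket;
        return index.TryGetValue(prefix[0], out bucket)
            ? bucket
            : Enumerable.Empty<string>();
    }
}
```

The original LINQ query stays the same except for its source: `from word in FirstLetterIndex.Candidates(index, "abe") where ... select word`. Build the index once and reuse it for every search.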


5 Comments

You may also want to check Wikipedia for prefix and suffix trees. These are meant for fast word search.
I believe the structure you're going for in your comment is a Trie.
Wouldn't it complicate the LINQ query?
The first option, not at all. The second, yes, but you'll get faster results in return.
@Eugen Do you know of any existing Trie implementation in C#?
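
For reference, a trie is small enough to hand-roll. The following is a minimal sketch rather than a known library implementation; the `Trie` class and its `Add` and `StartingWith` members are hypothetical names, and it assumes case-sensitive, character-by-character matching.

```csharp
using System.Collections.Generic;

public sealed class Trie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsWord;
    }

    private readonly Node root = new Node();

    // Insert a word one character at a time, creating nodes as needed.
    public void Add(string word)
    {
        Node node = root;
        foreach (char c in word)
        {
            Node next;
            if (!node.Children.TryGetValue(c, out next))
            {
                next = new Node();
                node.Children.Add(c, next);
            }
            node = next;
        }
        node.IsWord = true;
    }

    // Lazily yields every stored word that starts with prefix.
    // Results come back in traversal order, not alphabetical order.
    public IEnumerable<string> StartingWith(string prefix)
    {
        // Walk down to the node that represents the prefix.
        Node node = root;
        foreach (char c in prefix)
        {
            if (!node.Children.TryGetValue(c, out node))
                yield break; // no stored word has this prefix
        }

        // Depth-first walk of the subtree below the prefix node.
        var stack = new Stack<KeyValuePair<Node, string>>();
        stack.Push(new KeyValuePair<Node, string>(node, prefix));
        while (stack.Count > 0)
        {
            KeyValuePair<Node, string> current = stack.Pop();
            if (current.Key.IsWord)
                yield return current.Value;
            foreach (KeyValuePair<char, Node> child in current.Key.Children)
                stack.Push(new KeyValuePair<Node, string>(child.Value, current.Value + child.Key));
        }
    }
}
```

Because `StartingWith` returns an `IEnumerable<string>`, LINQ operators such as `Where` or `Count` can still be chained onto the result.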
1

If you only need to search once, nothing beats a linear search, and an array is perfectly fine for that.

If you need to perform repeated searches, consider sorting the array once (O(n log n)); after that, locating the first match for any prefix is fast (O(log n)). Depending on the type of search, a dictionary of string lists indexed by prefix may be another good option.
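
The sorted-array approach could look roughly like the sketch below. `SortedPrefixSearch` and its members are hypothetical names. It assumes the array is in ordinal order, which a pre-sorted file does not guarantee, so sorting once with `Array.Sort(words, StringComparer.Ordinal)` after loading is the safe choice.

```csharp
using System;
using System.Collections.Generic;

public static class SortedPrefixSearch
{
    // Binary search: index of the first element that is >= key (ordinal comparison).
    private static int LowerBound(string[] sorted, string key)
    {
        int lo = 0, hi = sorted.Length;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (string.CompareOrdinal(sorted[mid], key) < 0)
                lo = mid + 1;
            else
                hi = mid;
        }
        return lo;
    }

    // Lazily yields every word that starts with prefix.
    // In an ordinally sorted array these words form one contiguous run
    // that begins at the lower bound of the prefix itself.
    public static IEnumerable<string> StartingWith(string[] sorted, string prefix)
    {
        for (int i = LowerBound(sorted, prefix);
             i < sorted.Length && sorted[i].StartsWith(prefix, StringComparison.Ordinal);
             i++)
        {
            yield return sorted[i];
        }
    }
}
```

The result is an `IEnumerable<string>`, so it can feed the original query directly, for example `SortedPrefixSearch.StartingWith(words, "abe").Where(w => w.Length > "abe".Length)`.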

1 Comment

If you want to keep the code as close as possible to the original and perform a large number of queries, SortedList<string, string> looks like a good option. The prefix trees mentioned by Eugen would probably give you better performance but require more hand coding.
0

If you search much more often than you change the word file, you can sort the words once each time the list changes and then use binary (bisection) search. For 170,000 words that takes at most 20 comparisons (log2 of 170,000 is about 17.4) to find a word that matches your key, plus a few extra comparisons of its neighbors to collect the other matches.
