I have some serious performance issues in my application. One of the big operations is comparing strings. I download a list of approximately 1,000 to 10,000 strings, all of them unique. Then I need to check whether these strings already exist in the database. The LINQ query that I'm using looks like this:

IEnumerable<string> allNewStrings = DownloadAllStrings();

var selection = from a in allNewStrings
                where !(from o in context.Items
                        select o.TheUniqueString).Contains(a)
                select a;

Am I doing something wrong? How could I make this process faster, preferably with LINQ?

Thanks.

2 Answers

Your query runs the inner database query once for every element in allNewStrings, that is, 1,000 to 10,000 times, so it is extremely inefficient.

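Try querying the existing strings separately and materializing the result, so that the query is executed only once:

IEnumerable<string> allNewStrings = DownloadAllStrings();

// ToList() executes the query once here; without it the deferred query
// would be sent to the database again on every Contains call below.
var uniqueStrings = (from o in context.Items
                     select o.TheUniqueString).ToList();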

var selection = from a in allNewStrings
                where !uniqueStrings.Contains(a)
                select a;

Now you can see that the last query can be written with Except, which is more efficient for set operations like the one in your example:

var selection = allNewStrings.Except(uniqueStrings);
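To illustrate why Except helps, here is a minimal, self-contained sketch. The two arrays are hypothetical stand-ins for DownloadAllStrings() and context.Items, which are not available here. As far as I know, Enumerable.Except enumerates its argument once to build a hash set and then streams the source through it, so when the argument is a database query, that query should run only once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins: in the real application these would come from
// DownloadAllStrings() and context.Items respectively.
IEnumerable<string> allNewStrings = new[] { "alpha", "beta", "gamma", "delta" };
IEnumerable<string> uniqueStrings = new[] { "beta", "delta", "epsilon" };

// Except collects uniqueStrings into a set once, then yields each element
// of allNewStrings that is not in that set (in source order, without duplicates).
List<string> selection = allNewStrings.Except(uniqueStrings).ToList();

Console.WriteLine(string.Join(", ", selection)); // prints "alpha, gamma"
```

Note that Except also removes duplicates from the source sequence, which does not matter here because the downloaded strings are already unique.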

4 Comments

Except() will have much better performance than !Contains()
@Magnus: yes, set operators are more efficient than membership tests on sequences. I updated the answer to state it more clearly.
Thanks for this great solution. My only concern is that it works nicely now, but what will happen when I have 1 million posts in Items and need to "download" them all into memory before doing the comparison?
This approach is still reasonable for a large number of uniqueStrings, assuming you have enough memory to hold them all and avoid cache misses.
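If loading every existing string into memory becomes a concern, one possible alternative is to send the new strings to the database in batches and ask only for the ones that already exist. The sketch below shows the batching logic under stated assumptions: FindMissing and findExisting are hypothetical names, and findExisting is an in-memory stand-in for the database lookup. With a real context the lookup might look like context.Items.Where(i => batch.Contains(i.TheUniqueString)).Select(i => i.TheUniqueString), which Entity Framework providers typically translate to an IN clause, but that depends on the provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the database table.
var table = new HashSet<string> { "beta", "delta", "epsilon" };

// Hypothetical lookup: returns the members of the batch that already exist.
Func<IReadOnlyCollection<string>, IEnumerable<string>> findExisting =
    batch => batch.Where(s => table.Contains(s));

// Returns the candidates that the lookup does not report as existing,
// asking about at most batchSize candidates per lookup call.
List<string> FindMissing(IEnumerable<string> candidates, int batchSize)
{
    var missing = new List<string>();
    var batch = new List<string>(batchSize);

    void Flush()
    {
        if (batch.Count == 0) return;
        // One lookup per batch; the result is copied into a set before the batch is cleared.
        var found = new HashSet<string>(findExisting(batch));
        missing.AddRange(batch.Where(s => !found.Contains(s)));
        batch.Clear();
    }

    foreach (var candidate in candidates)
    {
        batch.Add(candidate);
        if (batch.Count == batchSize) Flush();
    }
    Flush(); // handle the final, possibly partial batch
    return missing;
}

var missingStrings = FindMissing(new[] { "alpha", "beta", "gamma", "delta" }, batchSize: 2);
Console.WriteLine(string.Join(", ", missingStrings)); // prints "alpha, gamma"
```

With 10,000 new strings and a batch size of 1,000, this would issue 10 lookups regardless of how many rows Items holds. A suitable batch size depends on the database's limit on query parameters.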

An alternative solution would be to use a HashSet:

var set = new HashSet<string>(DownloadAllStrings());
set.ExceptWith(context.Items.Select(s => s.TheUniqueString));

The set will now contain the strings that are not in the DB.
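Here is a small runnable sketch of the same idea, using hypothetical in-memory arrays in place of DownloadAllStrings() and context.Items. ExceptWith modifies the set in place and enumerates its argument once, so a database query passed to it should run a single time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for DownloadAllStrings() and the database column.
var downloaded = new[] { "alpha", "beta", "gamma", "delta" };
var existingInDb = new[] { "beta", "delta", "epsilon" };

// Build the set from the downloaded strings, then remove everything
// that already exists in the database.
var set = new HashSet<string>(downloaded);
set.ExceptWith(existingInDb);

// HashSet<T> does not guarantee an ordering, so sort before printing.
var remaining = set.OrderBy(s => s, StringComparer.Ordinal).ToList();
Console.WriteLine(string.Join(", ", remaining)); // prints "alpha, gamma"
```

One thing to keep in mind: HashSet<string> compares strings ordinally and case-sensitively by default, whereas the database column may use a case-insensitive collation. If that applies, passing StringComparer.OrdinalIgnoreCase to the HashSet constructor may give results closer to what the database would report.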
