I have written the LINQ statement below, but it takes a very long time to run because there are so many lines. My CPU has 8 cores, but only 1 is used because the query runs on a single thread.

So I wonder: can this final statement run on multiple threads?

        List<string> lstAllLines = File.ReadAllLines("AllLines.txt").ToList();
        List<string> lstBannedWords = File.ReadAllLines("allBaddWords.txt")
            .Select(s => s.ToLowerInvariant())
            .Distinct().ToList();

This is the statement I am asking about. Can it run on multiple threads?

        List<string> lstFoundBannedWords = lstBannedWords
            .Where(s => lstAllLines
                .SelectMany(ls => ls.ToLowerInvariant().Split(' '))
                .Contains(s))
            .Distinct().ToList();

C# 5, .NET Framework 4.5

  • Looked into PLINQ? Note, this is in no way guaranteed to make anything run faster. Commented May 31, 2013 at 13:33
  • @AdamHouldsworth haven't checked yet but let me take a look :) Commented May 31, 2013 at 13:35
  • stackoverflow.com/questions/7582591/… (Understanding Speedup in PLINQ) The more expensive a query is, the better candidate it is for PLINQ. Commented May 31, 2013 at 13:35
  • Wow it started using 5 cores with just adding 1 keyword hehe :D i love C# ^^ Commented May 31, 2013 at 13:37
  • @Chris .AsParallel() I think. Commented May 31, 2013 at 13:41
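
For readers following the PLINQ suggestion in the comments, here is a minimal sketch of what adding AsParallel() to the question's query might look like. The class name, method name, and sample data are hypothetical and only there to make the example self-contained; whether this runs faster depends on the data, as the first comment warns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BannedWordsPlinq
{
    // Hypothetical helper (not from the question): the same query as the
    // original, with AsParallel() so the outer Where is spread across cores.
    public static List<string> FindBannedWords(
        IList<string> allLines, IList<string> bannedWords)
    {
        return bannedWords
            .AsParallel()
            .Where(s => allLines
                .SelectMany(ls => ls.ToLowerInvariant().Split(' '))
                .Contains(s))
            .Distinct()
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample data standing in for the two input files.
        var allLines = new List<string> { "Some BAD text", "nothing here", "very ugly line" };
        var bannedWords = new List<string> { "bad", "ugly", "missing" };

        var found = FindBannedWords(allLines, bannedWords);
        found.Sort(StringComparer.Ordinal); // PLINQ does not preserve order by default
        Console.WriteLine(string.Join(",", found)); // prints "bad,ugly"
    }
}
```

Each banned word still triggers a full pass over every line, so the total work is unchanged; it is only divided among threads.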

2 Answers

The following snippet performs that operation using the Task Parallel Library's Parallel.ForEach method. It takes each line in your 'all-lines' file, splits it on spaces, and then checks which banned words appear in that line. Parallel.ForEach can spread the work across the available cores of your machine's processor. Hope this helps.

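// Requires: using System.Linq;
System.Threading.Tasks.Parallel.ForEach(
    lstAllLines,
    line =>
    {
        var wordsInLine = line.ToLowerInvariant().Split(' ');
        // Where (not All) selects the banned words that actually occur in this line.
        var bannedWords = lstBannedWords.Where(bannedWord => wordsInLine.Contains(bannedWord)).ToList();
        // TODO: Add the banned word(s) in the line to a master list of banned words found.
        // That list is shared across threads, so use a thread-safe collection or a lock.
    });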
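
One possible way to fill in the TODO is sketched below. It is an assumption about how the results could be collected, not the answerer's code: a ConcurrentDictionary stands in for a thread-safe set, and the banned words are placed in a HashSet that is only read inside the loop. The class name, method name, and sample data are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BannedWordsParallelForEach
{
    // Hypothetical helper: returns the distinct banned words found in any line.
    public static List<string> FindBannedWords(
        IEnumerable<string> allLines, IEnumerable<string> bannedWords)
    {
        // Built once and never modified afterwards; the loop only reads from it.
        var banned = new HashSet<string>(bannedWords.Select(w => w.ToLowerInvariant()));

        // ConcurrentDictionary used as a thread-safe set; the byte value is unused.
        var found = new ConcurrentDictionary<string, byte>();

        Parallel.ForEach(allLines, line =>
        {
            foreach (var word in line.ToLowerInvariant().Split(' '))
            {
                if (banned.Contains(word))
                {
                    found.TryAdd(word, 0); // no-op if the word was already recorded
                }
            }
        });

        return found.Keys.ToList();
    }

    public static void Main()
    {
        // Made-up sample data standing in for the two input files.
        var allLines = new List<string> { "Some BAD text", "nothing here", "very ugly BAD line" };
        var bannedWords = new List<string> { "bad", "ugly", "missing" };

        var result = FindBannedWords(allLines, bannedWords);
        result.Sort(StringComparer.Ordinal); // key order is not guaranteed
        Console.WriteLine(string.Join(",", result)); // prints "bad,ugly"
    }
}
```

Looking up each word of a line in the banned set, rather than scanning the line once per banned word, keeps the per-line cost proportional to the line length.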

There is room for performance improvement before resorting to AsParallel:

HashSet<string> lstAllLines = new HashSet<string>(
                                File.ReadAllLines("AllLines.txt")
                                    .SelectMany(ls => ls.ToLowerInvariant().Split(' ')));

List<string> lstBannedWords = File.ReadAllLines("allBaddWords.txt")
                                    .Select(s => s.ToLowerInvariant())
                                    .Distinct().ToList();

List<string> lstFoundBannedWords = lstBannedWords.Where(s => lstAllLines.Contains(s))
                                    .Distinct().ToList();

Since a HashSet lookup is O(1) and lstBannedWords is the shorter list, you may not even need any parallelism (TotalSearchTime = lstBannedWords.Count * O(1)). Lastly, you always have the option of AsParallel.

4 Comments

It wasn't me but you might want to rename the lstAllLines variable to something like hashAllWords to make the code easier to understand.
@Dirk I just wanted to preserve the variable names for the OP. After all, it is just Refactor/Rename when using VS.
actually this is not exactly doing what i am doing :) also i did not down vote. i am checking based on word by word level, you are checking literal level.
@MonsterMMORPG Have you tested it before commenting?
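
The disagreement in these comments can be checked directly. The sketch below runs the question's query and the HashSet version on the same made-up input and compares the results. The class name, method names, and sample data are hypothetical, and the banned words are assumed to be lowercase already, as they are after loading in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BannedWordsComparison
{
    // The query from the question: re-splits every line for each banned word.
    public static List<string> OriginalQuery(List<string> allLines, List<string> bannedWords)
    {
        return bannedWords
            .Where(s => allLines
                .SelectMany(ls => ls.ToLowerInvariant().Split(' '))
                .Contains(s))
            .Distinct().ToList();
    }

    // The HashSet version from the answer: splits every line into words once.
    public static List<string> HashSetQuery(List<string> allLines, List<string> bannedWords)
    {
        var allWords = new HashSet<string>(
            allLines.SelectMany(ls => ls.ToLowerInvariant().Split(' ')));
        return bannedWords.Where(s => allWords.Contains(s)).Distinct().ToList();
    }

    public static void Main()
    {
        var allLines = new List<string> { "This line has a BadWord in it", "another clean line" };
        // "line has" contains a space, so neither query can match it after Split(' ').
        var bannedWords = new List<string> { "badword", "line has", "clean", "absent" };

        var original = OriginalQuery(allLines, bannedWords);
        var hashed = HashSetQuery(allLines, bannedWords);

        Console.WriteLine(string.Join(",", original));       // prints "badword,clean"
        Console.WriteLine(original.SequenceEqual(hashed));   // prints "True"
    }
}
```

On this input both versions match at the word level, because the set is built from the split words rather than from whole lines.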
