3

I have a large join statement in linq that is taking far too long to execute (over 2 minutes). I made it the way it was to try and see if the database was set up correctly and I was getting the data properly, but now the project is almost complete and that kind of delay is unacceptable.

The problem is rather obviously the String.Join statements but I don't know how to do it another way, so my question is how can I modify the link statement to get the same information, but in a quicker way. I had thought that maybe it would work better to just join the data and store it somewhere else on the creation of one of the entities that needed joining but if possible it would be better to just have this statement be better.

It is possible to have no components for a ProjectQueryElement with a function of "b" or "m" etc.

var dc = datacontext;
var resultSet = (
    from r in dc.requests
    select new ProjectQueryElement {
        bKey = String.Join( "|", dc.comps
                    .Where(
                        x => x.reRequestId == r.requestId && x.function == "b"
                    ).Select( x => x.idNumber )
               ),
        mKey = String.Join( "|", dc.comps
                    .Where(
                        x => x.reRequestId == r.requestId && x.function == "m"
                    ).Select( x => x.idNumber )
               ),
        oKey = String.Join( "|", dc.comps
                    .Where(
                        x => x.reRequestId == r.requestId && x.function == "o"
                    ).Select( x => x.idNumber )
               ),
        pKey = String.Join( "|", dc.comps
                    .Where(
                        x => x.reRequestId == r.requestId && x.function == "p"
                    ).Select( x => x.idNumber )
               ),
        rKey = String.Join( "|", dc.comps
                    .Where(
                        x => x.reRequestId == r.requestId && x.function == "r"
                    ).Select( x => x.idNumber )
               )
       }
);

The resultSet will then be passed to a jqgrid as the rows for the grid.

9
  • 7
    I'd start off by introducing a join between r and dc.comps, given that you only care about elements of dc.comps where the reRequestId is r.requestId... but have you looked at what the SQL looks like? Commented May 9, 2016 at 13:40
  • Additionally to Jon Skeet's answer I would suggest to introduce a StringBuilder and build the string with help of them. Commented May 9, 2016 at 13:47
  • 1
    See the comments under the answer of Manfred regarding that @ckruczek Commented May 9, 2016 at 13:48
  • What is the performance if you test without string.Join? I'd challenge any and all assumptions. If the problem is the join, what if you just didn't do it, and instead left the strings in arrays? Commented May 9, 2016 at 13:49
  • @ScottHannen I have tried it without the String.Joins and that is for sure the problem. Without them it still takes longer than I'd like, but much faster than with the joins. I hadn't considered leaving them as arrays because that hasn't been done anywhere else in the project, this resultset gets passed to a jqgrid immediately after and I'm not possitive the effects of an array going into it. Commented May 9, 2016 at 14:02

2 Answers 2

2

The problem is rather obviously the String.Join statements

I doubt the String.Join is the bottleneck, but rather the subqueries passed to that method calls.

Suppose you have total N requests and total M comps. In LINQ to Objects, your query will perform 5 * N * M linear searches. Database queryable provider translated SQL query will perform 5 * N subqueries, and if there are no appropriate indexes, the resulting time complexity will be similar to LINQ to Objects (and the time even worse due to full table scans).

IMO the best would be to separate the data retrieval logic from the data transformation logic. Do the raw data correlation (join - something that @Jon Skeet immediately pointed out in the comments) and projection (select) inside the database and then do the data transformation (string conversion/joining) in memory, preferable processing the db query result sequentially (AsEnumerable() vs ToList and similar).

Applying all this would lead to something like this:

var resultSet =
    (from r in dc.requests
     join c in dc.comps on r.requestId equals c.reRequestId into requestComps
     select requestComps.Select(c => new { c.function, c.idNumber }))
    .AsEnumerable()
    .Select(requestComps => new ProjectQueryElement
    {
        bKey = string.Join("|", requestComps.Where(c => c.function == "b").Select(c => c.idNumber)),
        mKey = string.Join("|", requestComps.Where(c => c.function == "m").Select(c => c.idNumber)),
        oKey = string.Join("|", requestComps.Where(c => c.function == "o").Select(c => c.idNumber)),
        pKey = string.Join("|", requestComps.Where(c => c.function == "p").Select(c => c.idNumber)),
        rKey = string.Join("|", requestComps.Where(c => c.function == "r").Select(c => c.idNumber)),
    });

At this point IMO you should have a nice fast running query.

Now you can optimize the transformation part (the last Select) if really needed. string.Join is internally optimized, so the only optimization could be to do a single pass over the requestComps set, but since it's running in memory I think the performance impact will be negligible. Providing that part just for completeness so you can measure and see the actual impact:

var query =
    // ...
    .Select(requestComps =>
    {
        StringBuilder bKey = null, mKey = null, oKey = null, pKey = null, rKey = null;
        foreach (var c in requestComps)
        {
            switch (c.function)
            {
                case "b": bKey = (bKey != null ? bKey.Append("|") : new StringBuilder()).Append(c.idNumber); break;
                case "m": mKey = (mKey != null ? mKey.Append("|") : new StringBuilder()).Append(c.idNumber); break;
                case "o": oKey = (oKey != null ? oKey.Append("|") : new StringBuilder()).Append(c.idNumber); break;
                case "p": pKey = (pKey != null ? pKey.Append("|") : new StringBuilder()).Append(c.idNumber); break;
                case "r": rKey = (rKey != null ? rKey.Append("|") : new StringBuilder()).Append(c.idNumber); break;
            }
        }
        return new ProjectQueryElement
        {
            bKey = bKey != null ? bKey.ToString() : string.Empty,
            mKey = mKey != null ? mKey.ToString() : string.Empty,
            oKey = oKey != null ? oKey.ToString() : string.Empty,
            pKey = pKey != null ? pKey.ToString() : string.Empty,
            rKey = rKey != null ? rKey.ToString() : string.Empty,
        };
    });
Sign up to request clarification or add additional context in comments.

Comments

0

Since you are using Linq, you could just use the Aggregate function instead:

xKey = dc.comps
    .Where(x => x.reRequestId == r.requestId && x.function == "b")
    .Aggregate((result, next) => result + "|" + next.idNumber);

7 Comments

That would create an excessive amount of strings. I think this will make the problem worse.
@PatrickHofman Actually It should be turning that into SQL, though I doubt it makes a difference.
@PatrickHofman Sure, but the query can now be executed on the database since it no longer contains String.Join
@ManfredRadlwimmer String.Join is being translated to SQL in the OP's current solution. So I doubt this would make any improvement as it seems like the real issue is with the sub queries instead of using a join.
Okay, didn't know that. (And I didn't find any reference to EF, so I assumed it was just LINQ). And I didn't downvote you by the way.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.