How to JOIN strings efficiently in linq?

Question

I have a large join statement in linq that is taking far too long to execute (over 2 minutes). I made it the way it was to try and see if the database was set up correctly and I was getting the data properly, but now the project is almost complete and that kind of delay is unacceptable.

The problem is rather obviously the String.Join statements but I don't know how to do it another way, so my question is how can I modify the link statement to get the same information, but in a quicker way. I had thought that maybe it would work better to just join the data and store it somewhere else on the creation of one of the entities that needed joining but if possible it would be better to just have this statement be better.

It is possible to have no components for a ProjectQueryElement with a function of "b" or "m" etc.

var dc = datacontext;
var resultSet = (
    from r in dc.requests
    select new ProjectQueryElement {
        bKey = String.Join( "|", dc.comps
                    .Where(
                        x => x.reRequestId == r.requestId && x.function == "b"
                    ).Select( x => x.idNumber )
               ),
        mKey = String.Join( "|", dc.comps
                    .Where(
                        x => x.reRequestId == r.requestId && x.function == "m"
                    ).Select( x => x.idNumber )
               ),
        oKey = String.Join( "|", dc.comps
                    .Where(
                        x => x.reRequestId == r.requestId && x.function == "o"
                    ).Select( x => x.idNumber )
               ),
        pKey = String.Join( "|", dc.comps
                    .Where(
                        x => x.reRequestId == r.requestId && x.function == "p"
                    ).Select( x => x.idNumber )
               ),
        rKey = String.Join( "|", dc.comps
                    .Where(
                        x => x.reRequestId == r.requestId && x.function == "r"
                    ).Select( x => x.idNumber )
               )
       }
);

The resultSet will then be passed to a jqgrid as the rows for the grid.

I'd start off by introducing a join between r and dc.comps, given that you only care about elements of dc.comps where the reRequestId is r.requestId... but have you looked at what the SQL looks like? — Jon Skeet
– Jon Skeet, Commented May 9, 2016 at 13:40
Additionally to Jon Skeet's answer I would suggest to introduce a StringBuilder and build the string with help of them. — ckruczek
– ckruczek, Commented May 9, 2016 at 13:47
See the comments under the answer of Manfred regarding that @ckruczek — Patrick Hofman
– Patrick Hofman, Commented May 9, 2016 at 13:48
What is the performance if you test without string.Join? I'd challenge any and all assumptions. If the problem is the join, what if you just didn't do it, and instead left the strings in arrays? — Scott Hannen
– Scott Hannen, Commented May 9, 2016 at 13:49
@ScottHannen I have tried it without the String.Joins and that is for sure the problem. Without them it still takes longer than I'd like, but much faster than with the joins. I hadn't considered leaving them as arrays because that hasn't been done anywhere else in the project, this resultset gets passed to a jqgrid immediately after and I'm not possitive the effects of an array going into it. — WWZee
– WWZee, Commented May 9, 2016 at 14:02

Ivan Stoev · Accepted Answer · 2016-05-10 14:41:04Z

The problem is rather obviously the String.Join statements

I doubt the String.Join is the bottleneck, but rather the subqueries passed to that method calls.

Suppose you have total N requests and total M comps. In LINQ to Objects, your query will perform 5 * N * M linear searches. Database queryable provider translated SQL query will perform 5 * N subqueries, and if there are no appropriate indexes, the resulting time complexity will be similar to LINQ to Objects (and the time even worse due to full table scans).

IMO the best would be to separate the data retrieval logic from the data transformation logic. Do the raw data correlation (join - something that @Jon Skeet immediately pointed out in the comments) and projection (select) inside the database and then do the data transformation (string conversion/joining) in memory, preferable processing the db query result sequentially (AsEnumerable() vs ToList and similar).

Applying all this would lead to something like this:

var resultSet =
    (from r in dc.requests
     join c in dc.comps on r.requestId equals c.reRequestId into requestComps
     select requestComps.Select(c => new { c.function, c.idNumber }))
    .AsEnumerable()
    .Select(requestComps => new ProjectQueryElement
    {
        bKey = string.Join("|", requestComps.Where(c => c.function == "b").Select(c => c.idNumber)),
        mKey = string.Join("|", requestComps.Where(c => c.function == "m").Select(c => c.idNumber)),
        oKey = string.Join("|", requestComps.Where(c => c.function == "o").Select(c => c.idNumber)),
        pKey = string.Join("|", requestComps.Where(c => c.function == "p").Select(c => c.idNumber)),
        rKey = string.Join("|", requestComps.Where(c => c.function == "r").Select(c => c.idNumber)),
    });

At this point IMO you should have a nice fast running query.

Now you can optimize the transformation part (the last Select) if really needed. string.Join is internally optimized, so the only optimization could be to do a single pass over the requestComps set, but since it's running in memory I think the performance impact will be negligible. Providing that part just for completeness so you can measure and see the actual impact:

var query =
    // ...
    .Select(requestComps =>
    {
        StringBuilder bKey = null, mKey = null, oKey = null, pKey = null, rKey = null;
        foreach (var c in requestComps)
        {
            switch (c.function)
            {
                case "b": bKey = (bKey != null ? bKey.Append("|") : new StringBuilder()).Append(c.idNumber); break;
                case "m": mKey = (mKey != null ? mKey.Append("|") : new StringBuilder()).Append(c.idNumber); break;
                case "o": oKey = (oKey != null ? oKey.Append("|") : new StringBuilder()).Append(c.idNumber); break;
                case "p": pKey = (pKey != null ? pKey.Append("|") : new StringBuilder()).Append(c.idNumber); break;
                case "r": rKey = (rKey != null ? rKey.Append("|") : new StringBuilder()).Append(c.idNumber); break;
            }
        }
        return new ProjectQueryElement
        {
            bKey = bKey != null ? bKey.ToString() : string.Empty,
            mKey = mKey != null ? mKey.ToString() : string.Empty,
            oKey = oKey != null ? oKey.ToString() : string.Empty,
            pKey = pKey != null ? pKey.ToString() : string.Empty,
            rKey = rKey != null ? rKey.ToString() : string.Empty,
        };
    });

Manfred Radlwimmer · Accepted Answer · 2016-05-09 13:42:56Z

0

Since you are using Linq, you could just use the Aggregate function instead:

xKey = dc.comps
    .Where(x => x.reRequestId == r.requestId && x.function == "b")
    .Aggregate((result, next) => result + "|" + next.idNumber);

answered May 9, 2016 at 13:42

Manfred Radlwimmer

13.4k13 gold badges55 silver badges65 bronze badges

7 Comments

Patrick Hofman Over a year ago

That would create an excessive amount of strings. I think this will make the problem worse.

juharr Over a year ago

@PatrickHofman Actually It should be turning that into SQL, though I doubt it makes a difference.

Manfred Radlwimmer Over a year ago

@PatrickHofman Sure, but the query can now be executed on the database since it no longer contains String.Join

juharr Over a year ago

@ManfredRadlwimmer String.Join is being translated to SQL in the OP's current solution. So I doubt this would make any improvement as it seems like the real issue is with the sub queries instead of using a join.

Patrick Hofman Over a year ago

Okay, didn't know that. (And I didn't find any reference to EF, so I assumed it was just LINQ). And I didn't downvote you by the way.

|

Collectives™ on Stack Overflow

How to JOIN strings efficiently in linq?

2 Answers 2

Comments

7 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

7 Comments

Your Answer

Sign up or log in

Post as a guest

Related