4

I am executing a number of queries with many values specified in an "IN" clause, like this:

SELECT 
    [time_taken], [distance], [from_location_geocode_id],
    [to_location_geocode_id] 
FROM 
    [Travel_Matrix] 
WHERE 
    [from_location_geocode_id] IN (@param1, @param2, @param3, @param4, @param5) 
    AND [to_location_geocode_id] IN (@param1, @param2, @param3, @param4, @param5)

The example shows 5 parameters, but in practice there can be hundreds of these.

For a small number of parameters (up to about 400), SQL Server uses an execution plan with a number of "compute scalar" operations, which are then concatenated, sorted and joined in order to return the results.

For a large number of parameters (over 400), it uses a "hash match (right semi join)" method, which is quicker.

However, I would like it to use the second execution plan much earlier, e.g. on queries with 50 parameters, since my tests have shown that queries with 50-400 parameters tend to get very slow.

I've tried using various "OPTION" values on my query, but cannot get it to execute using the second execution plan, which I know would be more efficient.

I'd be grateful to anyone who can advise how to give the query the correct hints, so that it executes in the manner of the second execution plan.
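For example, one of the hints I have tried is OPTION (HASH JOIN), which asks SQL Server to use hash-based join operators throughout the plan (the five-parameter list here is just illustrative, as above):

```sql
SELECT 
    [time_taken], [distance], [from_location_geocode_id],
    [to_location_geocode_id] 
FROM 
    [Travel_Matrix] 
WHERE 
    [from_location_geocode_id] IN (@param1, @param2, @param3, @param4, @param5) 
    AND [to_location_geocode_id] IN (@param1, @param2, @param3, @param4, @param5)
OPTION (HASH JOIN)
```

Even with this, the optimizer does not produce the hash match (right semi join) plan I see with larger parameter counts.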

Thanks

3 Comments
  • Usually, the SQL Server optimizer is clever; it is rarely a good idea to force it to do things your way. Commented Mar 23, 2016 at 10:41
  • If you can have a large number of parameters in your IN clause, you should store those parameters in a temporary table and join on it instead. Commented Mar 23, 2016 at 10:43
  • Try inserting those values into a temp table or table variable and then selecting from that. Commented Mar 23, 2016 at 10:44

3 Answers

5

I think 400 parameters in an IN clause is too many. You are better off storing these values in a temporary table and doing a JOIN on it, ideally with an index on the temp table's column to speed things up.
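A minimal sketch of that approach, reusing the table and column names from the question (the temp table name is my own, and the primary key doubles as the index on the joined column):

```sql
-- Temp table holding the geocode ids; the PRIMARY KEY gives the join an index
CREATE TABLE #GeocodeIds (geocode_id INT PRIMARY KEY);

INSERT INTO #GeocodeIds (geocode_id)
VALUES (@param1), (@param2), (@param3), (@param4), (@param5);

SELECT
    tm.[time_taken], tm.[distance],
    tm.[from_location_geocode_id], tm.[to_location_geocode_id]
FROM [Travel_Matrix] AS tm
JOIN #GeocodeIds AS f ON f.geocode_id = tm.[from_location_geocode_id]
JOIN #GeocodeIds AS t ON t.geocode_id = tm.[to_location_geocode_id];

DROP TABLE #GeocodeIds;
```

With hundreds of values, the insert can be batched or driven from a table-valued parameter instead of one VALUES row per parameter.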


5 Comments

That sadly is the answer. Sadly because no ORM I know of supports this.
Yes, I have tried that also. However, I found that the IN queries were faster than the temporary table approach, as long as it was using the second execution plan. Effectively the second execution plan is making a temporary table and joining it, but doing it all internally, which I suppose makes it quicker.
@TomTom most ORMs allow using stored procedures, or running raw customized SQL queries. That's useful in scenarios like this...
Yes, but Entity Framework could also translate an IN clause into using a temporary table if there are more than X elements in the condition. THAT would be useful. Coding a stored procedure has a lot of disadvantages, and calling it with an arbitrary number of parameters is another problem. The solution would be quite simple - but no ORM does it.
This should be an automatic optimization of the database, that should not rely upon the client.
4

From a performance perspective, a long IN clause is not good; try something like this instead:

DECLARE @Tmp TABLE (Id INT)
INSERT INTO @Tmp (Id) VALUES (@param1), (@param2), (@param3), (@param4), (@param5)

SELECT 
    [time_taken], [distance], [from_location_geocode_id],
    [to_location_geocode_id] 
FROM 
    [Travel_Matrix] 
WHERE 
    EXISTS (SELECT 1 FROM @Tmp T WHERE T.Id = [from_location_geocode_id])
    AND EXISTS (SELECT 1 FROM @Tmp T WHERE T.Id = [to_location_geocode_id])

Comments

0

You can also create a filtered index on those parameter values, even if you already have an index covering all the column values. With a filtered index, your queries will be much faster, although your inserts will be a little slower. Filtered indexes fit your purpose specifically.

Ex:
create table test
(
    id int
)

insert into test
select top (100) * from numbers
where n <= 1100

Now, if our queries always come with a large parameter list, say id in (2, 100, 45, 98, ...), we can create a filtered index like the one below:

create index ix_test_id_filtered on dbo.test (id)
where id in (2, 958, 100)

Our query will then use that index and will be much faster. Of course, there are a few limitations: BETWEEN queries and CASE expressions cannot use it, and inserts are slower. But I recommend testing this option, and also making the index covering.

Update:
Further, statistics are key to estimating row counts. If you don't have an index with fromlocationid and tolocationid as key columns, SQL Server will not create multi-column statistics. So one more option is to create filtered multi-column statistics, if you don't want to go with the filtered index approach:

create statistics test1 on dbo.test (fromlocationid, tolocationid)
where fromlocationid in (@param1, ...) and tolocationid in (@param1, @param2, ...)

The only issue I see with filtered statistics is that they are not updated as frequently as regular statistics, so you may want to update them manually through a job, depending on your needs.

2 Comments

Thanks for the suggestion. When I tried it, however, I found that creating the filtered index took a long time - my database has a lot more rows - unless I missed something in your suggestion.
Yes, it takes time to build. It's just like a normal index but smaller in size, storing only the values matched by the where clause. You may also want to look at wait stats during index creation.
