I have two tables, Facts and Customer, with 422,000 and 350,000 rows respectively.

When I run this query, the join operator in the plan is Hash Match:

SELECT SUM(sales), [state], County
FROM pp.Facts f
INNER JOIN pp.Customer c ON (c.customerKey = f.customerKey)
WHERE c.County IN (N'Nassau', N'Westchester', N'Erie', N'Orange', N'Union', 
                   N'Santa Clara', N'San Diego', N'Essex', N'Morris', 
                   N'Dallas', N'Allegheny', N'Bucks')
GROUP BY [state], County

And it works great.

When I run this query (the same, but with more items in the filter), the join changes to Nested Loops and the query never returns (obviously):

SELECT SUM(sales), [state], County
FROM pp.Facts f
INNER JOIN pp.Customer c ON (c.customerKey = f.customerKey)
WHERE c.County IN (N'Nassau', N'Westchester', N'Erie', N'Orange', N'Union', 
                   N'Santa Clara', N'San Diego', N'Essex', N'Morris', 
                   N'Dallas', N'Allegheny', N'Bucks', N'New York', 
                   N'Bergen', N'Montgomery', N'Harris', N'Delaware', 
                   N'San Francisco', N'Suffolk', N'Travis', N'Middlesex', 
                   N'Bexar', N'Tarrant', N'Los Angeles', N'Philadelphia')
GROUP BY [state], County

Why does the join method change?

There is an FK defined on the customerKey columns and a PK on Customer.customerKey. There are no additional indexes.

  • Q: Why has the plan changed? A: The queries are different. Seriously though, the optimizer uses statistics to estimate the number of rows returned from each operator in the execution plan. Have a look at the Estimated Number of Rows returned from the Customer table. Commented Oct 4, 2018 at 11:19
  • Maybe because in the first query the County filter selects a small share of the customers (e.g. 10%), while in the second it selects a larger share (e.g. 30%). Commented Oct 4, 2018 at 11:55
  • @MJH the "Estimated Number of Rows" was 1 for the Customer table, so the optimizer thought there would be only 1 row and went for nested loops... thanks! Commented Oct 4, 2018 at 12:33

2 Answers

Why? That is simple. When SQL Server compiles the query, the optimizer considers different execution plans and picks the one with the lowest estimated cost. Apparently, for the second query it estimates that a nested-loops join is cheaper than the other join methods.

The optimizer is wrong, probably because statistics are not correct or because assumptions about value distributions are not true for your query.
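One way to test the statistics explanation is to check how fresh the statistics on pp.Customer are and then refresh them. This is a sketch, not a guaranteed fix: it assumes you have permission to query the system views and update statistics, and the FULLSCAN option is my choice rather than something from the question.

```sql
-- List each statistics object on pp.Customer with its last update time,
-- the rows sampled, and the modifications made since that update.
SELECT s.name AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'pp.Customer');

-- Rebuild all statistics on the table by reading every row.
-- FULLSCAN is an assumption; the default sampled update may be enough.
UPDATE STATISTICS pp.Customer WITH FULLSCAN;
```

If the row estimate for Customer is still far off after the update, the cause is more likely the optimizer's distribution assumptions than stale statistics.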

I have faced this problem in the past with views -- where small changes to underlying data caused big changes to query plans. I recommend that you use query hints to ensure that you get the plan you want, probably:

OPTION (HASH JOIN)
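Appended to the second query from the question, the hint would look roughly like this. The table aliases on sales and [state] are an assumption about which table each column lives in.

```sql
SELECT SUM(f.sales), c.[state], c.County
FROM pp.Facts AS f
INNER JOIN pp.Customer AS c
    ON c.customerKey = f.customerKey
WHERE c.County IN (N'Nassau', N'Westchester', N'Erie', N'Orange', N'Union',
                   N'Santa Clara', N'San Diego', N'Essex', N'Morris',
                   N'Dallas', N'Allegheny', N'Bucks', N'New York',
                   N'Bergen', N'Montgomery', N'Harris', N'Delaware',
                   N'San Francisco', N'Suffolk', N'Travis', N'Middlesex',
                   N'Bexar', N'Tarrant', N'Los Angeles', N'Philadelphia')
GROUP BY c.[state], c.County
OPTION (HASH JOIN);  -- query-level hint: every join in this query must be a hash join
```

The hint applies to all joins in the statement, which is harmless here because there is only one.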

For this query:

SELECT SUM(f.sales), c.[state], c.County
FROM pp.Facts f INNER JOIN
     pp.Customer c
     ON c.customerKey = f.customerKey
WHERE c.County IN ( . . . )
GROUP BY c.[state], c.County;

You want indexes on Customer(County, State, CustomerKey) and Facts(customerKey, sales). Each is a covering index for this query: it contains every column the query reads from its table. In the Customer index, the leading column (County) satisfies the WHERE clause.
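As DDL, those two indexes might be created as follows. The index names are hypothetical, and NONCLUSTERED is an assumption that the existing PK already owns the clustered index on Customer.

```sql
-- Leading column County serves the WHERE clause; [state] and customerKey
-- are in the key so the query never has to touch the base table.
CREATE NONCLUSTERED INDEX IX_Customer_County_State_CustomerKey
    ON pp.Customer (County, [state], customerKey);

-- customerKey serves the join; sales is carried along for SUM(sales).
CREATE NONCLUSTERED INDEX IX_Facts_customerKey_sales
    ON pp.Facts (customerKey, sales);
```

Creating an index also creates statistics on its key columns, so the index on County gives the optimizer a histogram for the filtered column as a side effect.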

I notice that you are using nvarchar constants. This assumes that c.County is nvarchar. If it is varchar, drop the N.
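To check the declared type of the column before deciding whether to keep the N prefix, a catalog query along these lines should work (a sketch, assuming the table is pp.Customer and the column is named County, as in the question):

```sql
SELECT c.name AS column_name,
       t.name AS type_name,   -- e.g. varchar or nvarchar
       c.max_length           -- length in bytes, -1 for (max) types
FROM sys.columns AS c
JOIN sys.types AS t
    ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID(N'pp.Customer')
  AND c.name = N'County';
```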

2 Comments

The proposed index for Customer looks very strange.
In the suggested index, State could be moved to the end, or even INCLUDEd. I might as well also create a reverse index (CustomerKey, County) INCLUDE (State), let SQL Server choose, and then delete the one it does not use.

SQL Server chooses a plan based on estimated resource cost. The physical join type (LOOP | MERGE | HASH) is selected based on the estimated number of rows and their order. Without suitable indexes, the optimizer can estimate row counts poorly and, given a larger number of conditions, assume a much smaller number of rows; as a result it builds a plan with nested loops. Try creating statistics on (customerKey, County), or a nonclustered index on those columns. Alternatively, use the join hint INNER HASH JOIN, or OPTION (HASH JOIN) for the entire query. Be aware that if conditions change, a hinted plan can turn out to be a bad one.
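A sketch of both suggestions follows. It assumes the column meant above is County (as in the question), the statistics name is hypothetical, and the IN list is shortened purely for illustration.

```sql
-- Multi-column statistics on pp.Customer.
-- The name st_Customer_customerKey_County is hypothetical.
CREATE STATISTICS st_Customer_customerKey_County
    ON pp.Customer (customerKey, County)
    WITH FULLSCAN;

-- Join-level hint: forces a hash join for this specific join.
SELECT SUM(f.sales), c.[state], c.County
FROM pp.Facts AS f
INNER HASH JOIN pp.Customer AS c
    ON c.customerKey = f.customerKey
WHERE c.County IN (N'Nassau', N'Westchester', N'Erie')  -- shortened list
GROUP BY c.[state], c.County;
```

Two caveats: SQL Server builds the histogram only on the first column of a statistics object, so statistics led by customerKey say little about the distribution of County on its own; and a join-level hint also makes the optimizer keep the join order as written in the query.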
