7

Hello SQL Server engine experts; please share with us a bit of your insight...

As I understand it, INCLUDE columns on a non-clustered index allow additional, non-key data to be stored with the index pages.

I am well aware of the performance benefit a clustered index has over a non-clustered index, simply because the engine takes one fewer step during retrieval to arrive at the data on disk.

However, since INCLUDE columns live in a non-clustered index, can the following query be expected to have essentially the same performance across scenarios 1 and 2 since all columns could be retrieved from the index pages in scenario 2 rather than ever resorting to the table data pages?

QUERY

SELECT A, B, C FROM TBL ORDER BY A

SCENARIO 1

CREATE CLUSTERED INDEX IX1 ON TBL (A, B, C);

SCENARIO 2

CREATE NONCLUSTERED INDEX IX1 ON TBL (A) INCLUDE (B, C);
  • What does the actual query plan say when you switch between index setups? Commented Aug 2, 2010 at 1:40
  • It will depend on the overall query workload hitting that table. Commented Aug 2, 2010 at 1:52
  • Interestingly the non-clustered index is preferred in the query plan, but the possibility of using query hints, the engine's complexity in deciding an optimal query plan based on volume of data and who knows what other factors leave this a lingering mystery for me. I'm really looking for someone to speak up and say INCLUDE columns are not all they're cracked up to be for some reason or other... Commented Aug 2, 2010 at 1:54
  • The INCLUDE columns have their place, and I think your example is a good one for them Commented Aug 2, 2010 at 2:01
  • Actually, the additional INCLUDEd columns are not stored on the index pages (which assumes: all of them), but only on the LEAF-level index pages. So if you have an index with four levels of entries, only on the level no. 3 - the leaf level - will you have the additional columns - they don't clutter up the entire index Commented Aug 2, 2010 at 4:53

2 Answers

5

Indeed, a non-clustered index with covering INCLUDE columns can play exactly the same role as a clustered index. The cost comes at update time: the more indexes that include a column, the more indexes have to be updated when that column's value changes in the base table (the clustered index). Also, with more included columns the size of the data increases: the database becomes larger, which can complicate maintenance operations.

In the end, it is a balance you have to find between the covering value of the additional indexes and included columns on one side and the cost of updates and increased data size on the other.
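
To make the update cost concrete, here is a minimal sketch, assuming a throwaway table named `dbo.IncludeCostDemo` (a hypothetical name, not from the question) and permission to read the index DMVs. It updates a column that is in no index, then a column that is an INCLUDE column, and reads the per-index write counters so the two cases can be compared. The exact counter values may depend on the SQL Server version and on how the engine chooses to perform the update.

```sql
-- Hypothetical demo objects; none of these names come from the question.
CREATE TABLE dbo.IncludeCostDemo (
    A INT NOT NULL,
    B INT NOT NULL,
    C INT NOT NULL,
    D INT NOT NULL   -- deliberately not part of any index
);

CREATE NONCLUSTERED INDEX IX_IncludeCostDemo
    ON dbo.IncludeCostDemo (A) INCLUDE (B, C);

INSERT INTO dbo.IncludeCostDemo (A, B, C, D)
VALUES (1, 10, 100, 1000), (2, 20, 200, 2000), (3, 30, 300, 3000);

-- D lives only in the base table (a heap here), so only the heap should be written.
UPDATE dbo.IncludeCostDemo SET D = D + 1;

-- B is an INCLUDE column, so the non-clustered index has to be maintained as well.
UPDATE dbo.IncludeCostDemo SET B = B + 1;

-- Per-index write counters accumulated since the metadata was cached.
-- The heap appears with a NULL index name.
SELECT i.name AS index_name,
       i.type_desc,
       os.leaf_insert_count,
       os.leaf_update_count,
       os.leaf_delete_count
FROM sys.dm_db_index_operational_stats(
         DB_ID(), OBJECT_ID(N'dbo.IncludeCostDemo'), NULL, NULL) AS os
JOIN sys.indexes AS i
  ON i.object_id = os.object_id
 AND i.index_id  = os.index_id
ORDER BY i.index_id;

DROP TABLE dbo.IncludeCostDemo;
```

Running the counter query after each UPDATE separately shows which statement touched which structure.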

4

For this example you may actually get better performance with the non-clustered index. But, it really depends on additional information you haven't provided. Here are some thoughts.

SQL Server stores information in 8KB pages; this includes data and indexes. If your table contains only columns A, B and C, then the table data and the non-clustered index will occupy approximately the same number of pages. But if the table has more columns, the data will need more pages, while the number of index pages stays the same.
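
If you want to check the page counts on your own table, something along these lines should work. It is only a sketch: it assumes the table from the question lives at `dbo.TBL` (the schema is my assumption) and that you are allowed to query `sys.dm_db_index_physical_stats`. The `'DETAILED'` mode reads every page, so it can be slow on a big table.

```sql
-- Page counts per index and per B-tree level for the table in the question.
-- dbo.TBL is an assumed name; adjust the schema and table to your own.
SELECT i.name AS index_name,
       i.type_desc,
       ps.index_level,      -- 0 is the leaf level
       ps.page_count,
       ps.record_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.TBL'), NULL, NULL, 'DETAILED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
ORDER BY i.index_id, ps.index_level DESC;
```

Comparing the leaf-level `page_count` of the clustered index (or heap) with that of the covering non-clustered index shows how much narrower the index is than the full rows.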

So, in a table with more columns than your query needs, the query will work better with the non-clustered covering index (index with all columns). It will be able to process fewer pages to return the results you want.
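
A rough way to see this for yourself is sketched below. Everything in it is hypothetical: the table `dbo.TBL_Demo`, its `Filler` column, the index names and the 100,000-row fill are stand-ins chosen to make the rows wide, not anything from the question. The index hints are there only to force each access path for the comparison; normally you would let the optimizer choose.

```sql
-- Hypothetical wide table: A, B, C plus a filler column the query never reads.
CREATE TABLE dbo.TBL_Demo (
    A INT NOT NULL,
    B INT NOT NULL,
    C INT NOT NULL,
    Filler CHAR(500) NOT NULL DEFAULT 'x'
);

CREATE CLUSTERED INDEX CIX_TBL_Demo ON dbo.TBL_Demo (A, B, C);
CREATE NONCLUSTERED INDEX IX_TBL_Demo_Cover ON dbo.TBL_Demo (A) INCLUDE (B, C);

-- Fill with 100,000 rows; the cross join is just a convenient row source.
INSERT INTO dbo.TBL_Demo (A, B, C)
SELECT TOP (100000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 1, 2
FROM sys.all_objects AS o1
CROSS JOIN sys.all_objects AS o2;

SET STATISTICS IO ON;

-- Same query, forced through each index in turn.
SELECT A, B, C FROM dbo.TBL_Demo WITH (INDEX(CIX_TBL_Demo)) ORDER BY A;
SELECT A, B, C FROM dbo.TBL_Demo WITH (INDEX(IX_TBL_Demo_Cover)) ORDER BY A;

SET STATISTICS IO OFF;

DROP TABLE dbo.TBL_Demo;
```

The "logical reads" figures in the Messages output are the numbers to compare between the two SELECT statements.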

Of course, performance differences may not be seen until you get a very large number of rows.
