Using clustered vs non-clustered index on large data in SQL

Question

Given the following SQL Server table:

Employee (ssn, name, dept, manager, salary)

where ssn is the primary key.

Suppose there are 30 employee records per disk block. Each employee belongs to one of the departments. Explain why you should or shouldn't put a non-clustering index on dept to speed up this query in the following two cases:

SELECT ssn
FROM Employee
WHERE dept = 'IT'

when there are 50 departments
when there are 5000 departments

My basic understanding of clustered vs. non-clustered indexes in SQL Server is that clustered indexes should be used when there is a large amount of data to be returned, as they will initially sort the table by that index. Therefore, I believe that in the second scenario, with 5000 departments, you should not put a non-clustering index on dept to speed up the query.

I am confused about the first scenario because, as there are only 50 departments, does it really matter if a non-clustering or clustering index is used? The only reason I can think it might matter is if a clustering index takes extra time to first sort the data, while a non-clustering index does not.

Which clustering or non-clustering index should I use in these two cases?

I'd add a clustered index on ssn and a non-clustered index on dept in both cases, no matter how much data there is now or in the future. As a rule, non-clustered indexes depend on your queries: if you never search by dept in that table, don't create the index; otherwise do. Also, don't you want a separate table for departments?
– crucifery, commented Apr 22, 2017 at 18:18

Dan Guzman · Accepted Answer · 2017-04-22 18:18:05Z

1

Which clustering or non-clustering index should I use in these two cases?

With SSN as the primary key clustered index, a non-clustered index on dept will cover the query and be the most efficient regardless of the number of rows returned. Remember that the clustered index key (the primary key here) is implicitly included in non-clustered index leaf nodes as the row locator. This will avoid the need to access the separate data pages containing columns not needed by the query.

The execution plan should show only an index seek using the dept non-clustered index, touching only the data needed by the query.
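As a rough sketch of the setup this answer describes, the T-SQL below creates the table with ssn as the clustered primary key and adds a non-clustered index on dept. The column types, the dbo schema, and the names PK_Employee and IX_Employee_dept are assumptions for illustration; the question gives only the column names.

```sql
-- Column types are assumed; the question lists only the column names.
CREATE TABLE dbo.Employee (
    ssn     char(9)       NOT NULL,
    name    nvarchar(100) NOT NULL,
    dept    varchar(50)   NOT NULL,
    manager char(9)       NULL,
    salary  decimal(12,2) NULL,
    CONSTRAINT PK_Employee PRIMARY KEY CLUSTERED (ssn)
);

-- Hypothetical index name. The clustered key (ssn) is carried in the
-- leaf level of every non-clustered index as the row locator, so this
-- index holds both columns the query needs.
CREATE NONCLUSTERED INDEX IX_Employee_dept
    ON dbo.Employee (dept);

-- Report logical reads per statement in the Messages output.
SET STATISTICS IO ON;

SELECT ssn
FROM dbo.Employee
WHERE dept = 'IT';
```

With the actual execution plan enabled, this should make it possible to check whether the plan is a single seek on IX_Employee_dept with no key lookup.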

answered Apr 22, 2017 at 18:18

Dan Guzman


Rick James · Answer · 2017-04-22 22:52:57Z

0

The question is missing an important parameter -- how many Employees?

If there are 100 employees in 50 departments, it is cheaper to scan the data rather than bounce between the index and the data.

If there are 10000 employees in 50 departments, it is cheaper to bounce between the index and the data.
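To put rough numbers on those two cases, here is a small worked calculation using the question's figure of 30 employee records per block. It assumes employees are spread evenly across the 50 departments, which the question does not state.

```sql
-- Assumption: employees are evenly distributed over 50 departments.
SELECT v.employees,
       CAST(CEILING(v.employees / 30.0) AS int) AS data_blocks_in_table,
       v.employees / 50                         AS avg_rows_per_dept
FROM (VALUES (100), (10000)) AS v(employees);

-- employees  data_blocks_in_table  avg_rows_per_dept
-- 100        4                     2
-- 10000      334                   200
```

At 100 employees the whole table fits in 4 blocks, so a full scan costs almost nothing. At 10000 employees a scan reads 334 blocks to find about 200 rows.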

The query optimizer should be smart enough to decide.

Also depends on whether 'IT' is a big department or not.

Bottom line: Have the index, and hope that the Optimizer doesn't screw it up.
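One way to check the optimizer's choice on your own data is to force each access path with a table hint and compare the logical reads. This is only a sketch: it assumes the table is dbo.Employee and that a non-clustered index on dept already exists under the hypothetical name IX_Employee_dept.

```sql
-- Assumes dbo.Employee exists with a non-clustered index on dept
-- named IX_Employee_dept (hypothetical name).
SET STATISTICS IO ON;

-- Let the optimizer choose.
SELECT ssn FROM dbo.Employee WHERE dept = 'IT';

-- Force a scan of the clustered index (the full table).
SELECT ssn FROM dbo.Employee WITH (INDEX(0)) WHERE dept = 'IT';

-- Force the non-clustered index on dept.
SELECT ssn FROM dbo.Employee WITH (INDEX(IX_Employee_dept)) WHERE dept = 'IT';

SET STATISTICS IO OFF;
```

The hints are for comparison only. If the unhinted query already shows the lowest logical reads, the optimizer picked the cheaper path and no hint belongs in production code.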

answered Apr 22, 2017 at 22:52

Rick James
