
Given the following SQL Server table:

  • Employee (ssn, name, dept, manager, salary)

where ssn is the primary key.

Suppose there are 30 employee records per disk block, and each employee belongs to exactly one department. Explain why you should or shouldn't put a non-clustered index on dept to speed up the query below, in each of the two cases listed after it:

SELECT ssn
FROM Employee
WHERE dept = 'IT'
  • when there are 50 departments
  • when there are 5000 departments
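A rough block-count estimate makes the two cases concrete. The question does not say how many employees there are, so this sketch assumes N employees spread evenly over d departments, rows with the same dept scattered across blocks (a heap, or a table clustered on something other than dept), one block read per matching row in the worst case, and it ignores the few index pages read to locate the entries:

```latex
\begin{aligned}
\text{full scan:}\quad & B_{\text{scan}} = \left\lceil \tfrac{N}{30} \right\rceil \approx \tfrac{N}{30} \\
\text{rows with dept = IT:}\quad & r = \tfrac{N}{d} \\
\text{index lookups (worst case):}\quad & B_{\text{index}} \le r = \tfrac{N}{d} \\
\text{index reads fewer blocks when:}\quad & \tfrac{N}{d} < \tfrac{N}{30} \iff d > 30 \\[4pt]
d = 50:\quad & B_{\text{index}} \le \tfrac{N}{50} \approx 0.6\,B_{\text{scan}} \\
d = 5000:\quad & B_{\text{index}} \le \tfrac{N}{5000} \approx 0.006\,B_{\text{scan}}
\end{aligned}
```

Under these assumptions both cases favour the index on block count alone, but with 50 departments the margin is thin, and because each lookup is a random read while a scan reads blocks sequentially, a scan can still win in practice. With 5000 departments the index reads a tiny fraction of the table.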

My basic understanding of clustered vs. non-clustered indexes in SQL Server is that clustered indexes should be used when a large amount of data is returned, because they store the table's rows sorted by the index key. Therefore, I believe that in the second scenario, with 5000 departments, you should not put a non-clustered index on dept to speed up the query.

I am confused about the first scenario: with only 50 departments, does it really matter whether a clustered or a non-clustered index is used? The only reason I can think of is that a clustered index might take extra time to sort the data first, while a non-clustered index does not.

Which clustering or non-clustering index should I use in these two cases?

  • I'd add a clustered index on ssn and a non-clustered index on dept in both cases, and keep them in future no matter how much data there is. As a rule, non-clustered indexes depend on your queries: if you never search that table by dept, you shouldn't create one; otherwise, do. Also, don't you want a separate table for departments? Commented Apr 22, 2017 at 18:18

2 Answers


Which clustering or non-clustering index should I use in these two cases?

With ssn as the clustered primary key, a non-clustered index on dept will cover the query and be the most efficient choice regardless of the number of rows returned. Remember that the clustered index key (the primary key here) is implicitly included in the leaf rows of every non-clustered index as the row locator. Because ssn is already in the dept index, SQL Server does not need to read the separate data pages that hold the columns the query doesn't use.

The execution plan should show only an index seek using the dept non-clustered index, touching only the data needed by the query.
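A minimal T-SQL sketch of the setup described in this answer. The column types and the index name IX_Employee_dept are hypothetical (the question lists only column names), and GO is the batch separator recognised by SSMS and sqlcmd. With SET SHOWPLAN_TEXT ON, SQL Server returns the estimated plan instead of running the query; you would expect an index seek on IX_Employee_dept there, though the optimizer makes the final choice.

```sql
-- Hypothetical column types: the question lists only column names.
CREATE TABLE Employee (
    ssn     CHAR(9)       NOT NULL,
    name    NVARCHAR(100) NOT NULL,
    dept    NVARCHAR(50)  NOT NULL,
    manager CHAR(9)       NULL,
    salary  DECIMAL(10,2) NULL,
    CONSTRAINT PK_Employee PRIMARY KEY CLUSTERED (ssn)
);
GO

-- Hypothetical index name. Its leaf rows also carry the clustering
-- key (ssn) as the row locator, so the query below can be answered
-- from this index alone.
CREATE NONCLUSTERED INDEX IX_Employee_dept ON Employee (dept);
GO

-- Return the estimated plan instead of executing the query.
-- SET SHOWPLAN statements must be the only statements in their batch.
SET SHOWPLAN_TEXT ON;
GO
SELECT ssn
FROM Employee
WHERE dept = 'IT';
GO
SET SHOWPLAN_TEXT OFF;
GO
```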


The question is missing an important parameter -- how many Employees?

If there are 100 employees in 50 departments, it is cheaper to scan the data than to bounce between the index and the data.

If there are 10000 employees in 50 departments, it is cheaper to bounce between the index and the data.

The query optimizer should be smart enough to decide.

It also depends on whether 'IT' is a large department.

Bottom line: have the index, and hope that the optimizer doesn't screw it up.
