Performance of clustered index on SQL Query

Question

Suppose you have two tables:

Student(id, class) // 100 rows
Course(id, course) // 100 rows

Initially assume that there is no index on both the tables. Now suppose we have a query:-

select id, course 
from Student join course 
on student.id = Course.id and student.id = 20

Since you don't have any index, so you need to traverse all the rows in both tables.

Time complexity - O(100 x 100)

Now we updated the table and Student.id is a primary key. Clustered index will be created on it and now the overall complexity is

Time complexity - O(log 100) // Nested loop join

Do you think my assumption is correct? Could anyone help me?

Nested loop join algo is here:

I think you should not mix join + where. select id, course from Student join Course ON student.id = Course.id WHERE student.id = 20 — Lukasz Szozda
– Lukasz Szozda, Commented Dec 12, 2015 at 6:14
Please do not use this outdated join syntax. It works but SQL Standard has JOIN .. ON... for that. — Lukasz Szozda
– Lukasz Szozda, Commented Dec 12, 2015 at 6:16

Dimitar · Accepted Answer · 2015-12-23 11:46:53Z

join course 
on student.id = Course.id

is correctly in O(MN) (Worst-Case), where M and N are the number of rows in the first and second table respectively, since that is an equi-join (Join on a = condition) it compares every row from the first with the second.

However, you also have a second condition. Since SQL has many performance-enhancing algorithms, it is very likely that the student.id = 20 would be evaluated first. Then you would have firstly M (assume linear in the number of rows of the students table) to search for student.id = 20. Then, if student.id = 20 is only constant, let's say m, you would have m * N.

All in all, O(M + (m * N)).

That here depends now on m. If m is constant then in the asymptotic analysis O(M + N) = O(2M), since M=N and you end up with O(M) = O(N) or linear. Otherwise, if m is in Omega(1), then it would be O(M + M * N) or as you assumed O(MN).

Then with regards to the PRIMARY KEY a clustered index will/can be created. The time complexity now for future queries would be as you said O(log K), where K are the rows in the new table (can be != 100).

Now why log K? Because relational databases structure indices as B-trees. Then in WC you get O(log K) in the height of the tree.

More precisely

since on B-trees you have max. 2d children and number of keys s between d - 1 < s < 2d. d is called order, degree or branching factor of the tree.

Hope it helps!

Collectives™ on Stack Overflow

Performance of clustered index on SQL Query

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related