0

Suppose you have two tables:

Student(id, class) // 100 rows
Course(id, course) // 100 rows

Initially assume that there is no index on both the tables. Now suppose we have a query:-

select id, course 
from Student join course 
on student.id = Course.id and student.id = 20

Since you don't have any index, so you need to traverse all the rows in both tables.

Time complexity - O(100 x 100)

Now we updated the table and Student.id is a primary key. Clustered index will be created on it and now the overall complexity is

Time complexity - O(log 100) // Nested loop join

Do you think my assumption is correct? Could anyone help me?

Nested loop join algo is here:

enter image description here

2
  • I think you should not mix join + where. select id, course from Student join Course ON student.id = Course.id WHERE student.id = 20 Commented Dec 12, 2015 at 6:14
  • 1
    Please do not use this outdated join syntax. It works but SQL Standard has JOIN .. ON... for that. Commented Dec 12, 2015 at 6:16

1 Answer 1

1
join course 
on student.id = Course.id

is correctly in O(MN) (Worst-Case), where M and N are the number of rows in the first and second table respectively, since that is an equi-join (Join on a = condition) it compares every row from the first with the second.

However, you also have a second condition. Since SQL has many performance-enhancing algorithms, it is very likely that the student.id = 20 would be evaluated first. Then you would have firstly M (assume linear in the number of rows of the students table) to search for student.id = 20. Then, if student.id = 20 is only constant, let's say m, you would have m * N.

All in all, O(M + (m * N)).

That here depends now on m. If m is constant then in the asymptotic analysis O(M + N) = O(2M), since M=N and you end up with O(M) = O(N) or linear. Otherwise, if m is in Omega(1), then it would be O(M + M * N) or as you assumed O(MN).

Then with regards to the PRIMARY KEY a clustered index will/can be created. The time complexity now for future queries would be as you said O(log K), where K are the rows in the new table (can be != 100).

Now why log K? Because relational databases structure indices as B-trees. Then in WC you get O(log K) in the height of the tree.

More precisely

since on B-trees you have max. 2d children and number of keys s between d - 1 < s < 2d. d is called order, degree or branching factor of the tree.

Hope it helps!

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.