I'm doing some performance testing and the StackOverflow2010 DB is great for this.
I'm looking at the question "SQL WHERE.. IN clause multiple columns" as a basis for testing multi-column joins.
The StackOverflow DB is available here: https://www.brentozar.com/archive/2015/10/how-to-download-the-stack-overflow-database-via-bittorrent/
I'm trying to query for Comments for Posts for specific users:
-- get users id < 5
SELECT *
FROM [Users]
WHERE Id < 5;
-- get posts for those users
SELECT *
FROM [Posts]
WHERE [OwnerUserId] IN (SELECT [Id] FROM [Users] WHERE Id < 5);
-- get comments for these posts
SELECT *
FROM [Comments]
WHERE [PostId] IN (SELECT [Id]
                   FROM [Posts]
                   WHERE [OwnerUserId] IN (SELECT [Id] FROM [Users] WHERE Id < 5))
  AND [UserId] IN (SELECT [OwnerUserId]
                   FROM [Posts]
                   WHERE [OwnerUserId] IN (SELECT [Id] FROM [Users] WHERE Id < 5));
-- THIS version seems to select ALL comments and takes forever!
SELECT *
FROM [Comments]
WHERE EXISTS (SELECT *
              FROM [Posts]
              WHERE [OwnerUserId] IN (SELECT [Id] FROM [Users] WHERE Id < 5));
The IN query works, but it also selects some Comments that the post's OwnerUserId didn't make (orphans).
The EXISTS query (which seems to be the correct approach) selects ALL comments and takes forever!
Any ideas? The reason for this test is explained here: https://sproket.github.io/Persism/n+1.html
EXISTS isn't correlated. As such, it will return every row in the table Comments, as the query within the EXISTS is going to find at least one row that exists. In fact, none of your subqueries are correlated.
Why AND [UserId] IN (<double nested subquery>) instead of AND [UserId] < 5?
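To make the point about correlation concrete, here is a minimal sketch of what a correlated version might look like. It assumes the intent is to return only comments on posts owned by users with Id < 5 where the comment was written by the post's owner, and that [Posts].[OwnerUserId] always refers to a row in [Users], so the filter can be applied to OwnerUserId directly. The aliases c and p are introduced here for illustration only.

```sql
-- Sketch only: the subquery references the outer row (c), so EXISTS is
-- evaluated per comment instead of being true for the whole table.
SELECT c.*
FROM [Comments] AS c
WHERE EXISTS (SELECT 1
              FROM [Posts] AS p
              WHERE p.[Id] = c.[PostId]            -- correlate on the post
                AND p.[OwnerUserId] = c.[UserId]   -- comment made by the post's owner (assumed intent)
                AND p.[OwnerUserId] < 5);          -- assumes OwnerUserId maps to Users.Id
```

Because both columns are matched against the same Posts row, this should avoid the orphans that two independent IN clauses let through. If comments from any user on those posts are wanted instead, the p.[OwnerUserId] = c.[UserId] condition would be dropped.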