0

I'm doing some performance testing and the StackOverflow2010 DB is great for this.

Looking here for testing multi-column joins SQL WHERE.. IN clause multiple columns

The StackOverflow DB is available here: https://www.brentozar.com/archive/2015/10/how-to-download-the-stack-overflow-database-via-bittorrent/

I'm trying to query for Comments for Posts for specific users:

-- get users id < 5
SELECT *
FROM [Users]
WHERE Id < 5;

-- get posts for those users
SELECT *
FROM [Posts]
WHERE [OwnerUserId] IN (SELECT [Id] FROM [Users] WHERE Id < 5);

-- get comments for these posts 
SELECT *
FROM [Comments]
WHERE [PostId] IN (SELECT [Id]
                   FROM [Posts]
                   WHERE [OwnerUserId] IN (SELECT [Id] FROM [Users] WHERE Id < 5))
  AND [UserId] IN (SELECT [OwnerUserId]
                   FROM [Posts]
                   WHERE [OwnerUserId] IN (SELECT [Id] FROM [Users] WHERE Id < 5));

-- THIS version seems to select ALL comments and takes forever!
SELECT *
FROM [Comments]
WHERE EXISTS (SELECT *
              FROM [Posts]
              WHERE [OwnerUserId] IN (SELECT [Id] FROM [Users] WHERE Id < 5));

The IN query works but it selects some Comments that the OwnerUserId didn't make (orphans).

The EXISTS query (which seems to be the correct way) seems to select ALL comments and takes forever!

Any ideas? The reason for this is explained here: https://sproket.github.io/Persism/n+1.html

5
  • 1
    Your subquery in the EXISTS isn't correlated. As such it will return every row in the table Comments as the query within the EXISTS is going to find at least one row that exists. In fact, none of your subqueries are correlated. Commented Jan 6, 2022 at 12:19
  • 1
    Have you tried doing actual JOINs instead of EXISTS/IN queries? Also, why AND [UserId] IN (<double nested subquery>) instead of AND [UserId] < 5? Commented Jan 6, 2022 at 12:19
  • @Larnu I added a link to explain the reasoning. It all works fine with single column joins. Commented Jan 6, 2022 at 12:47
  • If the information is related to the question, put it in the question, @sproketboy . Don't require users to visit a link to a Git repository to be able to understand the question . Commented Jan 6, 2022 at 12:48
  • @Larnu OK I'll ask the question again with the additional information. Commented Jan 6, 2022 at 13:05

1 Answer 1

2

You are massively overly complicating the issue here.

Firstly, let's have a look at your second query. This doesn't actually need to reference the Users table at all. You're looking for details of Posts where the User ID is < 5, and the User ID is already in the Posts table, OwnerUserId. So you could just write your second query as:

SELECT *
FROM dbo.Posts
WHERE OwnerUserId < 5;

For your next query, the same applies again, you don't need the subqueries, the ID of the user is already in the Comments table:

SELECT *
FROM dbo.Comments
WHERE UserId < 5;

As for your EXISTS the reason it isn't working is because it isn't correlated. As such the subquery in your EXISTS always returns at least one row, and so the EXISTS evaluates to TRUE; and so every row in the table Comments is returned.

A correlated query would look something like this (this is not relevant to the queries you have, as they are overly complicated):

SELECT {Columns}
FROM dbo.SomeTable ST
WHERE EXISTS (SELECT 1
              FROM dbo.OtherTable OT
              WHERE OT.OtherInt > 10
                AND OT.SomeID = ST.ID);

Note the clause OT.SomeID = ST.ID which means that the tables dbo.SomeTable and dbo.OtherTable have some relationship. As such the above query would only return rows from dbo.SomeTable where a related row existed in dbo.OtherTable and that related row has a value greater than 10 in the column OtherInt.

Sign up to request clarification or add additional context in comments.

2 Comments

The reason for the complexity is that these queries need to be generalized as part my Persism library. Using single column joins with IN allows me to get child rows quickly while avoiding the SELECT N+1 problem.
Like I mentioned, those subqueries are simply not required in any of the above attempts, @sproketboy . Perhaps the problem is that you have used a poor example to demonstrate what you are actually trying to achieve and that was resulted in you writing incorrect and overly complicated SQL. If so, you may well need to ask a new question, detailing your actual problem.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.