0

I have 4 tables in a database. The warehouse contains boxes owned by clients, and the boxes have files in them. There is a Client table, a Warehouse table, a Boxes table, and a Files table.

So the Client table has WarehouseID as a foreign key, the Boxes table has ClientID as a foreign key, and the Files table has BoxID as a foreign key. I want to count the number of boxes and files that each client has in my query, as well as the number of boxes that are in and out of the warehouse. A Status field on the Boxes and Files tables determines if the boxes and files are in or out of the warehouse. I run the following query on the boxes and the numbers are correct:

SELECT
    [c].[ClientID],
    [c].[Name] AS [ClientName],
    [w].[Name] AS [WarehouseName],
    COUNT(DISTINCT [b].[BoxID]) AS [BoxCount],
    SUM(CASE WHEN [b].[Status] = @IN THEN 1 ELSE 0 END)) AS [BoxesIn],
    SUM(CASE WHEN [b].[Status] = @OUT THEN 1 ELSE 0 END) AS [BoxesOut],
    SUM(CASE WHEN [b].[DestructionDate] <= GETDATE() THEN 1 ELSE 0 END) AS [BoxesForDestruction],
FROM [Clients] AS [c] INNER JOIN [Boxes] AS [b]
  ON [c].[ClientID] = [b].[ClientID]
INNER JOIN [Warehouses] AS [w]
  ON [c].WarehouseID = [w].[WarehouseID]
WHERE [c].[ClientID] = @ClientID
GROUP BY
  [c].[ClientID],
  [c].[Name],
  [w].[Name]

This produces the output of:

ClientID | ClientName | WarehouseName | BoxCount | BoxesIn | BoxesOut | BoxesForDestruction
1        | ACME Corp. | FooFactory    | 22744    | 22699   | 45       | 7888 

The output of the count is correct. When I add the Files table to the INNER JOIN then the numbers get inflated. Here is the SQL:

SELECT
    [c].[ClientID],
    [c].[Name] AS [ClientName],
    [w].[Name] AS [WarehouseName],
    COUNT(DISTINCT [b].[BoxID]) AS [BoxCount],
    COUNT(DISTINCT [f].[FileID]) AS [FileCount], -- *NEW*
    SUM(CASE WHEN [b].[Status] = @IN THEN 1 ELSE 0 END)) AS [BoxesIn],
    SUM(CASE WHEN [b].[Status] = @OUT THEN 1 ELSE 0 END) AS [BoxesOut],
    SUM(CASE WHEN [b].[DestructionDate] <= GETDATE() THEN 1 ELSE 0 END) AS [BoxesForDestruction],
FROM [Clients] AS [c] INNER JOIN [Boxes] AS [b]
  ON [c].[ClientID] = [b].[ClientID]
INNER JOIN [Warehouses] AS [w]
  ON [c].[WarehouseID] = [w].[WarehouseID]
INNER JOIN [Files] AS [f]      -- *NEW*
  ON [b].[BoxID] = [f].[BoxID] -- *NEW*
WHERE [c].[ClientID] = @ClientID
GROUP BY
  [c].[ClientID],
  [c].[Name],
  [w].[Name]

This gives me the count output below (I've omitted the first 3 columns since they're not relevant):

BoxCount | FilesCount | BoxesIn | BoxesOut | BoxesForDestruction 
19151    | 411961     | 411381  | 580      | 144615       

The FilesCount is correct, but the other numbers are off. I know why this is happening, but I'm not sure how to fix it. The extra rows are created due to the multiple rows returned by the join on the boxes and files. When performing the SUM, the extra rows inflate the count. Since there is only one row for the warehouse, that join doesn't affect the count. How do I modify my query to get the correct number of files and boxes in and out of the warehouse?

2 Answers 2

2

A join repeats each row in the left hand table for each row in the right hand table. If you combine multiple joins some rows will be double counted. A solution is to move the count to a subquery. For example:

select  *
from    table1 t1
join    (
        select  table1_id
        ,       count(*)
        from    table2
        group by
                table1_id
        ) t2
on      t2.table1_id = t1.id
join    (
        select  table1_id
        ,       count(*)
        from    table3
        group by
                table1_id
        ) t3
on      t3.table1_id = t1.id
Sign up to request clarification or add additional context in comments.

Comments

0

As mentioned by Andomar, I included "as myColumnOne" and "myColumnTwo" besides Count(*), as it is required on SQL Server 2018:

 select  *
from    table1 t1
join    (
        select  table1_id
        ,       count(*) as myColumnOne
        from    table2
        group by
                table1_id
        ) t2
on      t2.table1_id = t1.id
join    (
        select  table1_id
        ,       count(*) as myColumnTwo
        from    table3
        group by
                table1_id
        ) t3
on      t3.table1_id = t1.id

4 Comments

There is no SQL Server 2018 version - we have 2012, 2014, 2016, 2017 and 2019 - take your pick
v17.9.1, and the year that appears is 2018. The point is that you need to include the column name for this to work in new versions. =)
v17.9.1 is the version of SQL Server Management Studio - the GUI admin tool - NOT the SQL Server database engine (which is what really counts when it comes to knowing what features are available - or not yet - for a given problem). Run SELECT @@VERSION against your database engine - THAT is the version we need to know about - not what management tool version you're using .....
Thank you. The version is: Microsoft SQL Azure (RTM) - 12.0.2000.8 Oct 1 2020 18:48:35 Copyright (C) 2019 Microsoft Corporation

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.