SQL COUNT on 3 tables with JOIN

Question

I have 4 tables in a database. The warehouse contains boxes owned by clients, and the boxes have files in them. There is a Client table, a Warehouse table, a Boxes table, and a Files table.

So the Client table has WarehouseID as a foreign key, the Boxes table has ClientID as a foreign key, and the Files table has BoxID as a foreign key. I want to count the number of boxes and files that each client has in my query, as well as the number of boxes that are in and out of the warehouse. A Status field on the Boxes and Files tables determines if the boxes and files are in or out of the warehouse. I run the following query on the boxes and the numbers are correct:

SELECT
    [c].[ClientID],
    [c].[Name] AS [ClientName],
    [w].[Name] AS [WarehouseName],
    COUNT(DISTINCT [b].[BoxID]) AS [BoxCount],
    SUM(CASE WHEN [b].[Status] = @IN THEN 1 ELSE 0 END) AS [BoxesIn],
    SUM(CASE WHEN [b].[Status] = @OUT THEN 1 ELSE 0 END) AS [BoxesOut],
    SUM(CASE WHEN [b].[DestructionDate] <= GETDATE() THEN 1 ELSE 0 END) AS [BoxesForDestruction]
FROM [Clients] AS [c] INNER JOIN [Boxes] AS [b]
  ON [c].[ClientID] = [b].[ClientID]
INNER JOIN [Warehouses] AS [w]
  ON [c].[WarehouseID] = [w].[WarehouseID]
WHERE [c].[ClientID] = @ClientID
GROUP BY
  [c].[ClientID],
  [c].[Name],
  [w].[Name]

This produces the output of:

ClientID | ClientName | WarehouseName | BoxCount | BoxesIn | BoxesOut | BoxesForDestruction
1        | ACME Corp. | FooFactory    | 22744    | 22699   | 45       | 7888

These counts are correct. When I add the Files table with another INNER JOIN, the numbers get inflated. Here is the SQL:

SELECT
    [c].[ClientID],
    [c].[Name] AS [ClientName],
    [w].[Name] AS [WarehouseName],
    COUNT(DISTINCT [b].[BoxID]) AS [BoxCount],
    COUNT(DISTINCT [f].[FileID]) AS [FileCount], -- *NEW*
    SUM(CASE WHEN [b].[Status] = @IN THEN 1 ELSE 0 END) AS [BoxesIn],
    SUM(CASE WHEN [b].[Status] = @OUT THEN 1 ELSE 0 END) AS [BoxesOut],
    SUM(CASE WHEN [b].[DestructionDate] <= GETDATE() THEN 1 ELSE 0 END) AS [BoxesForDestruction]
FROM [Clients] AS [c] INNER JOIN [Boxes] AS [b]
  ON [c].[ClientID] = [b].[ClientID]
INNER JOIN [Warehouses] AS [w]
  ON [c].[WarehouseID] = [w].[WarehouseID]
INNER JOIN [Files] AS [f]      -- *NEW*
  ON [b].[BoxID] = [f].[BoxID] -- *NEW*
WHERE [c].[ClientID] = @ClientID
GROUP BY
  [c].[ClientID],
  [c].[Name],
  [w].[Name]

This gives me the count output below (I've omitted the first 3 columns since they're not relevant):

BoxCount | FileCount | BoxesIn | BoxesOut | BoxesForDestruction
19151    | 411961    | 411381  | 580      | 144615

The FileCount is correct, but the other numbers are off. I know why this is happening, but I'm not sure how to fix it: the join between boxes and files returns one row per file, so each box is repeated once for every file it contains, and the SUM counts those repeated rows. Since there is only one warehouse row per client, that join doesn't affect the counts. How do I modify my query to get the correct number of files and of boxes in and out of the warehouse?
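
The row multiplication described above can be reproduced on a tiny data set. The following is only an illustrative sketch: the two boxes, the four files, and the literal 'IN'/'OUT' status values are made up for the example and are not taken from the real tables.

```sql
-- Hypothetical sample data: box 1 is IN and holds three files,
-- box 2 is OUT and holds one file.
WITH Boxes AS (
    SELECT BoxID, Status
    FROM (VALUES (1, 'IN'), (2, 'OUT')) AS v (BoxID, Status)
),
Files AS (
    SELECT FileID, BoxID
    FROM (VALUES (10, 1), (11, 1), (12, 1), (13, 2)) AS v (FileID, BoxID)
)
SELECT
    COUNT(DISTINCT b.BoxID)  AS BoxCount,   -- 2: DISTINCT hides the repeated rows
    COUNT(DISTINCT f.FileID) AS FileCount,  -- 4: one joined row per file
    SUM(CASE WHEN b.Status = 'IN'  THEN 1 ELSE 0 END) AS BoxesIn,  -- 3, although only 1 box is IN
    SUM(CASE WHEN b.Status = 'OUT' THEN 1 ELSE 0 END) AS BoxesOut  -- 1
FROM Boxes AS b
INNER JOIN Files AS f
    ON f.BoxID = b.BoxID;
```

The join produces four rows, one per file, so the SUM over box 1 runs three times while the DISTINCT counts stay unaffected.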

Andomar · Accepted Answer · 2017-08-11 15:06:51Z

2

A join repeats each row of the left-hand table once for every matching row of the right-hand table. If you chain several joins, some rows will be counted more than once. One solution is to move each count into its own subquery. For example:

select  *
from    table1 t1
join    (
        select  table1_id
        ,       count(*)
        from    table2
        group by
                table1_id
        ) t2
on      t2.table1_id = t1.id
join    (
        select  table1_id
        ,       count(*)
        from    table3
        group by
                table1_id
        ) t3
on      t3.table1_id = t1.id
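
Applied to the tables in the question, the same idea might look like the sketch below. It is not a tested query: it assumes the table and column names shown in the question, that @IN, @OUT and @ClientID are supplied as before, and that each row in Files is one file. The LEFT JOIN with COALESCE on the file subquery is a design choice made here, not something from the question, so that a client with no files still returns a row.

```sql
SELECT
    [c].[ClientID],
    [c].[Name] AS [ClientName],
    [w].[Name] AS [WarehouseName],
    [b].[BoxCount],
    COALESCE([f].[FileCount], 0) AS [FileCount],
    [b].[BoxesIn],
    [b].[BoxesOut],
    [b].[BoxesForDestruction]
FROM [Clients] AS [c]
INNER JOIN [Warehouses] AS [w]
  ON [c].[WarehouseID] = [w].[WarehouseID]
INNER JOIN (
    -- One row per client: boxes are aggregated before any join to Files,
    -- so every box is counted exactly once.
    SELECT
        [ClientID],
        COUNT(*) AS [BoxCount],
        SUM(CASE WHEN [Status] = @IN THEN 1 ELSE 0 END) AS [BoxesIn],
        SUM(CASE WHEN [Status] = @OUT THEN 1 ELSE 0 END) AS [BoxesOut],
        SUM(CASE WHEN [DestructionDate] <= GETDATE() THEN 1 ELSE 0 END) AS [BoxesForDestruction]
    FROM [Boxes]
    GROUP BY [ClientID]
) AS [b]
  ON [b].[ClientID] = [c].[ClientID]
LEFT JOIN (
    -- One row per client: files reach the client through their box.
    SELECT
        [bx].[ClientID],
        COUNT(*) AS [FileCount]
    FROM [Files] AS [fl]
    INNER JOIN [Boxes] AS [bx]
      ON [bx].[BoxID] = [fl].[BoxID]
    GROUP BY [bx].[ClientID]
) AS [f]
  ON [f].[ClientID] = [c].[ClientID]
WHERE [c].[ClientID] = @ClientID
```

Because each derived table returns at most one row per client, the outer joins no longer multiply rows and no outer GROUP BY is needed. This form also counts boxes that contain no files, which the INNER JOIN to Files in the question's second query would exclude; that is presumably why its BoxCount fell from 22744 to 19151.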


Camila Ferreira Ribeiro · Accepted Answer · 2020-11-25 00:44:15Z

0

Building on Andomar's answer, I added the aliases "as myColumnOne" and "as myColumnTwo" after count(*), since they are required on SQL Server 2018:

select  *
from    table1 t1
join    (
        select  table1_id
        ,       count(*) as myColumnOne
        from    table2
        group by
                table1_id
        ) t2
on      t2.table1_id = t1.id
join    (
        select  table1_id
        ,       count(*) as myColumnTwo
        from    table3
        group by
                table1_id
        ) t3
on      t3.table1_id = t1.id


4 Comments

marc_s Over a year ago

There is no SQL Server 2018 version - we have 2012, 2014, 2016, 2017 and 2019 - take your pick

Camila Ferreira Ribeiro Over a year ago

v17.9.1, and the year that appears is 2018. The point is that you need to include the column name for this to work in new versions. =)

marc_s Over a year ago

v17.9.1 is the version of SQL Server Management Studio - the GUI admin tool - NOT the SQL Server database engine (which is what really counts when it comes to knowing what features are available - or not yet - for a given problem). Run SELECT @@VERSION against your database engine - THAT is the version we need to know about - not what management tool version you're using .....

Camila Ferreira Ribeiro Over a year ago
