I have 4 tables in a database. The warehouse contains boxes owned by clients, and the boxes have files in them. There is a Client table, a Warehouse table, a Boxes table, and a Files table.
So the Client table has WarehouseID as a foreign key, the Boxes table has ClientID as a foreign key, and the Files table has BoxID as a foreign key. I want to count the number of boxes and files that each client has in my query, as well as the number of boxes that are in and out of the warehouse. A Status field on the Boxes and Files tables determines if the boxes and files are in or out of the warehouse. I run the following query on the boxes and the numbers are correct:
SELECT
[c].[ClientID],
[c].[Name] AS [ClientName],
[w].[Name] AS [WarehouseName],
COUNT(DISTINCT [b].[BoxID]) AS [BoxCount],
SUM(CASE WHEN [b].[Status] = @IN THEN 1 ELSE 0 END)) AS [BoxesIn],
SUM(CASE WHEN [b].[Status] = @OUT THEN 1 ELSE 0 END) AS [BoxesOut],
SUM(CASE WHEN [b].[DestructionDate] <= GETDATE() THEN 1 ELSE 0 END) AS [BoxesForDestruction],
FROM [Clients] AS [c] INNER JOIN [Boxes] AS [b]
ON [c].[ClientID] = [b].[ClientID]
INNER JOIN [Warehouses] AS [w]
ON [c].WarehouseID = [w].[WarehouseID]
WHERE [c].[ClientID] = @ClientID
GROUP BY
[c].[ClientID],
[c].[Name],
[w].[Name]
This produces the output of:
ClientID | ClientName | WarehouseName | BoxCount | BoxesIn | BoxesOut | BoxesForDestruction
1 | ACME Corp. | FooFactory | 22744 | 22699 | 45 | 7888
The output of the count is correct. When I add the Files table to the INNER JOIN then the numbers get inflated. Here is the SQL:
SELECT
[c].[ClientID],
[c].[Name] AS [ClientName],
[w].[Name] AS [WarehouseName],
COUNT(DISTINCT [b].[BoxID]) AS [BoxCount],
COUNT(DISTINCT [f].[FileID]) AS [FileCount], -- *NEW*
SUM(CASE WHEN [b].[Status] = @IN THEN 1 ELSE 0 END)) AS [BoxesIn],
SUM(CASE WHEN [b].[Status] = @OUT THEN 1 ELSE 0 END) AS [BoxesOut],
SUM(CASE WHEN [b].[DestructionDate] <= GETDATE() THEN 1 ELSE 0 END) AS [BoxesForDestruction],
FROM [Clients] AS [c] INNER JOIN [Boxes] AS [b]
ON [c].[ClientID] = [b].[ClientID]
INNER JOIN [Warehouses] AS [w]
ON [c].[WarehouseID] = [w].[WarehouseID]
INNER JOIN [Files] AS [f] -- *NEW*
ON [b].[BoxID] = [f].[BoxID] -- *NEW*
WHERE [c].[ClientID] = @ClientID
GROUP BY
[c].[ClientID],
[c].[Name],
[w].[Name]
This gives me the count output below (I've omitted the first 3 columns since they're not relevant):
BoxCount | FilesCount | BoxesIn | BoxesOut | BoxesForDestruction
19151 | 411961 | 411381 | 580 | 144615
The FilesCount is correct, but the other numbers are off. I know why this is happening, but I'm not sure how to fix it. The extra rows are created due to the multiple rows returned by the join on the boxes and files. When performing the SUM, the extra rows inflate the count. Since there is only one row for the warehouse, that join doesn't affect the count. How do I modify my query to get the correct number of files and boxes in and out of the warehouse?