0

I have a Table with records like this:

Id ForeignId Date Created Status
1 ZZ01 2021-01-20 failed
2 ZZ02 2021-03-24 passed
3 ZZ01 2021-08-09 passed
4 ZZ03 2022-01-12 failed
5 ZZ01 2022-04-23 passed

I am trying to write a query that returns the latest DateCreated and Status of every Distinct ForeignId in the table above. The query I have been writing uses MAX() to find the latest date of a record, and uses FIRST_VALUE() on Status to get only the latest value as well, and later using GROUP BY all the columns I put in the SELECT except for . The problem is that I I keep getting this error:

Column 'dbo.T.DateSubmitted' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

Now I don't understand why this error keeps popping up when I am using all my Selects in the Group By or in the case of the failing DateSubmitted in an Aggregate Function.

SELECT [T].[Id],
       [T].[ForeignId],
       MAX([T].[DateSubmitted]) AS [Date Submitted],
       FIRST_VALUE([S].[Status]) OVER (ORDER BY [T].[DateSubmitted] DESC) AS [Status]
      FROM Table as [T]

GROUP BY [T].[Id], [T].[ForeignId], [T].[Status]
ORDER BY [T].[ForeignId];

As you can see from my code above I am using the MAX() function for my DateSubmitted select, and all my leftover selects in the GROUP BY, what am I missing? Why do I need to include my DateSubmitted in the GROUP BY when I am only selecting the MAX() value?

What I want is to return only the latest date and status for each distinct ForeignId, since every ForeignId can have multiple Status and Dates I want only the latest values, I almost get it with the query above, but if I am unable to use the GROUP BY and MAX() I can get the latest column information I have repeated ForeignId instances. For example I would receive ForeignId ZZ01 3 times instead of once. Which is why I need the GROUP BY to work. An example of the bad output when unable to use GROUP BY:

Id ForeignId Date Created Status
1 ZZ01 2021-04-23 passed
2 ZZ01 2021-04-23 passed
3 ZZ01 2021-04-23 passed
4 ZZ02 2022-03-24 passed
5 ZZ03 2022-01-12 failed

Expected Result:

Id ForeignId Date Created Status
1 ZZ01 2021-04-23 passed
2 ZZ02 2022-03-24 passed
3 ZZ03 2022-01-12 failed
2
  • 2
    It's not aggregated inside of FIRST_VALUE() OVER(). Comment that one out, and the error will disappear. Create a db<>fiddle with some sample data and what value you expect there and maybe we can suggest another way to do it. Commented May 3, 2022 at 21:00
  • Full sample data with all three tables, along with expected results, would help. We need to be able to run your query Commented May 3, 2022 at 21:11

1 Answer 1

1

Unfortunately, FIRST_VALUE is not available as an aggregate function, only as a window function. Therefore, the compiler understands it as operating over the resultset after aggregating. So the columns referred to must be in the GROUP BY, but cannot be on non-aggregated and non-grouped columns.

You can use it over an aggregate function

SELECT [SLM].[Id],
       [SLM].[CompanySiteId],
       MAX([QF].[DateSubmitted]) AS [Date Submitted],
       FIRST_VALUE([QF].[Status]) OVER (ORDER BY MAX([QF].[DateSubmitted]) DESC) AS [QC Status]
FROM [dbo].[SiteListMember] AS [SLM]
    JOIN [dbo].[SiteAssessmentStaging] AS [SAS]
        ON [SAS].[SiteListMemberId] = [SLM].[Id]
    JOIN [dbo].[QCForm] AS [QF]
        ON [QF].[SiteAssessmentStagingId] = [SAS].[Id]
WHERE [SAS].[AssessmentTag] = 'Pre-construction' AND [SLM].[CompanySiteId] = 'ABQ00009B'
GROUP BY [SLM].[Id], [SLM].[CompanySiteId], [QF].[Status]
ORDER BY [SLM].[CompanySiteId];

This may not give the results you want, it's hard to say without sample data.

Or you need to push it down into a derived table (subquery). You can do this over the whole joined set

SELECT t.[Id],
       t.[CompanySiteId],
       MAX(t.[DateSubmitted]) AS [Date Submitted],
       t.[QC Status]
FROM (
    SELECT [SLM].[Id],
           [SLM].[CompanySiteId],
           [QF].[DateSubmitted] AS [Date Submitted],
           FIRST_VALUE([QF].[Status]) OVER (ORDER BY [QF].[DateSubmitted] DESC) AS [QC Status]
    FROM [dbo].[SiteListMember] AS [SLM]
    JOIN [dbo].[SiteAssessmentStaging] AS [SAS]
        ON [SAS].[SiteListMemberId] = [SLM].[Id]
    JOIN [dbo].[QCForm] AS [QF]
        ON [QF].[SiteAssessmentStagingId] = [SAS].[Id]
    WHERE [SAS].[AssessmentTag] = 'Pre-construction' AND [SLM].[CompanySiteId] = 'ABQ00009B'
) t
GROUP BY t.[Id], t.[CompanySiteId], t.[Status]
ORDER BY t.[CompanySiteId];

Or you can do it just over the one table and join it after, by adding a PARTITION BY clause.

SELECT [SLM].[Id],
       [SLM].[CompanySiteId],
       MAX([QF].[DateSubmitted]) AS [Date Submitted],
       QC.[QC Status]
FROM [dbo].[SiteListMember] AS [SLM]
    JOIN [dbo].[SiteAssessmentStaging] AS [SAS]
        ON [SAS].[SiteListMemberId] = [SLM].[Id]
    JOIN (
        SELECT *,
          FIRST_VALUE([QF].[Status]) OVER (PARTITION BY [QF].[SiteAssessmentStagingId]
              ORDER BY [QF].[DateSubmitted] DESC) AS [QC Status]
        FROM [dbo].[QCForm] AS [QF]
      ) ON [QF].[SiteAssessmentStagingId] = [SAS].[Id]
WHERE [SAS].[AssessmentTag] = 'Pre-construction' AND [SLM].[CompanySiteId] = 'ABQ00009B'
GROUP BY [SLM].[Id], [SLM].[CompanySiteId], [QF].[Status]
ORDER BY [SLM].[CompanySiteId];

I would advise you to only quote column and table names which need it, and to avoid such names if at all possible. Lots of [] is annoying to read.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.