I have a table like this one:

CREATE TABLE AbsentStudents
(
    Id int not null primary key identity(1,1),
    StudentId int not null,
    AbsentDate datetime not null
)

This is a very large table that has 1 row for each student for each day that they were absent.

I have been asked to write a stored procedure that gets student absences by date range. What makes this query tricky is that I have to filter/aggregate by "absence episodes". The number of days that constitutes an "absence episode" is a procedure parameter so it can vary.

So for example, I need to get a list of students who were absent between 1/1/2016 and 1/17/2016, but only if they were absent for at least @Days (2 or 3 or whatever the parameter dictates) consecutive days.

I think that alone I could figure out. However, within the date range a student can have more than one "absence episode". So a student might have been absent for 3 days at the beginning of the date range, 2 days in the middle of the date range, and 4 days at the end of the date range, and each of those constitutes a separate "absence episode". Assuming that my @Days parameter is 2, that should return 3 rows for that student. Each returned row should also show how many days the student was absent during that "absence episode".

So I would like my procedure to take 3 parameters (@StartDate datetime, @EndDate datetime, @Days int) and return something like this...

StudentId, InitialAbsentDate, ConsecutiveDaysMissed

Ideally it would do this with a set-based operation and avoid cursors. (Although cursors are fine if that is the only option.)

UPDATE (by Shnugo)

A test scenario

DECLARE @AbsentStudents TABLE(
    Id int not null primary key identity(1,1),
    StudentId int not null,
    AbsentDate datetime not null
);
INSERT INTO @AbsentStudents VALUES
--student 1
 (1,{d'2016-10-01'}),(1,{d'2016-10-02'}),(1,{d'2016-10-03'}) --three days 
,(1,{d'2016-10-05'}) --one day
,(1,{d'2016-10-07'}),(1,{d'2016-10-08'}) --two days
--student 2
,(2,{d'2016-10-01'}),(2,{d'2016-10-02'}),(2,{d'2016-10-03'}),(2,{d'2016-10-04'}) --four days
,(2,{d'2016-10-08'}),(2,{d'2016-10-09'}),(2,{d'2016-10-10'}) --three days
,(2,{d'2016-10-12'}); --one day

DECLARE @startDate DATETIME={d'2016-10-01'};
DECLARE @endDate DATETIME={d'2016-10-31'};
DECLARE @Days INT = 3;
  • Take a look at this article. What you need is groups of contiguous dates. sqlservercentral.com/articles/T-SQL/71550 Commented Nov 8, 2016 at 19:25
  • You would help us a lot if you prepared an MCVE. Please use DECLARE @AbsentStudents TABLE... and INSERT INTO @AbsentStudents VALUES... to provide copy'N'pasteable sample data. Show what you've tried so far and the expected output. Commented Nov 8, 2016 at 19:25
  • Which version of SQL Server? Commented Nov 8, 2016 at 19:38
  • @Shnugo Sorry, I should have said...SQL12. Weekend and holidays don't matter. Commented Nov 8, 2016 at 19:42

3 Answers

If you just want the periods of time when students are absent, you can do this with a difference-of-row-numbers approach.

The following assumes that an absence period consists of consecutive calendar days with no gaps, and uses the difference of row numbers to identify those periods:

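select StudentId,
       min(AbsentDate) as InitialAbsentDate,
       max(AbsentDate) as LastAbsentDate,
       count(*) as number_of_days
from (select a.*,
             -- number each student's absences in date order
             row_number() over (partition by StudentId order by AbsentDate) as seqnum_sa
      from AbsentStudents a
     ) a
group by StudentId,
         -- consecutive dates minus their row numbers land on the same anchor date
         dateadd(day, - seqnum_sa, AbsentDate);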

Notes:

  • You have additional requirements on minimum days and date ranges. These are easily handled with a where clause (for the date range) and a having clause (for the minimum number of days).
  • I suspect you have a hidden requirement on avoiding weekends and holidays. Neither this nor the other answers cover this. Ask another question if this is an issue.
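
As a rough sketch of how those two filters might be attached, assuming the table and parameter names from the question, the output aliases from its desired column list, and that AbsentDate carries no time portion and has no duplicate rows per student and day (the parameter values below are purely illustrative):

```sql
-- Sketch only: assumes one row per (StudentId, AbsentDate) with no time portion.
-- The parameter values are illustrative, not taken from a real workload.
DECLARE @StartDate datetime = '20161001',
        @EndDate   datetime = '20161031',
        @Days      int      = 3;

select a.StudentId,
       min(a.AbsentDate) as InitialAbsentDate,
       count(*) as ConsecutiveDaysMissed
from (select s.StudentId, s.AbsentDate,
             row_number() over (partition by s.StudentId order by s.AbsentDate) as seqnum_sa
      from AbsentStudents s
      where s.AbsentDate between @StartDate and @EndDate   -- date range filter
     ) a
group by a.StudentId,
         dateadd(day, - a.seqnum_sa, a.AbsentDate)          -- one group per run of consecutive days
having count(*) >= @Days                                    -- minimum episode length
order by a.StudentId, InitialAbsentDate;
```

Pointed at the @AbsentStudents test data from the question with @Days = 3, this should yield three rows: (1, 2016-10-01, 3), (2, 2016-10-01, 4) and (2, 2016-10-08, 3). Note that filtering on the date range inside the subquery means an episode that starts before @StartDate is counted only from @StartDate onward.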
4 Comments

Wow...this totally worked! I love the simplicity. No need to avoid holidays so I am giving you the answer.
...and it is incredibly fast...it runs in less than a second against my data set of several hundred thousand rows. Genius.
@DForck42 . . . I document my indentation style in Chapter 1 of Data Analysis Using SQL and Excel. Please do not change my formatted code.
and yet you didn't revert the change, but kept most of the changes??
You can try this query:

SELECT
    StudentId
    , MIN(AbsentDate) AS InitialDate
    , COUNT(*) AS ConsecutiveDaysMissed
FROM (
    SELECT
        -- consecutive days share the same (day number - row number) value
        dateNumber - ROW_NUMBER() OVER(PARTITION BY StudentId ORDER BY dateNumber) AS PeriodId
        , AbsentDate
        , StudentId
    FROM (
        SELECT
            StudentId
            , AbsentDate
            -- days since 1900-01-01; unlike a yyyymmdd integer, this grows by exactly 1 per day across month ends
            , DATEDIFF(DAY, 0, AbsentDate) AS dateNumber
        FROM AbsentStudents
        WHERE AbsentDate BETWEEN @StartDate AND @EndDate
    ) AS T
) AS StudentPeriod
GROUP BY StudentId, PeriodId

If weekends and holidays need to be skipped, you can build a table of dates with consecutive order numbers that leaves out holidays and weekends. Then join it to AbsentStudents by date and use that order number in place of the computed dateNumber.
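
A minimal sketch of that calendar-table idea follows. The SchoolDays table, its columns and the parameter values are hypothetical names introduced for illustration only; how the table gets populated (which dates count as school days) is left open.

```sql
-- Hypothetical calendar table: one row per school day, numbered consecutively,
-- so a Friday and the following Monday differ by exactly 1 in DayNumber.
CREATE TABLE SchoolDays
(
    CalendarDate datetime not null primary key,
    DayNumber    int      not null unique
);

-- Illustrative parameter values.
DECLARE @StartDate datetime = '20161001',
        @EndDate   datetime = '20161031';

SELECT
    StudentId
    , MIN(AbsentDate) AS InitialDate
    , COUNT(*) AS ConsecutiveDaysMissed
FROM (
    SELECT
        -- DayNumber replaces the date-derived number, so skipped days do not break a run
        d.DayNumber - ROW_NUMBER() OVER(PARTITION BY a.StudentId ORDER BY d.DayNumber) AS PeriodId
        , a.AbsentDate
        , a.StudentId
    FROM AbsentStudents AS a
    INNER JOIN SchoolDays AS d ON d.CalendarDate = a.AbsentDate
    WHERE a.AbsentDate BETWEEN @StartDate AND @EndDate
) AS StudentPeriod
GROUP BY StudentId, PeriodId;
```

Because of the inner join, any absence recorded on a date that is missing from SchoolDays is dropped from the result, which is the intended effect for weekends and holidays but worth checking against real data.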

7 Comments

If someone had two different absences of 1 day this would count that as two -- you need another group by and a filter against @Days
@Hogan You can, of course, apply DISTINCT on 'AbsentDate' and 'StudentId' before executing this query and add the filter afterwards. Are you suggesting that the answer should cover every constraint?
I like this answer if weekends or holidays don't matter
Weekends or holidays don't matter. My ACTUAL application is in healthcare but wanted to choose something more generic than what I need.
@JamieD77 Well, you can make a table with dates and their order numbers without holidays and weekends. Then join it to AbsentStudents by date and use the order number instead of the computed dateNumber.

You can use a trick. Order the rows by date and add a counter that goes up by one every row to the smallest date; the number of days from that result to the row's own date stays constant within a run of consecutive dates, so it identifies each group.

SELECT StudentId
FROM (
  SELECT StudentId, GROUP_NUM, COUNT(*) AS GROUP_DAY_CNT
  FROM (
    SELECT StudentId,
           -- days from (earliest date + row number) to this date: constant within a run of consecutive days
           DATEDIFF(dd, DATEADD(dd, ROW_NUMBER() OVER (PARTITION BY StudentId ORDER BY AbsentDate), M.MinDate), AbsentDate) AS GROUP_NUM
    FROM AbsentStudents
    CROSS JOIN (SELECT MIN(AbsentDate) AS MinDate FROM AbsentStudents WHERE AbsentDate BETWEEN @StartDate AND @EndDate) M
    WHERE AbsentDate BETWEEN @StartDate AND @EndDate
  ) X
  GROUP BY StudentId, GROUP_NUM
) Z
WHERE GROUP_DAY_CNT >= @Days
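
To see why the grouping value stays constant within a run, here is a small illustration using student 1 from the question's test scenario as an inline row set. It computes days from the earliest date minus the row counter, which is arithmetically the same grouping value; the column aliases DaysFromMin and RowNum are made up for this sketch.

```sql
-- Student 1 from the question's test data; '20161001' is that student's earliest date.
SELECT  t.AbsentDate,
        DATEDIFF(dd, '20161001', t.AbsentDate)          AS DaysFromMin,
        ROW_NUMBER() OVER (ORDER BY t.AbsentDate)       AS RowNum,
        DATEDIFF(dd, '20161001', t.AbsentDate)
          - ROW_NUMBER() OVER (ORDER BY t.AbsentDate)   AS GROUP_NUM
FROM (VALUES (CAST('20161001' AS datetime)), ('20161002'), ('20161003'),
             ('20161005'), ('20161007'), ('20161008')) AS t(AbsentDate)
ORDER BY t.AbsentDate;
-- DaysFromMin: 0, 1, 2, 4, 6, 7
-- RowNum:      1, 2, 3, 4, 5, 6
-- GROUP_NUM:  -1, -1, -1, 0, 1, 1   (runs of 3, 1 and 2 days)
```

Every gap in the dates bumps DaysFromMin by more than the counter advances, so GROUP_NUM changes exactly where a new episode starts.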

1 Comment

This needs to return StudentId, InitialAbsentDate, ConsecutiveDaysMissed. Also, I believe dateadd is dateadd(datepart, number, date).
