Apply expanding calculations on SQL Server

Question

I have this table in SQL server :

date	var	val
2022-2-1	A	1.1
2022-3-1	A	2.3
2022-4-1	A	1.5
2022-5-1	A	1.7
2022-09-1	B	1.8
2022-10-1	B	1.9
2022-11-1	B	2.1
2022-12-1	B	2.22

I want to group by column var and date and implement the expanding average and standard deviation in order to create an upper and lower interval of one standard deviation. At the end I want each entry in column val (apart from the first one) to be checked if falls inside the interval of the previous time (or lag 1).

How can I do it in SQL Server?

My attempt

-- Create the table
CREATE TABLE DataTable (
    [date] DATE,
    var CHAR(1),
    val DECIMAL(4, 2)
);

-- Insert the data
INSERT INTO DataTable ([date], var, val)
VALUES 
('2022-02-01', 'A', 1.1),
('2022-03-01', 'A', 2.3),
('2022-04-01', 'A', 1.5),
('2022-05-01', 'A', 1.7),
('2022-09-01', 'B', 1.8),
('2022-10-01', 'B', 1.9),
('2022-11-01', 'B', 2.1),
('2022-12-01', 'B', 2.22);


WITH DataWithStats AS (
    SELECT 
        [date],
        var,
        val,
        AVG(val) OVER (PARTITION BY var ORDER BY [date] 
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS exp_avg,
        STDEV(val) OVER (PARTITION BY var ORDER BY [date] 
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS exp_std
    FROM DataTable
)
SELECT 
    [date],
    var,
    exp_avg - 2 * exp_std AS lower,
    val,
    exp_avg + 2 * exp_std AS upper,
    CASE 
        WHEN val < LAG(exp_avg - 2 * exp_std, 1) OVER (PARTITION BY var ORDER BY [date]) OR  
             val > LAG(exp_avg + 2 * exp_std, 1) OVER (PARTITION BY var ORDER BY [date]) 
        THEN 'Warning'
        ELSE 'ok'
    END AS status
FROM DataWithStats;

The problem with my attempt is that the check column does not properly evaluate the check.

https://sqlfiddle.com/sql-server/online-compiler?id=fe2e764f-9627-418f-bd08-dd13250b0dab

I want to group by column var and date [...] is not what I'm seeing in your code sample. Also, what column is the check column? Be more clear please. — TT.
– TT., Commented Oct 12, 2024 at 16:20
What is incorrect? If you believe the "warning" status is incorrect because 2.10 is not outside the 1.62782795366962 - 2.23883804633038 range, your problem is that your displayed values are not the ones used to set the status. The LAG() average and standard deviation values are use. The 2.10 value is outside the actual 1.7085786437627 - 1.9914213562373 range used by the test. See this db<>fiddle. — T N
– T N, Commented Oct 12, 2024 at 18:25
Side note: Your description states "an upper and lower interval of one standard deviation", but your calculation appears to be calculating and interval of two standard deviations - exp_avg +/- 2 * exp_std. — T N
– T N, Commented Oct 12, 2024 at 18:30
"I want each entry in column val (apart from the first one) to be checked..." - If only using prior values to calculate average and standard deviation, you will need at least two prior values before you can calculate a non-zero standard deviation. — T N
– T N, Commented Oct 12, 2024 at 20:17

T N · Accepted Answer · 2024-10-13 02:30:40Z

If you believe the "warning" status is incorrect in the query results, because 2.10 is not outside the displayed 1.62782795366962 - 2.23883804633038 range, then I believe the issue that your displayed values are not the ones used to calculate the status. The LAG() average and standard deviation values are used instead of the current values. The 2.10 value is outside the actual 1.7085786437627 - 1.9914213562373 range used by the test.

The results were correct (as coded), but the presentation was misleading.

Original results:

date	var	lower	val	upper	status
2022-02-01	A	null	1.10	null	ok
2022-03-01	A	0.00294372515228614	2.30	3.39705627484771	ok
2022-04-01	A	0.411312814678444	1.50	2.85535318532156	ok
2022-05-01	A	0.650000000000001	1.70	2.65	ok
2022-09-01	B	null	1.80	null	ok
2022-10-01	B	1.7085786437627	1.90	1.9914213562373	ok
2022-11-01	B	1.62782795366962	2.10	2.23883804633038	Warning
2022-12-01	B	1.62500000000001	2.22	2.38499999999999	ok

Results showing actual range used by status calculation.

date	var	lower	val	upper	status
2022-02-01	A	null	1.10	null	ok
2022-03-01	A	null	2.30	null	ok
2022-04-01	A	0.00294372515228614	1.50	3.39705627484771	ok
2022-05-01	A	0.411312814678444	1.70	2.85535318532156	ok
2022-09-01	B	null	1.80	null	ok
2022-10-01	B	null	1.90	null	ok
2022-11-01	B	1.7085786437627	2.10	1.9914213562373	Warning
2022-12-01	B	1.62782795366962	2.22	2.23883804633038	ok

Note that STDEV() will be null and no range will be calculated until there are at least two prior values.

The following query modifies the window range in the CTE, so that LAG() is not needed in the final query. The displayed range limit values are the same as are used in the test.

WITH DataWithStats AS (
    SELECT 
        [date],
        var,
        val,
        AVG(val) OVER (PARTITION BY var ORDER BY [date] 
                       ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS exp_avg,
        STDEV(val) OVER (PARTITION BY var ORDER BY [date] 
                          ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS exp_std
    FROM DataTable
)
SELECT 
    [date],
    var,
    exp_avg - 2 * exp_std AS lower,
    val,
    exp_avg + 2 * exp_std AS upper,
    CASE 
        WHEN val < exp_avg - 2 * exp_std OR  
             val > exp_avg + 2 * exp_std
        THEN 'Warning'
        ELSE 'ok'
    END AS status
FROM DataWithStats;

See this db<>fiddle.

Collectives™ on Stack Overflow

Apply expanding calculations on SQL Server

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related