I have this table in SQL Server:
| date | var | val |
|---|---|---|
| 2022-02-01 | A | 1.1 |
| 2022-03-01 | A | 2.3 |
| 2022-04-01 | A | 1.5 |
| 2022-05-01 | A | 1.7 |
| 2022-09-01 | B | 1.8 |
| 2022-10-01 | B | 1.9 |
| 2022-11-01 | B | 2.1 |
| 2022-12-01 | B | 2.22 |
I want to group by column var and date and compute the expanding average and standard deviation, in order to create an upper and lower interval of one standard deviation. At the end, I want each entry in column val (apart from the first one) to be checked for whether it falls inside the interval computed at the previous time step (lag 1).
How can I do it in SQL Server?
My attempt:
-- Create the table
CREATE TABLE DataTable (
    [date] DATE,
    var CHAR(1),
    val DECIMAL(4, 2)
);

-- Insert the data
INSERT INTO DataTable ([date], var, val)
VALUES
    ('2022-02-01', 'A', 1.1),
    ('2022-03-01', 'A', 2.3),
    ('2022-04-01', 'A', 1.5),
    ('2022-05-01', 'A', 1.7),
    ('2022-09-01', 'B', 1.8),
    ('2022-10-01', 'B', 1.9),
    ('2022-11-01', 'B', 2.1),
    ('2022-12-01', 'B', 2.22);

-- Expanding average and standard deviation per var, then the lag-1 interval check
WITH DataWithStats AS (
    SELECT
        [date],
        var,
        val,
        AVG(val) OVER (PARTITION BY var ORDER BY [date]
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS exp_avg,
        STDEV(val) OVER (PARTITION BY var ORDER BY [date]
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS exp_std
    FROM DataTable
)
SELECT
    [date],
    var,
    exp_avg - 2 * exp_std AS lower,
    val,
    exp_avg + 2 * exp_std AS upper,
    CASE
        WHEN val < LAG(exp_avg - 2 * exp_std, 1) OVER (PARTITION BY var ORDER BY [date]) OR
             val > LAG(exp_avg + 2 * exp_std, 1) OVER (PARTITION BY var ORDER BY [date])
        THEN 'Warning'
        ELSE 'ok'
    END AS status
FROM DataWithStats;
The problem with my attempt is that the check column does not properly evaluate the check.
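For reference, here is a minimal sketch of one way to make the check easier to inspect. It is not a confirmed fix. It moves the lagged bounds into their own columns so they appear next to val, and it labels rows where no previous interval exists. The names `WithPrevBounds`, `prev_lower`, `prev_upper` and the `'n/a'` label are hypothetical. The multiplier 2 is kept from the attempt above and would become 1 for a one-standard-deviation interval. The sketch relies on `STDEV` returning NULL for a single row, so the first two rows of each var have no usable previous interval.

```sql
WITH DataWithStats AS (
    SELECT
        [date],
        var,
        val,
        AVG(val) OVER (PARTITION BY var ORDER BY [date]
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS exp_avg,
        STDEV(val) OVER (PARTITION BY var ORDER BY [date]
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS exp_std
    FROM DataTable
),
WithPrevBounds AS (  -- hypothetical CTE name
    SELECT
        [date],
        var,
        val,
        -- Bounds computed on the previous row of the same var (lag 1)
        LAG(exp_avg - 2 * exp_std) OVER (PARTITION BY var ORDER BY [date]) AS prev_lower,
        LAG(exp_avg + 2 * exp_std) OVER (PARTITION BY var ORDER BY [date]) AS prev_upper
    FROM DataWithStats
)
SELECT
    [date],
    var,
    prev_lower,
    val,
    prev_upper,
    CASE
        -- No previous row, or the previous row had a NULL standard deviation
        WHEN prev_lower IS NULL OR prev_upper IS NULL THEN 'n/a'
        WHEN val < prev_lower OR val > prev_upper THEN 'Warning'
        ELSE 'ok'
    END AS status
FROM WithPrevBounds
ORDER BY var, [date];
```

In the attempt above, the displayed lower and upper columns are the current row's bounds, while the CASE compares val against the previous row's bounds, and a NULL comparison falls through to 'ok'. Showing the lagged bounds directly makes both effects visible.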
https://sqlfiddle.com/sql-server/online-compiler?id=fe2e764f-9627-418f-bd08-dd13250b0dab
"I want to group by column var and date [...]" is not what I'm seeing in your code sample. Also, what column is the check column? Be more clear please.
exp_avg +/- 2 * exp_std.