I wanted to use BigQuery's array aggregate function, `ARRAY_CONCAT_AGG`, over a window. It doesn't look like this is possible, and I may also have a second issue: whether a window defined in the outer query is accessible from an inner query.
Here is my current SQL:
```sql
WITH data AS (
  SELECT 1 AS id, 1 AS iteration_recency, DATE("2022-07-09") AS start_date, DATE("2022-07-31") AS end_date
  UNION ALL
  SELECT 1 AS id, 2 AS iteration_recency, DATE("2022-08-01") AS start_date, DATE("2022-08-15") AS end_date
  UNION ALL
  SELECT 1 AS id, 3 AS iteration_recency, DATE("2022-07-01") AS start_date, DATE("2022-07-04") AS end_date
  UNION ALL
  SELECT 1 AS id, 4 AS iteration_recency, DATE("2022-07-25") AS start_date, DATE("2022-08-04") AS end_date
  UNION ALL
  SELECT 1 AS id, 5 AS iteration_recency, DATE("2022-07-01") AS start_date, DATE("2022-07-31") AS end_date
  UNION ALL
  SELECT 2 AS id, 1 AS iteration_recency, DATE("2022-08-01") AS start_date, DATE("2022-10-30") AS end_date
  UNION ALL
  SELECT 2 AS id, 2 AS iteration_recency, DATE("2022-07-05") AS start_date, DATE("2022-07-22") AS end_date
  UNION ALL
  SELECT 2 AS id, 3 AS iteration_recency, DATE("2022-08-06") AS start_date, DATE("2022-08-24") AS end_date
)
SELECT
  id,
  iteration_recency,
  (
    SELECT MIN(`dates` IN UNNEST(
      ARRAY_CONCAT_AGG(GENERATE_DATE_ARRAY(`start_date`, `end_date`)) OVER `newer_iterations`
    )
    )
    FROM UNNEST(GENERATE_DATE_ARRAY(`start_date`, `end_date`)) AS `dates`
  ) AS date_range_contained_in_more_recent_iterations
FROM data
WINDOW `newer_iterations` AS (
  PARTITION BY id
  ORDER BY iteration ASC
  ROWS BETWEEN 1 PRECEDING AND UNBOUNDED PRECEDING
)
```
The purpose of this query is to determine whether a date range is fully covered by more recent iterations of the same id (a lower `iteration_recency` means a more recent iteration). As a use case, imagine some monitoring where iteration 3 fails for some date range: if that date range is covered by more recent iteration(s), it's not a problem. I can't do something clever with min/max because more recent iterations may overlap the failed date range without completely covering it between them.
The slightly crazy `MIN(... IN UNNEST(...))` construct draws inspiration from this answer, which provides a neat way of working out whether all items from arrayA are in arrayB.
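To show that idiom in isolation, here is a minimal sketch. The column names `array_a` and `array_b` and the literal values are purely illustrative and not part of my real data; it relies on BOOL being orderable in BigQuery, so `MIN` over booleans is TRUE only when every element test is TRUE.

```sql
-- Hypothetical inputs: array_a and array_b are illustrative names only.
SELECT
  (
    -- One boolean per element of array_a; MIN is TRUE only if all are TRUE.
    SELECT MIN(x IN UNNEST(array_b))
    FROM UNNEST(array_a) AS x
  ) AS a_contained_in_b
FROM (
  SELECT [1, 2] AS array_a, [1, 2, 3] AS array_b
)
```

As far as I can tell, `LOGICAL_AND` could replace `MIN` here and would state the intent more directly.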
Currently, I get the error `Unrecognized window alias newer_iterations at [24:76]`. I was actually expecting to get the error (paraphrased) "the OVER clause is not supported for ARRAY_CONCAT_AGG" because, according to the docs, it is not. It looks like I'm misunderstanding whether the outer query's named window is available inside the inner subquery.
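For contrast, my understanding is that a named window resolves when it is referenced at the same query level that defines it. The sketch below is only an illustration under that assumption: it swaps in `ARRAY_AGG` (which does accept `OVER`) as a stand-in for `ARRAY_CONCAT_AGG`, and the inline table `sample` is a hypothetical three-row subset of my data.

```sql
WITH sample AS (
  -- Hypothetical subset of the rows in the question.
  SELECT 1 AS id, 1 AS iteration_recency, DATE("2022-07-09") AS start_date
  UNION ALL
  SELECT 1, 2, DATE("2022-08-01")
  UNION ALL
  SELECT 1, 4, DATE("2022-07-25")
)
SELECT
  id,
  iteration_recency,
  -- The alias is referenced in the same SELECT that owns the WINDOW clause.
  ARRAY_AGG(start_date) OVER newer_iterations AS newer_start_dates
FROM sample
WINDOW newer_iterations AS (
  PARTITION BY id
  ORDER BY iteration_recency ASC
  ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
```

If that holds, the alias error would come from referencing `newer_iterations` inside the scalar subquery rather than from `ARRAY_CONCAT_AGG` itself.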
Maybe there's a way of doing this with a join, but the logical requirement of `ARRAY_CONCAT_AGG` operating over a frame that spans all preceding rows (from UNBOUNDED PRECEDING up to 1 PRECEDING) seems unavoidable to me.
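For what it's worth, a join-based version might look roughly like the sketch below. It is untested and only an assumption about how the window could be emulated: a self-join gathers the more recent iterations of each row, and `ARRAY_CONCAT_AGG` is then used as a plain aggregate with `GROUP BY` instead of with `OVER`. The aliases `older`, `newer`, `candidates`, `covered_dates` and `dt` are names I made up for illustration.

```sql
WITH data AS (
  SELECT 1 AS id, 1 AS iteration_recency, DATE("2022-07-09") AS start_date, DATE("2022-07-31") AS end_date
  UNION ALL
  SELECT 1, 2, DATE("2022-08-01"), DATE("2022-08-15")
  UNION ALL
  SELECT 1, 3, DATE("2022-07-01"), DATE("2022-07-04")
  UNION ALL
  SELECT 1, 4, DATE("2022-07-25"), DATE("2022-08-04")
  UNION ALL
  SELECT 1, 5, DATE("2022-07-01"), DATE("2022-07-31")
  UNION ALL
  SELECT 2, 1, DATE("2022-08-01"), DATE("2022-10-30")
  UNION ALL
  SELECT 2, 2, DATE("2022-07-05"), DATE("2022-07-22")
  UNION ALL
  SELECT 2, 3, DATE("2022-08-06"), DATE("2022-08-24")
),
candidates AS (
  -- For each row, concatenate the dates of every more recent iteration
  -- (lower iteration_recency) of the same id. Rows with no more recent
  -- iteration get a NULL array, since there is nothing to concatenate.
  SELECT
    older.id AS id,
    older.iteration_recency AS iteration_recency,
    older.start_date AS start_date,
    older.end_date AS end_date,
    ARRAY_CONCAT_AGG(
      GENERATE_DATE_ARRAY(newer.start_date, newer.end_date)
    ) AS covered_dates
  FROM data AS older
  LEFT JOIN data AS newer
    ON newer.id = older.id
    AND newer.iteration_recency < older.iteration_recency
  GROUP BY older.id, older.iteration_recency, older.start_date, older.end_date
)
SELECT
  id,
  iteration_recency,
  -- IFNULL guards the rows whose covered_dates array is NULL.
  IFNULL(
    (
      SELECT MIN(dt IN UNNEST(covered_dates))
      FROM UNNEST(GENERATE_DATE_ARRAY(start_date, end_date)) AS dt
    ),
    FALSE
  ) AS date_range_contained_in_more_recent_iterations
FROM candidates
ORDER BY id, iteration_recency
```

The intent is for this to yield the result table I describe in this post, but the self-join grows quadratically with the number of iterations per id, which is part of why I was hoping for a window-based approach.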
The result I was expecting was:
| id | iteration_recency | date_range_contained_in_more_recent_iterations |
|---|---|---|
| 1 | 1 | false |
| 1 | 2 | false |
| 1 | 3 | false |
| 1 | 4 | true |
| 1 | 5 | false |
| 2 | 1 | false |
| 2 | 2 | false |
| 2 | 3 | true |
Grateful for any pointers, thanks!
