I wanted to use BigQuery's array aggregate function, ARRAY_CONCAT_AGG, as a window function (with an OVER clause). It doesn't look like this is possible, and I might have a second issue: the named window I define in the outer query does not seem to be accessible from an inner query.

Here is my current SQL:

WITH data AS (
  SELECT 1 AS id, 1 AS iteration_recency, DATE("2022-07-09") AS start_date, DATE("2022-07-31") AS end_date
  UNION ALL
  SELECT 1 AS id, 2 AS iteration_recency, DATE("2022-08-01") AS start_date, DATE("2022-08-15") AS end_date
  UNION ALL
  SELECT 1 AS id, 3 AS iteration_recency, DATE("2022-07-01") AS start_date, DATE("2022-07-04") AS end_date
  UNION ALL
  SELECT 1 AS id, 4 AS iteration_recency, DATE("2022-07-25") AS start_date, DATE("2022-08-04") AS end_date
  UNION ALL
  SELECT 1 AS id, 5 AS iteration_recency, DATE("2022-07-01") AS start_date, DATE("2022-07-31") AS end_date
  UNION ALL
  SELECT 2 AS id, 1 AS iteration_recency, DATE("2022-08-01") AS start_date, DATE("2022-10-30") AS end_date
  UNION ALL
  SELECT 2 AS id, 2 AS iteration_recency, DATE("2022-07-05") AS start_date, DATE("2022-07-22") AS end_date
  UNION ALL
  SELECT 2 AS id, 3 AS iteration_recency, DATE("2022-08-06") AS start_date, DATE("2022-08-24") AS end_date
)

SELECT 
  id, 
  iteration_recency, 
  (
    SELECT MIN(`dates` IN UNNEST(
      ARRAY_CONCAT_AGG(GENERATE_DATE_ARRAY(`start_date`, `end_date`)) OVER `newer_iterations`
      )
    )
    FROM UNNEST(GENERATE_DATE_ARRAY(`start_date`, `end_date`)) AS `dates`
  ) AS date_range_contained_in_more_recent_iterations
FROM data
WINDOW `newer_iterations` AS (
  PARTITION BY id
  ORDER BY iteration_recency ASC
  ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)

The purpose of this query is to determine whether a date range is fully covered by more recent iterations of the same id. As a use case, imagine monitoring where iteration 3 fails for some date range: if that date range is covered by more recent iteration(s), the failure is not a problem. I can't do something clever with min/max, because more recent iterations may overlap the failed date range without completely covering it between them.

The slightly crazy MIN(... IN UNNEST(...)) construct draws inspiration from this answer, which provides a neat way of working out whether all items from arrayA are in arrayB.
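As a minimal sketch of that trick in isolation (the array literals and column names here are made up for illustration): each element of the first array is tested against the second, and MIN over the resulting booleans is TRUE only when every test passed, because FALSE sorts before TRUE. LOGICAL_AND expresses the same idea more directly.

```sql
SELECT
  -- every element of [1, 2] is present in [1, 2, 3]
  (SELECT MIN(x IN UNNEST([1, 2, 3])) FROM UNNEST([1, 2]) AS x) AS a_in_b,
  -- 4 is missing from [1, 2, 3], so one comparison is FALSE
  (SELECT MIN(x IN UNNEST([1, 2, 3])) FROM UNNEST([1, 4]) AS x) AS c_in_b,
  -- same check as a_in_b, written with LOGICAL_AND instead of MIN
  (SELECT LOGICAL_AND(x IN UNNEST([1, 2, 3])) FROM UNNEST([1, 2]) AS x) AS a_in_b_logical
```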

Currently, I get the error "Unrecognized window alias newer_iterations at [24:76]". I was actually expecting to get the error (paraphrased) "the OVER clause is not supported for ARRAY_CONCAT_AGG" because, according to the docs, it is not. It looks like I'm misunderstanding the availability of the outer query's named window inside the inner query.
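The error suggests that a named window is only visible to the SELECT that declares it, not to a scalar subquery nested inside that SELECT. A possible way around this, sketched below under that assumption, is to evaluate the window function in its own step and let the subquery read the resulting column. The CTE name `windowed`, the column `newer`, and the reduced sample data are hypothetical.

```sql
WITH data AS (
  SELECT 1 AS id, 1 AS iteration_recency
  UNION ALL
  SELECT 1, 2
  UNION ALL
  SELECT 1, 3
),
windowed AS (
  SELECT
    id,
    iteration_recency,
    -- the named window is used in the same SELECT that declares it
    ARRAY_AGG(iteration_recency) OVER newer_iterations AS newer
  FROM data
  WINDOW newer_iterations AS (
    PARTITION BY id
    ORDER BY iteration_recency
    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
  )
)
SELECT
  id,
  iteration_recency,
  -- the subquery only reads the precomputed array, never the window itself
  (SELECT COUNT(*) FROM UNNEST(newer)) AS newer_count
FROM windowed
```

ARRAY_AGG stands in for ARRAY_CONCAT_AGG here because, as noted above, the latter is documented as not accepting an OVER clause.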

Maybe there's a way of doing this with a join, but the logical requirement that ARRAY_CONCAT_AGG operate over a frame of ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING seems unavoidable to me.
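For comparison, a join-based version might look like the sketch below. It is an illustration rather than a tested solution: the aliases `cur`, `newer` and `d` are invented, and only the id = 2 rows of the sample data are included to keep it short. Each date of the current iteration is left-joined to any more recent iteration that contains it, and the range counts as covered when the number of distinct matched dates equals the number of distinct dates in the range.

```sql
WITH data AS (
  SELECT 2 AS id, 1 AS iteration_recency, DATE("2022-08-01") AS start_date, DATE("2022-10-30") AS end_date
  UNION ALL
  SELECT 2, 2, DATE("2022-07-05"), DATE("2022-07-22")
  UNION ALL
  SELECT 2, 3, DATE("2022-08-06"), DATE("2022-08-24")
)
SELECT
  cur.id,
  cur.iteration_recency,
  -- matched dates are counted once even if several newer iterations contain them
  COUNT(DISTINCT IF(newer.id IS NOT NULL, d, NULL)) = COUNT(DISTINCT d)
    AS date_range_contained_in_more_recent_iterations
FROM data AS cur
CROSS JOIN UNNEST(GENERATE_DATE_ARRAY(cur.start_date, cur.end_date)) AS d
LEFT JOIN data AS newer
  ON newer.id = cur.id
  AND newer.iteration_recency < cur.iteration_recency  -- lower number = more recent
  AND d BETWEEN newer.start_date AND newer.end_date
GROUP BY cur.id, cur.iteration_recency
```

This expands every range to one row per date before joining, so it is likely to be heavier than a window-based approach on large inputs.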

The result I was expecting was:

id | iteration_recency | date_range_contained_in_more_recent_iterations
---|-------------------|-----------------------------------------------
1  | 1                 | false
1  | 2                 | false
1  | 3                 | false
1  | 4                 | true
1  | 5                 | false
2  | 1                 | false
2  | 2                 | false
2  | 3                 | true

Grateful for any pointers, thanks!

2 Answers

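You might consider the query below. It collects the date arrays of the more recent iterations with ARRAY_AGG(STRUCT(days)) over a window, then checks that no date of the current iteration is missing from them.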

SELECT * EXCEPT(days, concat_days), NOT EXISTS (
         SELECT d FROM t.days d -- current iteration
         EXCEPT DISTINCT  -- MINUS 
         SELECT d FROM t.concat_days c, c.days d -- more recent iterations
       ) date_range_contained_in_more_recent_iterations
  FROM (
    SELECT *, ARRAY_AGG(STRUCT(days)) OVER w0 AS concat_days  
      FROM data, UNNEST([STRUCT(GENERATE_DATE_ARRAY(start_date, end_date) AS days)])
    WINDOW w0 AS (PARTITION BY id ORDER BY iteration_recency ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
  ) AS t
;

1 Comment

Thanks for the answer. I'm keen to see if this can be achieved exclusively with window functions and conditional logic. The joins and subqueries are very "heavy" in comparison. You have 2 cross joins and an except distinct which are going to totally change the query plan - but I'm grateful for the suggestion.

I found an alternative which draws inspiration from the combined use of ARRAY_AGG and STRUCT from Jaytiger's answer:

WITH data AS (
  SELECT *, ARRAY_AGG(STRUCT(dates)) OVER w0 AS concat_dates
  FROM (
    SELECT 1 AS id, 1 AS iteration_recency, GENERATE_DATE_ARRAY("2022-07-09", "2022-07-31") AS dates
    UNION ALL
    SELECT 1 AS id, 2 AS iteration_recency, GENERATE_DATE_ARRAY("2022-08-01", "2022-08-15") AS dates
    UNION ALL
    SELECT 1 AS id, 3 AS iteration_recency, GENERATE_DATE_ARRAY("2022-07-01", "2022-07-04") AS dates
    UNION ALL
    SELECT 1 AS id, 4 AS iteration_recency, GENERATE_DATE_ARRAY("2022-07-25", "2022-08-04") AS dates
    UNION ALL
    SELECT 1 AS id, 5 AS iteration_recency, GENERATE_DATE_ARRAY("2022-07-01", "2022-07-31") AS dates
    UNION ALL
    SELECT 2 AS id, 1 AS iteration_recency, GENERATE_DATE_ARRAY("2022-08-01", "2022-10-30") AS dates
    UNION ALL
    SELECT 2 AS id, 2 AS iteration_recency, GENERATE_DATE_ARRAY("2022-07-05", "2022-07-22") AS dates
    UNION ALL
    SELECT 2 AS id, 3 AS iteration_recency, GENERATE_DATE_ARRAY("2022-08-06", "2022-08-24") AS dates
  )
  WINDOW w0 AS (PARTITION BY id ORDER BY iteration_recency ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
)
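SELECT
  *, (
    -- concat_dates is an ARRAY of STRUCTs, so flatten it into one array of dates
    -- before the IN test; MIN over the booleans is TRUE only if every d was found
    SELECT MIN(d IN UNNEST(ARRAY(
      SELECT nd FROM UNNEST(concat_dates) AS c, UNNEST(c.dates) AS nd
    )))
    FROM UNNEST(dates) AS d
  ) AS date_range_contained_in_more_recent_iterations
FROM data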
