How can I filter rows if one array contains all values from another array using BigQuery?

Question

I need to get specific arrays from a table in BigQuery. Then I want to remove a row if some other array in the same partition window already contains all values of the current array, plus some other values.

with t0 as (SELECT 1 as big_id, '101' as small_id,  0.99 as bottom, 1.03 top
      UNION ALL SELECT 1, '102', 1.05, 1.09
      UNION ALL SELECT 1, '103', 1.09, 1.13
      UNION ALL SELECT 1, '104', 1.2, 1.25
      UNION ALL SELECT 1, '105', 1.33, 1.39
      UNION ALL SELECT 2, '102', 1.05, 1.09
      UNION ALL SELECT 2, '103', 1.09, 1.13
      UNION ALL SELECT 2, '104', 1.2, 1.25
      UNION ALL SELECT 2, '105', 1.33, 1.39)
SELECT t0.big_id,
  row_number() OVER (PARTITION BY t0.big_id) group_id,
  ARRAY_AGG(t1.small_id) my_arrays
FROM t0
CROSS JOIN t0 t1
WHERE t0.big_id = t1.big_id AND t1.top/t0.bottom BETWEEN 1 AND 1.15
GROUP BY t0.big_id, t0.small_id

I have a table with ids and the top and bottom bounds of confidence intervals. I want to compare all unique small_id pairs, starting from the small_id with the lower bottom. A unique pair means I do not need to compare 102 with 101 if 101 and 102 have already been compared. Then I need to group small_ids with similar confidence intervals into arrays. Finally, I need to drop a group if all of its ids are already matched in some bigger group in the same partition window. small_id is not numeric, just a text string, so it is not possible to compare small_id values directly as numbers using <>.
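To check the grouping step outside BigQuery, here is a minimal Python sketch that mirrors the self CROSS JOIN in the query above. For every small_id in a big_id, it collects the small_ids whose top divided by that row's bottom falls between 1 and 1.15, inclusive like SQL BETWEEN. The Row tuple and the build_groups name are hypothetical and exist only for illustration. The rows are the sample data from the query.

```python
from collections import namedtuple

# Hypothetical row type mirroring the columns of t0 in the question.
Row = namedtuple("Row", ["big_id", "small_id", "bottom", "top"])

ROWS = [
    Row(1, "101", 0.99, 1.03),
    Row(1, "102", 1.05, 1.09),
    Row(1, "103", 1.09, 1.13),
    Row(1, "104", 1.2, 1.25),
    Row(1, "105", 1.33, 1.39),
    Row(2, "102", 1.05, 1.09),
    Row(2, "103", 1.09, 1.13),
    Row(2, "104", 1.2, 1.25),
    Row(2, "105", 1.33, 1.39),
]


def build_groups(rows, low=1.0, high=1.15):
    """Return {(big_id, anchor_small_id): [small_id, ...]}.

    Mirrors: t0 CROSS JOIN t0 t1
             WHERE t0.big_id = t1.big_id
               AND t1.top / t0.bottom BETWEEN low AND high
             GROUP BY t0.big_id, t0.small_id
    """
    groups = {}
    for anchor in rows:
        members = [
            other.small_id
            for other in rows
            if other.big_id == anchor.big_id
            and low <= other.top / anchor.bottom <= high  # inclusive, like BETWEEN
        ]
        # An anchor with no matching rows produces no group, as in the SQL.
        if members:
            groups[(anchor.big_id, anchor.small_id)] = members
    return groups


if __name__ == "__main__":
    for key, members in build_groups(ROWS).items():
        print(key, members)
```

For the sample data, each anchor ends up with an array such as ['101', '102', '103'] for small_id 101 in big_id 1. Some of these arrays, for example ['102', '103'], are fully contained in a larger one, and those are the rows to remove.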

These are the rows I need to remove, because their values are already in other arrays.

How should I modify my query to get the expected output?

All those crosses and arrows are really not helping. Please clarify what your expected result is in plain view, and also what logic you are trying to implement in your query. – Mikhail Berlyant, Commented Oct 20, 2020 at 18:26
Thank you for the reply! I added more details about the expected result. – Timogavk, Commented Oct 20, 2020 at 19:17

Mikhail Berlyant · Accepted Answer · 2020-10-21 21:36:33Z


Below is for BigQuery Standard SQL

#standardsql
with t0 as (SELECT 1 as big_id, '101' as small_id,  0.99 as bottom, 1.03 top
  UNION ALL SELECT 1, '102', 1.05, 1.09
  UNION ALL SELECT 1, '103', 1.09, 1.13
  UNION ALL SELECT 1, '104', 1.2, 1.25
  UNION ALL SELECT 1, '105', 1.33, 1.39
  UNION ALL SELECT 2, '102', 1.05, 1.09
  UNION ALL SELECT 2, '103', 1.09, 1.13
  UNION ALL SELECT 2, '104', 1.2, 1.25
  UNION ALL SELECT 2, '105', 1.33, 1.39
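), temp as (
  -- one row per (big_id, small_id): the array of small_ids whose top,
  -- divided by this row's bottom, lies between 1 and 1.15
  SELECT t0.big_id, 
    -- no ORDER BY, so the numbering is arbitrary; group_id only needs to be unique per big_id
    row_number() OVER (PARTITION BY t0.big_id) group_id, 
    ARRAY_AGG(t1.small_id) my_arrays FROM t0
  CROSS JOIN t0 t1
  WHERE t0.big_id = t1.big_id AND t1.top/t0.bottom BETWEEN 1 AND 1.15
  GROUP BY t0.big_id, t0.small_id
)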
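-- keep only groups whose array is not fully contained in another group's array
select big_id, group_id, any_value(my_arrays) my_arrays 
from (
  -- pair every group with every group of the same big_id
  select t1.*,
    -- flag is true when every element of t1.my_arrays also occurs in t2.my_arrays
    -- and t2 is a different group (for the same group the count is 0)
    ( select count(1)
      from t1.my_arrays id
      join t2.my_arrays id
      using(id)
      where t1.group_id != t2.group_id
    ) = array_length(t1.my_arrays) as flag
  from temp t1 
  join temp t2
  using (big_id)
)
group by big_id, group_id
-- drop a group if it was flagged against at least one other group
having countif(flag) = 0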

with output
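Here is a minimal Python sketch of the set-based rule behind flag and HAVING countif(flag) = 0, runnable outside BigQuery. It assumes the arrays hold distinct small_ids, which is true for the sample data. The function name drop_contained_groups and the SAMPLE_GROUPS input are hypothetical. SAMPLE_GROUPS is written out by hand as the arrays the temp CTE should yield for the sample rows, and its group_id numbers are illustrative.

```python
def drop_contained_groups(groups):
    """Return the groups that are not fully contained in another group.

    groups maps (big_id, group_id) -> list of small_id strings,
    i.e. the rows of the answer's temp CTE.
    """
    kept = {}
    for (big_id, group_id), members in groups.items():
        flagged = False
        for (other_big, other_group), other in groups.items():
            # JOIN temp t2 USING (big_id), and WHERE t1.group_id != t2.group_id
            if other_big != big_id or other_group == group_id:
                continue
            # count(1) of the JOIN ... USING(id) between the two arrays
            shared = sum(1 for x in members for y in other if x == y)
            if shared == len(members):  # "= array_length(t1.my_arrays) as flag"
                flagged = True
                break
        if not flagged:  # HAVING countif(flag) = 0
            kept[(big_id, group_id)] = members
    return kept


# Arrays the temp CTE should produce for the sample data (numbering illustrative).
SAMPLE_GROUPS = {
    (1, 1): ["101", "102", "103"],
    (1, 2): ["102", "103"],
    (1, 3): ["102", "103", "104"],
    (1, 4): ["104"],
    (1, 5): ["105"],
    (2, 1): ["102", "103"],
    (2, 2): ["102", "103", "104"],
    (2, 3): ["104"],
    (2, 4): ["105"],
}

if __name__ == "__main__":
    for key, members in drop_contained_groups(SAMPLE_GROUPS).items():
        print(key, members)
```

For this input the sketch keeps [101, 102, 103], [102, 103, 104] and [105] for big_id 1, and [102, 103, 104] and [105] for big_id 2. The comparison is limited to the same big_id (the USING (big_id) join), so the identical [102, 103, 104] arrays in the two partitions do not eliminate each other. Within one partition, however, two identical arrays would each count as contained in the other, so the query as written would drop both.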


3 Comments

Mikhail Berlyant Over a year ago

Glad it worked for you. Initially I thought it was purely recursive logic that could be addressed by either scripting or a JS UDF, but after sleeping on it I realized it can be done in a set-based way, so here we go :o)

Timogavk Over a year ago

Thank you. It looks much better than JS. What do we lose by using a JS UDF?

Mikhail Berlyant Over a year ago

Usually you want to translate your logic into a set-based form so you can use pure SQL, which is the most efficient option. Only if that is not possible would you look at scripting or a JS UDF, which have some limitations. So, to answer "what do we lose": I would say the efficiency of set-based operations.
