I need to build specific arrays from a table in BigQuery. Then I want to drop a row if some other array in the same partition window already contains all the values of the current array plus some other values.

with t0 as (SELECT 1 as big_id, '101' as small_id,  0.99 as bottom, 1.03 top
      UNION ALL SELECT 1, '102', 1.05, 1.09
      UNION ALL SELECT 1, '103', 1.09, 1.13
      UNION ALL SELECT 1, '104', 1.2, 1.25
      UNION ALL SELECT 1, '105', 1.33, 1.39
      UNION ALL SELECT 2, '102', 1.05, 1.09
      UNION ALL SELECT 2, '103', 1.09, 1.13
      UNION ALL SELECT 2, '104', 1.2, 1.25
      UNION ALL SELECT 2, '105', 1.33, 1.39)
SELECT t0.big_id,
       row_number() OVER (PARTITION BY t0.big_id) group_id,
       ARRAY_AGG(t1.small_id) my_arrays
FROM t0
CROSS JOIN t0 t1
WHERE t0.big_id = t1.big_id AND t1.top/t0.bottom BETWEEN 1 AND 1.15
GROUP BY t0.big_id, t0.small_id

I have a table with ids and the top and bottom of confidence intervals. I want to compare all unique small_id pairs, starting from the small_id with the lowest bottom. A unique pair means there is no need to compare 102 with 101 if 101 and 102 have already been compared. Then I need to group small_ids with similar confidence intervals into arrays. Finally, I need to drop a group if all of its ids also appear in some bigger group in the same partition window. small_id is not numeric; it is just a text string, so it is not possible to compare small_ids directly as numbers using <>.
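To make the grouping step concrete, here is a minimal Python sketch of what the query above computes before any reduction. It is only an illustration of the logic, not BigQuery code: the names `ROWS` and `build_groups` are made up for this example, and the thresholds 1 and 1.15 are taken from the `BETWEEN` condition in the query.

```python
from collections import defaultdict

# Sample rows from the question: (big_id, small_id, bottom, top).
ROWS = [
    (1, "101", 0.99, 1.03),
    (1, "102", 1.05, 1.09),
    (1, "103", 1.09, 1.13),
    (1, "104", 1.2, 1.25),
    (1, "105", 1.33, 1.39),
    (2, "102", 1.05, 1.09),
    (2, "103", 1.09, 1.13),
    (2, "104", 1.2, 1.25),
    (2, "105", 1.33, 1.39),
]


def build_groups(rows, low=1.0, high=1.15):
    """Hypothetical helper mirroring the self-join in the query.

    For every anchor row, collect the small_ids within the same big_id
    whose top / anchor bottom lies in [low, high].
    """
    groups = defaultdict(list)
    for big_id, anchor_id, anchor_bottom, _ in rows:
        members = [
            small_id
            for other_big_id, small_id, _, top in rows
            if other_big_id == big_id and low <= top / anchor_bottom <= high
        ]
        if members:  # an anchor with no match would vanish in the join
            groups[big_id].append((anchor_id, members))
    return dict(groups)


for big_id, groups in build_groups(ROWS).items():
    for anchor_id, members in groups:
        print(big_id, anchor_id, members)
```

For big_id 1 this gives the groups [101, 102, 103], [102, 103], [102, 103, 104], [104] and [105]. The groups [102, 103] and [104] are the ones fully covered by a bigger group, which is the kind of row the question asks to remove.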

These are the rows I need to remove, because all of their values already appear in other arrays (shown in a screenshot of the query result, not reproduced here).

How do I need to modify my query to get the expected output?

  • All those crosses and arrows are really not helping. Please clarify your expected result in plain text, and also the logic you are trying to implement in your query. Commented Oct 20, 2020 at 18:26
  • Thank you for the reply! I added more details about the expected result. Commented Oct 20, 2020 at 19:17

1 Answer

The query below is for BigQuery Standard SQL:

#standardsql
with t0 as (SELECT 1 as big_id, '101' as small_id,  0.99 as bottom, 1.03 top
  UNION ALL SELECT 1, '102', 1.05, 1.09
  UNION ALL SELECT 1, '103', 1.09, 1.13
  UNION ALL SELECT 1, '104', 1.2, 1.25
  UNION ALL SELECT 1, '105', 1.33, 1.39
  UNION ALL SELECT 2, '102', 1.05, 1.09
  UNION ALL SELECT 2, '103', 1.09, 1.13
  UNION ALL SELECT 2, '104', 1.2, 1.25
  UNION ALL SELECT 2, '105', 1.33, 1.39
), temp as (      
  SELECT t0.big_id, 
    row_number() OVER (PARTITION BY t0.big_id) group_id, 
    ARRAY_AGG(t1.small_id) my_arrays FROM t0
  CROSS JOIN t0 t1
  WHERE t0.big_id = t1.big_id AND t1.top/t0.bottom BETWEEN 1 AND 1.15
  GROUP BY t0.big_id, t0.small_id
)
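-- keep only the groups that are not fully contained in another group of the same big_id
select big_id, group_id, any_value(my_arrays) my_arrays
from (
  select t1.*,
    -- flag is true when every element of t1.my_arrays also appears in t2.my_arrays
    -- and t2 is a different group (assumes no duplicate ids inside an array)
    ( select count(1)
      from t1.my_arrays id
      join t2.my_arrays id
      using(id)
      where t1.group_id != t2.group_id
    ) = array_length(t1.my_arrays) as flag
  from temp t1
  join temp t2
  using (big_id)
)
group by big_id, group_id
-- a group survives only if no other group covers it
having countif(flag) = 0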

with output (posted as a screenshot in the original answer, not reproduced here)
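For readers who want to check the reduction step outside BigQuery, the same idea can be sketched in a few lines of Python. This is only an illustration of the containment test, not a translation of the query: `drop_contained_groups` and `BIG_ID_1_GROUPS` are hypothetical names, the group contents were derived by hand from the sample data for big_id 1, and the group_id numbering is arbitrary because `row_number()` has no `ORDER BY` in the query.

```python
def drop_contained_groups(groups):
    """Keep only the groups that are not fully contained in another group.

    `groups` maps a group_id to the list of small_ids in that group.
    """
    as_sets = {gid: set(ids) for gid, ids in groups.items()}
    kept = {}
    for gid, ids in as_sets.items():
        # Same test as the `flag` column: some *other* group holds every id.
        covered = any(
            other_gid != gid and ids <= other_ids
            for other_gid, other_ids in as_sets.items()
        )
        if not covered:
            kept[gid] = groups[gid]
    return kept


# Hand-derived groups for big_id = 1 (an assumption, not output from BigQuery).
BIG_ID_1_GROUPS = {
    1: ["101", "102", "103"],
    2: ["102", "103"],
    3: ["102", "103", "104"],
    4: ["104"],
    5: ["105"],
}

print(drop_contained_groups(BIG_ID_1_GROUPS))
```

Under these assumptions the groups [101, 102, 103], [102, 103, 104] and [105] remain. One thing to watch for: two groups with exactly the same ids cover each other, so this sketch drops both of them, and as far as I can tell the `flag` comparison in the query would behave the same way. If that can happen in your data, you may want to deduplicate the arrays first.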

3 Comments

Glad it worked for you. Initially I thought it was purely recursive logic that could be addressed by either scripting or a JS UDF, but after sleeping on it I realized it can be done in a set-based way. So, here we go :o)
Thank you. It looks much better than JS. What are we losing by using a JS UDF?
Usually you would want to translate your logic into a set-based form so you can use pure SQL, which is the most effective. If that is not possible, that is when you would look at scripting or a JS UDF, which have some limitations. So to answer "what are we losing", I would say: the effectiveness of set-based operations.
