BigQuery array intersection across all rows

Question

I have the following data:

col1
`[1,2,3]`
`[1,2]`
`[2,3]`

I want the following output

ans
`[2]`

In other words, I want the intersection of arrays in all rows. Any pointers?

Mikhail Berlyant · Accepted Answer · 2022-09-29 16:49:35Z

2

Consider below approach

select array_agg(item) ans from (
  select distinct item
  from your_table t, t.col1 item
  qualify count(distinct format('%t', col1)) over(partition by item) 
    = count(distinct format('%t', col1)) over()
)

answered Sep 29, 2022 at 16:49

Mikhail Berlyant

174k10 gold badges173 silver badges251 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Ricco D · Accepted Answer · 2022-09-30 02:32:40Z

Another approach could be seen below. This is assuming that your arrays does not have duplicate values.

cte groups by array value, and gets all the indices that contain this value and counts them. It also assigns row_number for each row. This can now be filtered where the maximum row_number = count of indices that contain the value.

with sample_data as (
    select [1,2,3] as col1
    union all select [1,2] as col1
    union all select [2,3] as col1
),
cte as (
select 
  col1[safe_offset(index)] as arr_value,
  array_length(array_agg(index)) as exist_in_n_rows, -- get indices where values appeared then count them
  row_number() over () as rn
from sample_data,
  unnest(generate_array(0,array_length(col1)-1)) as index
group by 1
)
select 
array_agg(intersection) as ans
from (
  select 
    arr_value as intersection,
  from cte
    where true
    qualify max(rn) over () = exist_in_n_rows
)

Output:

NOTE: If you have duplicate values in the array, it is much better to de duplicate them first prior to running the query above.

Samuel · Accepted Answer · 2022-09-29 16:58:21Z

Intersection of array of all rows. There is the need to unnest the array and group by each element and count how often it is in a row present.

Lets start with generating your datset in table tbl.

Then in a helper table, we generate a row_number, starting with one for each row. We also count the maximum numbers of rows.

In the last step, we unnest the col1 column with all the array elements and call them y. We group by y and count for each unique entry of row_num in the new column ref. having does a comparing of the ref to the total amount of rows in the beginning and filters only the element which are present in all arrays.

With tbl as (select a.x as col1 from unnest( [struct([1,2,3,3] as x),struct([1,2]),struct([2,3])] )as a),
helper as ( select col1,row_number() over () as row_num,count(col1) over () as counts  from tbl)
select y,count(distinct row_num) as ref, any_value(counts) ref_target  
from helper , unnest(col1) y 
group by 1
having ref=ref_target

Collectives™ on Stack Overflow

BigQuery array intersection across all rows

3 Answers 3

Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related