0

I have the following data:

col1
[1,2,3]
[1,2]
[2,3]

I want the following output

ans
[2]

In other words, I want the intersection of arrays in all rows. Any pointers?

3 Answers 3

2

Consider below approach

select array_agg(item) ans from (
  select distinct item
  from your_table t, t.col1 item
  qualify count(distinct format('%t', col1)) over(partition by item) 
    = count(distinct format('%t', col1)) over()
)
Sign up to request clarification or add additional context in comments.

Comments

2

Another approach could be seen below. This is assuming that your arrays does not have duplicate values.

cte groups by array value, and gets all the indices that contain this value and counts them. It also assigns row_number for each row. This can now be filtered where the maximum row_number = count of indices that contain the value.

with sample_data as (
    select [1,2,3] as col1
    union all select [1,2] as col1
    union all select [2,3] as col1
),
cte as (
select 
  col1[safe_offset(index)] as arr_value,
  array_length(array_agg(index)) as exist_in_n_rows, -- get indices where values appeared then count them
  row_number() over () as rn
from sample_data,
  unnest(generate_array(0,array_length(col1)-1)) as index
group by 1
)
select 
array_agg(intersection) as ans
from (
  select 
    arr_value as intersection,
  from cte
    where true
    qualify max(rn) over () = exist_in_n_rows
)

Output:

enter image description here

NOTE: If you have duplicate values in the array, it is much better to de duplicate them first prior to running the query above.

Comments

1

Intersection of array of all rows. There is the need to unnest the array and group by each element and count how often it is in a row present.

Lets start with generating your datset in table tbl.

Then in a helper table, we generate a row_number, starting with one for each row. We also count the maximum numbers of rows.

In the last step, we unnest the col1 column with all the array elements and call them y. We group by y and count for each unique entry of row_num in the new column ref. having does a comparing of the ref to the total amount of rows in the beginning and filters only the element which are present in all arrays.

With tbl as (select a.x as col1 from unnest( [struct([1,2,3,3] as x),struct([1,2]),struct([2,3])] )as a),
helper as ( select col1,row_number() over () as row_num,count(col1) over () as counts  from tbl)
select y,count(distinct row_num) as ref, any_value(counts) ref_target  
from helper , unnest(col1) y 
group by 1
having ref=ref_target

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.