2

I would like to query BigQuery where two columns are returned, a distinct object ID column, and a distinct combination of array values based on values of the arrays where the object IDs were the same.

For example, say I have this table in BigQuery:

+-----------+-----------+
| object_id |  array    |
+-----------+-----------+
| a         | 1,2,3,4,5 |
| a         | 2,3,4,5,6 |
| b         | 9,8,7,6   |
+-----------+-----------+

I would like to run a query that returns the following data:

+-----------+-------------+
| object_id |  array      |
+-----------+-------------+
| a         | 1,2,3,4,5,6 |
| b         | 9,8,7,6     |
+-----------+-------------+

I think the query should be something similar to this but am a little unclear on how to get the distinct object ID with the distinct combination of array elements.

2 Answers 2

2

Consider below [slim] solution

select id, array_agg(distinct el) as arr
from `project.dataset.table`,
unnest(arr) el
group by id  
Sign up to request clarification or add additional context in comments.

Comments

0

Use array_concat_agg:

WITH `project.dataset.table` AS (
  SELECT 'a' id, [1,2,3,4,5] arr UNION ALL
  SELECT 'a', [2,3,4,5,6] UNION ALL
  SELECT 'b', [9,8,7,6]
)
SELECT
    id,
    ARRAY(SELECT DISTINCT x FROM UNNEST(concatenated) as x) as a
FROM (
  SELECT
    id,
    ARRAY_CONCAT_AGG(arr) as concatenated
  FROM `project.dataset.table`
  GROUP BY id
)

enter image description here

1 Comment

Works perfectly! Thanks.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.