I need to do the following with two array fields in the table below. Each array element is a STRUCT<key STRING, value STRING>.

  1. Merge the arrays together.
  2. If a key appears in both labels.key and project.key, keep only the key/value pair from the labels field.
  3. Flatten the combined array into a delimited string and order its entries (so I can group by it).

Example Table Data

SELECT 1 as id, ARRAY
  [STRUCT("testlabel2" as key, "thisvalueisbetter" as value), STRUCT("testlabel3", "testvalue3")] as labels, 
  [STRUCT("testlabel2" as key, "testvalue2" as value)] as project

The below query does everything except #2 and I'm not sure how to accomplish that. Does anyone have a suggestion on how to do this?

SELECT
  id,
  (SELECT STRING_AGG(DISTINCT CONCAT(l.key, ':', l.value) ORDER BY CONCAT(l.key, ':', l.value))
    FROM UNNEST(
    ARRAY_CONCAT(labels, project)) AS l) AS label,
FROM `mytestdata` AS t
GROUP BY id, label

Currently this query gives the output:

1 testlabel2:testvalue2,testlabel2:thisvalueisbetter,testlabel3:testvalue3

But I'm looking for:

1 testlabel2:thisvalueisbetter,testlabel3:testvalue3
  • Are your structs arrays? Can you provide better examples of data to reproduce the problem? Commented Jan 14, 2020 at 18:02
  • Sorry, I realize now that my example data is not very clear. I've updated the post to include a query that will create an example of the data. There are 2 arrays (labels and project) and both are of type Struct<String, String>. I'll work on putting together the example data with the query... sorry, my knowledge of BigQuery is limited, so it might take me a bit. Commented Jan 14, 2020 at 18:37

1 Answer

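Below is for BigQuery Standard SQL. Each entry is tagged with its source (0 for labels, 1 for project); grouping by key and taking the first value ordered by source keeps the labels value whenever a key appears in both arrays.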

#standardSQL
SELECT *, 
  ARRAY(
    SELECT AS STRUCT key, ARRAY_AGG(value ORDER BY source LIMIT 1)[OFFSET(0)] AS value
    FROM ( 
      SELECT 0 AS source, * FROM t.labels UNION ALL
      SELECT 1, * FROM t.project 
    ) 
    GROUP BY key
  ) AS combined_array
FROM `project.dataset.table` t  

You can test and play with the above using the sample data from your question, as in the example below

#standardSQL
WITH `project.dataset.table` AS (
SELECT ARRAY
  [STRUCT("testlabel2" AS key, "thisvalueisbetter" AS value), STRUCT("testlabel3", "testvalue3")] AS labels, 
  [STRUCT("testlabel2" AS key, "testvalue2" AS value)] AS project
)
SELECT *, 
  ARRAY(
    SELECT AS STRUCT key, ARRAY_AGG(value ORDER BY source LIMIT 1)[OFFSET(0)] AS value
    FROM ( 
      SELECT 0 AS source, * FROM t.labels UNION ALL
      SELECT 1, * FROM t.project 
    ) 
    GROUP BY key
  ) AS combined_array
FROM `project.dataset.table` t  

with result (shown as a screenshot in the original answer)
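
To see why the labels value wins, it can help to look at the intermediate rows before they are grouped. The query below is only an illustrative sketch: it reuses the sample data from the question (including its id column) and flattens both arrays with the same 0/1 source tag used above.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 AS id,
    [STRUCT("testlabel2" AS key, "thisvalueisbetter" AS value), STRUCT("testlabel3", "testvalue3")] AS labels,
    [STRUCT("testlabel2" AS key, "testvalue2" AS value)] AS project
)
-- Rows coming from labels are tagged with source = 0
SELECT 0 AS source, kv.key, kv.value
FROM `project.dataset.table` AS t, UNNEST(t.labels) AS kv
UNION ALL
-- Rows coming from project are tagged with source = 1
SELECT 1, kv.key, kv.value
FROM `project.dataset.table` AS t, UNNEST(t.project) AS kv
ORDER BY key, source
-- Rows returned for the sample data:
--   0  testlabel2  thisvalueisbetter
--   1  testlabel2  testvalue2
--   0  testlabel3  testvalue3
```

Within each key, the row with the lowest source comes first, which is exactly the row that ARRAY_AGG(value ORDER BY source LIMIT 1)[OFFSET(0)] picks.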

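Or, to fully match your expected output, use the query below (the ORDER BY inside STRING_AGG sorts the entries, so the string is stable for grouping)

#standardSQL
SELECT *, 
  (SELECT STRING_AGG(x ORDER BY x) FROM (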
    SELECT CONCAT(key, ':', ARRAY_AGG(value ORDER BY source LIMIT 1)[OFFSET(0)]) x
    FROM ( 
      SELECT 0 AS source, * FROM t.labels UNION ALL
      SELECT 1, * FROM t.project 
    ) 
    GROUP BY key
  )) AS combined_result
FROM `project.dataset.table` t   

with result (shown as a screenshot in the original answer)
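
A possible alternative, sketched here under the assumption that keys are unique within labels, is to stay closer to the ARRAY_CONCAT query from the question: keep every labels entry and append only those project entries whose key does not already appear in labels. The aliases kv, p and l are introduced just for this example.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 AS id,
    [STRUCT("testlabel2" AS key, "thisvalueisbetter" AS value), STRUCT("testlabel3", "testvalue3")] AS labels,
    [STRUCT("testlabel2" AS key, "testvalue2" AS value)] AS project
)
SELECT
  id,
  (
    -- Flatten the merged array into a sorted, comma-delimited string
    SELECT STRING_AGG(CONCAT(kv.key, ':', kv.value) ORDER BY CONCAT(kv.key, ':', kv.value))
    FROM UNNEST(ARRAY_CONCAT(
      t.labels,
      -- Keep only the project entries whose key is absent from labels
      ARRAY(
        SELECT p
        FROM UNNEST(t.project) AS p
        WHERE NOT EXISTS (SELECT 1 FROM UNNEST(t.labels) AS l WHERE l.key = p.key)
      )
    )) AS kv
  ) AS label
FROM `project.dataset.table` AS t
-- For the sample data: 1  testlabel2:thisvalueisbetter,testlabel3:testvalue3
```

Unlike the GROUP BY key version, this sketch does not collapse duplicate keys inside labels itself, and ARRAY_CONCAT returns NULL if either array is NULL, so the grouped version above is the safer choice when the data may contain either case.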

1 Comment

Awesome. Thank you very much! I think it'll take me a bit to understand how this is working, but that's what I'm looking for :)
