I asked a similar question here that I thought abstracted my problem sufficiently, but unfortunately it did not.

I have a table with several nested array columns; the first column is an int. I can join two arrays without duplication (as answered in my previous question), but I'm unsure how to do it with more than two.

Here is the table (in Standard SQL):

WITH
  a AS (
  SELECT 
    1 AS col1,
    ARRAY[1, 2 ] AS col2,
    ARRAY[1, 2, 3] AS col3,
    ARRAY[1, 2, 3, 4] AS col4
  UNION ALL
  SELECT
    2 AS col1, 
    ARRAY[1, 2, 2] AS col2,
    ARRAY[1, 2, 3] AS col3,
    ARRAY[1, 2, 3, 4] AS col4
  UNION ALL
  SELECT
    3 AS col1,
    ARRAY[2, 2 ] AS col2,
    ARRAY[1, 2, 3] AS col3,
    ARRAY[1, 2, 3, 4] AS col4
    )
SELECT
  *
FROM
  a

Produces:

+-------++--------++--------++---------+
| col1   |   col2  |   col3  |   col4  |
+-------++--------++--------++---------+
|   1    |   1     |   1     |   1     |
|        |   2     |   2     |   2     |
|        |         |   3     |   3     |
|        |         |         |   4     |
|   2    |   1     |   1     |   1     |
|        |   2     |   2     |   2     |
|        |         |   3     |   3     |
|        |         |         |   4     |
|   3    |   1     |   1     |   1     |
|        |   2     |   2     |   2     |
|        |         |   3     |   3     |
|        |         |         |   4     |
+-------++--------++--------++---------+

But what I'm looking for is this:

+-------++--------++--------++---------+
| col1   |   col2  |   col3  |   col4  |
+-------++--------++--------++---------+
|   1    |   1     |   1     |   1     |
|  null  |   2     |   2     |   2     |
|  null  |  null   |   3     |   3     |
|  null  |  null   |  null   |   4     |
|   2    |   1     |   1     |   1     |
|  null  |   2     |   2     |   2     |
|  null  |  null   |   3     |   3     |
|  null  |  null   |  null   |   4     |
|   3    |   1     |   1     |   1     |
|  null  |   2     |   2     |   2     |
|  null  |  null   |   3     |   3     |
|  null  |  null   |  null   |   4     |
+-------++--------++--------++---------+

Here is how I'm unnesting the many columns:

SELECT
  col1,
  _col2,
  _col3
FROM
  a
  LEFT JOIN UNNEST(col2) AS _col2
  LEFT JOIN UNNEST(col3) AS _col3

Producing this table:

+-------++--------++--------+
| col1   |   col2  |   col3 |
+-------++--------++--------+
|   1    |   1     |   1    |
|   1    |   1     |   2    |
|   1    |   1     |   3    |
|   1    |   2     |   1    |
|   1    |   2     |   2    |
|   1    |   2     |   3    |
|   2    |   1     |   1    |
|   2    |   1     |   2    |
|   2    |   1     |   3    |
|   2    |   2     |   1    |
|   2    |   2     |   2    |
|   2    |   2     |   3    |
...
...
...
+-------++--------++--------++

1 Answer

I don't fully understand how your results relate to the input data. The results for all the col1 values are exactly the same, but the inputs are different.

That said, I can interpret this as an extension of your previous question. This may be what you want:

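SELECT a.col1, c2, c3, c4
FROM (
  -- cs: the distinct values found across col2, col3, and col4 of this row
  SELECT a.*,
         (SELECT ARRAY_AGG(DISTINCT c)
          FROM UNNEST(ARRAY_CONCAT(col2, col3, col4)) AS c
         ) AS cs
  FROM a
) a
CROSS JOIN UNNEST(cs) AS c
-- each LEFT JOIN keeps the value c even when an array lacks it (giving NULL)
LEFT JOIN UNNEST(a.col2) AS c2 ON c2 = c
LEFT JOIN UNNEST(a.col3) AS c3 ON c3 = c
LEFT JOIN UNNEST(a.col4) AS c4 ON c4 = c;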

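The subquery aliased as a adds an array, cs, holding every distinct value that appears in col2, col3, or col4 of that row. Each value in cs is then left-joined against the three arrays, so a value that is missing from an array produces NULL in that column instead of dropping the row.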
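
To see what that subquery produces on its own, here is a minimal sketch that returns only the array of distinct values per row. The two-row CTE is cut down from the question's sample data, and the ORDER BY inside ARRAY_AGG is my addition so the output is stable; neither is required by the approach.

```sql
-- Sketch: inspect the per-row array of distinct values that drives the joins.
WITH a AS (
  SELECT 1 AS col1, [1, 2] AS col2, [1, 2, 3] AS col3, [1, 2, 3, 4] AS col4
  UNION ALL
  SELECT 3, [2, 2], [1, 2, 3], [1, 2, 3, 4]
)
SELECT
  a.col1,
  -- Concatenate the three arrays, then keep each value once.
  -- ORDER BY c is an addition for readable output only.
  (SELECT ARRAY_AGG(DISTINCT c ORDER BY c)
   FROM UNNEST(ARRAY_CONCAT(a.col2, a.col3, a.col4)) AS c) AS cs
FROM a;
```

With this sample data, cs should hold the values 1 through 4 for both rows, because col4 already contains every value found in the other two arrays.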


5 Comments

Ahhhhh, I wasn't understanding your implementation from the other question. You join each nest on c. Damn, thank you!
Would you take the same approach if the data was a little more inconsistent? For instance:
    WITH a AS (
      SELECT 1 AS col1, ARRAY[1, 1, 1, 1, 1, 2] AS col2, ARRAY[1, 2, 3] AS col3, ARRAY[1, 2, 3, 4] AS col4
      UNION ALL
      SELECT 2 AS col1, ARRAY[] AS col2, ARRAY[1, 2, 3, 3, 3] AS col3, ARRAY[1, 3, 4] AS col4
      UNION ALL
      SELECT 3 AS col1, ARRAY[2, 2] AS col2, ARRAY[1, 3] AS col3, ARRAY[3, 4] AS col4
    )
@ethanenglish . . . I would ask if it works. This approach essentially does a full join, but that is not allowed with unnest().
It does, partially. There are duplicate rows and I'm unclear why. I'm still trying to unpack your strategy so I can use it to solve new problems, but it's taking me a minute.
@ethanenglish . . . You have duplicates in the arrays, so I would expect duplicates in the results. You might want to ask another question.
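
On the duplicates discussed in the last comments: one possible way to avoid them, sketched here and not taken from the answer, is to de-duplicate each array before joining. The CTE name d and the column names u2, u3, and u4 are hypothetical. The sample rows come from the comment above, with the empty array written as ARRAY<INT64>[] so its element type is explicit. The sketch assumes no array column is NULL, since ARRAY_CONCAT returns NULL if any argument is NULL.

```sql
WITH a AS (
  SELECT 1 AS col1, [1, 1, 1, 1, 1, 2] AS col2, [1, 2, 3] AS col3, [1, 2, 3, 4] AS col4
  UNION ALL
  SELECT 2, ARRAY<INT64>[], [1, 2, 3, 3, 3], [1, 3, 4]
  UNION ALL
  SELECT 3, [2, 2], [1, 3], [3, 4]
),
d AS (
  SELECT
    col1,
    -- Each array reduced to its distinct values (hypothetical names u2..u4).
    ARRAY(SELECT DISTINCT x FROM UNNEST(col2) AS x) AS u2,
    ARRAY(SELECT DISTINCT x FROM UNNEST(col3) AS x) AS u3,
    ARRAY(SELECT DISTINCT x FROM UNNEST(col4) AS x) AS u4,
    -- Distinct values across all three arrays, as in the answer's cs.
    ARRAY(SELECT DISTINCT x
          FROM UNNEST(ARRAY_CONCAT(col2, col3, col4)) AS x) AS cs
  FROM a
)
SELECT d.col1, c2, c3, c4
FROM d
CROSS JOIN UNNEST(d.cs) AS c
-- Each de-duplicated array matches a given c at most once,
-- so every (col1, c) pair yields exactly one output row.
LEFT JOIN UNNEST(d.u2) AS c2 ON c2 = c
LEFT JOIN UNNEST(d.u3) AS c3 ON c3 = c
LEFT JOIN UNNEST(d.u4) AS c4 ON c4 = c
ORDER BY d.col1, c;
```

Whether dropping repeated values is acceptable depends on what the repeats mean in the real data. If they carry information, this sketch would lose it.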
