
I have a table column with nested arrays in a Snowflake database. I want to convert the nested array into columns in the manner shown below in Snowflake SQL.

Table Name: SENSOR_DATA

The RX column is of data type VARIANT. The nested arrays will not always be 3 as shown below. There are cases where there are 20,000 nested arrays, and other cases where there are none.

| ID |             RX              |
|----|-----------------------------|
| 1  |[[0, 15], [1, 50], [2, 34.2]]|
| 2  |[[0, 20], [1, 25]]           |
| 3  |[[0, 100], [1, 42], [2, 76]] |

I want to achieve something like this from the table above:

| ID |Col0 | Col1| Col2|
|----|-----|-----|-----|
| 1  |  15 |  50 | 34.2|
| 2  |  20 |  25 | NULL|
| 3  | 100 |  42 | 76  |
  • What will you do with 60,000(!) columns? Commented Jun 19, 2022 at 14:40

3 Answers


Let's recreate that table:

create table sensor_data as 
  select column1 id, parse_json(column2) rx
  from values (1, '[[0, 15], [1, 50], [2, 34.2]]')
             ,(2, '[[0, 20], [1, 25]]')
             ,(3, '[[0, 100], [1, 42], [2, 76]]');

Then get the distinct column keys that you have:

select distinct r.value[0] from sensor_data, table(flatten(input=>rx)) r;
R.VALUE[0]
0
1
2

Given we want this to be dynamic, the next step is to check with static SQL that it gives us the correct answer:

 select id
    ,max(iff(r.value[0] = 0, r.value[1], null)) as col_0
    ,max(iff(r.value[0] = 1, r.value[1], null)) as col_1
    ,max(iff(r.value[0] = 2, r.value[1], null)) as col_2
 from sensor_data, table(flatten(input=>rx)) r
 group by 1
 order by 1;
ID COL_0 COL_1 COL_2
1 15 50 34.2
2 20 25 null
3 100 42 76

Now, to build this dynamically:

declare
  sql string;
  c1 cursor for select distinct r.value[0] as key from sensor_data, table(flatten(input=>rx)) r;
begin
  sql := 'select id ';
  for record in c1 do
    sql := sql || ',max(iff(r.value[0] = '|| record.key::text ||', r.value[1], null)) as col_' || record.key::text;
  end for;
  sql := sql || ' from sensor_data, table(flatten(input=>rx)) r  group by 1  order by 1';
  return sql;
end;

which gives us:

select id ,max(iff(r.value[0] = 0, r.value[1], null)) as col_0,max(iff(r.value[0] = 1, r.value[1], null)) as col_1,max(iff(r.value[0] = 2, r.value[1], null)) as col_2 from sensor_data, table(flatten(input=>rx)) r group by 1 order by 1

which when run gives us the expected results.

so now we want to run that:

declare
  sql string;
  res resultset;
  c1 cursor for select distinct r.value[0] as key from sensor_data, table(flatten(input=>rx)) r;
begin
  sql := 'select id ';
  for record in c1 do
    sql := sql || ',max(iff(r.value[0] = '|| record.key::text ||', r.value[1], null)) as col_' || record.key::text;
  end for;
  sql := sql || ' from sensor_data, table(flatten(input=>rx)) r  group by 1  order by 1';
  
  res := (execute immediate :sql);
  return table (res);
end;

gives:

ID COL_0 COL_1 COL_2
1 15 50 34.2
2 20 25 null
3 100 42 76

based on code from these sections of the manual:

Working with loops

Working with Resultsets
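
As an aside: on accounts where Snowflake's dynamic PIVOT (`IN (ANY)`) is available, the cursor and string-building above can be replaced by a single query. This is a sketch under the assumption that the feature is enabled on your account; note the pivoted columns come out named after the key values ("0", "1", "2") rather than COL_N:

```sql
-- Sketch: dynamic pivot, assuming PIVOT ... IN (ANY) is available.
-- Each distinct key becomes a column automatically; no cursor needed.
select *
from (
  select d.id, r.value[0]::number as key, r.value[1] as val
  from sensor_data d, table(flatten(input => d.rx)) r
)
pivot (max(val) for key in (any order by key))
order by id;
```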

With extra "tricky data" and the mentioned ORDER BY:

create or replace table sensor_data as 
  select column1 id, parse_json(column2) rx
  from values (1, '[[0, 15], [1, 50], [2, 34.2]]')
             ,(2, '[[0, 20], [1, 25]]')
             ,(3, '[[0, 100], [1, 42], [2, 76]]')
             ,(4, '[[0,20],[30,50], [45, 100]]');
declare
  sql string;
  res resultset;
  c1 cursor for select distinct r.value[0] as key from sensor_data, table(flatten(input=>rx)) r order by key;
begin
  sql := 'select id ';
  for record in c1 do
    sql := sql || ',max(iff(r.value[0] = '|| record.key::text ||', r.value[1], null)) as col_' || record.key::text;
  end for;
  sql := sql || ' from sensor_data, table(flatten(input=>rx)) r  group by 1  order by 1';
  
  res := (execute immediate :sql);
  return table (res);
end;

gives:

ID COL_0 COL_1 COL_2 COL_30 COL_45
1 15 50 34.2 null null
2 20 25 null null null
3 100 42 76 null null
4 20 null null 50 100

So how does this work? It's the MAX that does the heavy lifting.

If we go back to the original data

create or replace table sensor_data as 
  select column1 id, parse_json(column2) rx
  from values (1, '[[0, 15], [1, 50], [2, 34.2]]')
             ,(2, '[[0, 20], [1, 25]]')
             ,(3, '[[0, 100], [1, 42], [2, 76]]');

and alter the code to remove the MAX and the GROUP BY:

 select id
    ,iff(r.value[0] = 0, r.value[1], null) as col_0
    ,iff(r.value[0] = 1, r.value[1], null) as col_1
    ,iff(r.value[0] = 2, r.value[1], null) as col_2
 from sensor_data, table(flatten(input=>rx)) r
 order by 1;  

we see:

ID COL_0 COL_1 COL_2
1 15 null null
1 null 50 null
1 null null 34.2
2 20 null null
2 null 25 null
3 100 null null
3 null 42 null
3 null null 76

We can see the values getting unrolled into one row per array element. The GROUP BY id then rolls those rows back up: for COL_0 on id 1 we have 15, NULL, NULL, and since MAX ignores NULLs, 15 is what is kept. That is the process this solution is built on.
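
The fact underneath this is that aggregates such as MAX skip NULL inputs, which can be checked in isolation (a minimal standalone query, independent of the table above):

```sql
-- MAX ignores NULLs, so the single non-NULL value in a group survives.
select max(column1) as col_0
from values (15), (null), (null);
-- returns 15
```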


11 Comments

The cursor query should have a order by if you need those columns to be in order.
What if the indices were not always in that order. What if it was [[0,20],[30,50], [45, 100]]? How can I have it where it is col_0, col_30, col_45? This will not always be the same.
The above code solves this exact case. Did you try the "tricky" data?
Yes, thank you so much! Would you be able to give a brief explanation of the 'With extra "tricky data" and the mentioned ORDER BY' part?
The answer in the extra section is just the answer from the prior section. I only changed it by adding an ORDER BY to the cursor query, like I mentioned in my comment, so that the output columns are presented in a nicer order. The whole point of that section was to show you that this code works for any random combination of input, which is what you originally asked for. I am not sure which part you don't get, and thus what you would like explained.

Using [] to access array elements:

SELECT ID, RX[0][1] AS col0, RX[1][1] AS col1, RX[2][1] AS col2
FROM SENSOR_DATA;

1 Comment

My nested array size is not fixed. I could have 60,000 nested arrays, so writing it out the way you suggested is not possible.

Not exactly what you asked for, but this is close:

with sensor_data as (
  select column1 id, parse_json(column2) rx
  from values (1, '[[0, 15], [1, 50], [2, 34.2]]')
             ,(2, '[[0, 20], [1, 25]]')
             ,(3, '[[0, 100], [1, 42], [2, 76]]')
       as vals
),
flat as (
select id, val.value[1] arrvalue
  from sensor_data,
  lateral flatten(input => sensor_data.rx, outer => true) val
 )
select
     id
    ,listagg(arrvalue, ',') rx_list
from flat
group by id
order by id
  ;
ID  RX_LIST
1   15,50,34.2
2   20,25
3   100,42,76
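
If a real ARRAY is preferable to a delimited string, ARRAY_AGG is a near drop-in swap for LISTAGG. A sketch reusing the same flatten structure; the WITHIN GROUP clause is an assumption that you want values kept in key order:

```sql
-- Collect each id's values into an ARRAY instead of a comma-separated string.
with flat as (
  select d.id, r.value[0]::number as key, r.value[1] as val
  from sensor_data d, lateral flatten(input => d.rx) r
)
select id,
       array_agg(val) within group (order by key) as rx_array
from flat
group by id
order by id;
```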

2 Comments

that's not quite what I need
Understood, but it is not really feasible to dynamically generate upwards of 20,000 columns. Either a list or an array would be more manageable. If you truly must have thousands of columns, you will need a procedural approach to dynamically generate the result set.
