
I have a table column with nested arrays in a Snowflake database. I want to convert the nested array into columns in the manner shown below in Snowflake SQL.

Table Name: SENSOR_DATA

The RX column is of data type VARIANT. The nested arrays will not always be 3 as shown below. There are cases where there are 20,000 nested arrays, and other cases where there are none.

| ID |             RX              |
|----|-----------------------------|
| 1  |[[0, 15], [1, 50], [2, 34.2]]|
| 2  |[[0, 20], [1, 25]]           |
| 3  |[[0, 100], [1, 42], [2, 76]] |

I want to achieve something like this from the table above:

| ID |Col0 | Col1| Col2|
|----|-----|-----|-----|
| 1  |  15 |  50 | 34.2|
| 2  |  20 |  25 | NULL|
| 3  | 100 |  42 | 76  |
  • What will you do with 60,000(!) columns? Commented Jun 19, 2022 at 14:40

3 Answers


Let's recreate that table:

create table sensor_data as 
  select column1 id, parse_json(column2) rx
  from values (1, '[[0, 15], [1, 50], [2, 34.2]]')
             ,(2, '[[0, 20], [1, 25]]')
             ,(3, '[[0, 100], [1, 42], [2, 76]]');

Then get the distinct column keys that you have:

select distinct r.value[0] from sensor_data, table(flatten(input=>rx)) r;
R.VALUE[0]
0
1
2

Given we want this to be dynamic, the next step is to check with static SQL that it gives us the correct answer:

 select id
    ,max(iff(r.value[0] = 0, r.value[1], null)) as col_0
    ,max(iff(r.value[0] = 1, r.value[1], null)) as col_1
    ,max(iff(r.value[0] = 2, r.value[1], null)) as col_2
 from sensor_data, table(flatten(input=>rx)) r
 group by 1
 order by 1;
ID COL_0 COL_1 COL_2
1 15 50 34.2
2 20 25 null
3 100 42 76

Now, to build this dynamically:

declare
  sql string;
  c1 cursor for select distinct r.value[0] as key from sensor_data, table(flatten(input=>rx)) r;
begin
  sql := 'select id ';
  for record in c1 do
    sql := sql || ',max(iff(r.value[0] = '|| record.key::text ||', r.value[1], null)) as col_' || record.key::text;
  end for;
  sql := sql || ' from sensor_data, table(flatten(input=>rx)) r  group by 1  order by 1';
  return sql;
end;

which gives us:

select id ,max(iff(r.value[0] = 0, r.value[1], null)) as col_0,max(iff(r.value[0] = 1, r.value[1], null)) as col_1,max(iff(r.value[0] = 2, r.value[1], null)) as col_2 from sensor_data, table(flatten(input=>rx)) r group by 1 order by 1

which when run gives us the expected results.

so now we want to run that:

declare
  sql string;
  res resultset;
  c1 cursor for select distinct r.value[0] as key from sensor_data, table(flatten(input=>rx)) r;
begin
  sql := 'select id ';
  for record in c1 do
    sql := sql || ',max(iff(r.value[0] = '|| record.key::text ||', r.value[1], null)) as col_' || record.key::text;
  end for;
  sql := sql || ' from sensor_data, table(flatten(input=>rx)) r  group by 1  order by 1';
  
  res := (execute immediate :sql);
  return table (res);
end;

gives:

ID COL_0 COL_1 COL_2
1 15 50 34.2
2 20 25 null
3 100 42 76

based on code from these sections of the manual:

Working with loops

Working with Resultsets
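
As an aside: on accounts where Snowflake's dynamic PIVOT (`IN (ANY)`) is available, the cursor and string-building above can be replaced by a single query. This is a sketch under the assumption that the feature is enabled on your account; note the pivoted columns come out named after the key values ("0", "1", "2") rather than COL_N:

```sql
-- Sketch: dynamic pivot, assuming PIVOT ... IN (ANY) is available.
-- Each distinct key becomes a column automatically; no cursor needed.
select *
from (
  select d.id, r.value[0]::number as key, r.value[1] as val
  from sensor_data d, table(flatten(input => d.rx)) r
)
pivot (max(val) for key in (any order by key))
order by id;
```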

With extra "tricky data" and the mentioned ORDER BY:

create or replace table sensor_data as 
  select column1 id, parse_json(column2) rx
  from values (1, '[[0, 15], [1, 50], [2, 34.2]]')
             ,(2, '[[0, 20], [1, 25]]')
             ,(3, '[[0, 100], [1, 42], [2, 76]]')
             ,(4, '[[0,20],[30,50], [45, 100]]');
declare
  sql string;
  res resultset;
  c1 cursor for select distinct r.value[0] as key from sensor_data, table(flatten(input=>rx)) r order by key;
begin
  sql := 'select id ';
  for record in c1 do
    sql := sql || ',max(iff(r.value[0] = '|| record.key::text ||', r.value[1], null)) as col_' || record.key::text;
  end for;
  sql := sql || ' from sensor_data, table(flatten(input=>rx)) r  group by 1  order by 1';
  
  res := (execute immediate :sql);
  return table (res);
end;

gives:

ID COL_0 COL_1 COL_2 COL_30 COL_45
1 15 50 34.2 null null
2 20 25 null null null
3 100 42 76 null null
4 20 null null 50 100

So how does this work? It's the MAX that does the heavy lifting.

If we go back to the original data

create or replace table sensor_data as 
  select column1 id, parse_json(column2) rx
  from values (1, '[[0, 15], [1, 50], [2, 34.2]]')
             ,(2, '[[0, 20], [1, 25]]')
             ,(3, '[[0, 100], [1, 42], [2, 76]]');

and alter the code to remove the MAX and the GROUP BY:

 select id
    ,iff(r.value[0] = 0, r.value[1], null) as col_0
    ,iff(r.value[0] = 1, r.value[1], null) as col_1
    ,iff(r.value[0] = 2, r.value[1], null) as col_2
 from sensor_data, table(flatten(input=>rx)) r
 order by 1;  

we see:

ID COL_0 COL_1 COL_2
1 15 null null
1 null 50 null
1 null null 34.2
2 20 null null
2 null 25 null
3 100 null null
3 null 42 null
3 null null 76

We can see the values getting unrolled into one row per array element. The GROUP BY id then rolls those rows back up: for COL_0 on id 1 we have 15, NULL, NULL, and since MAX ignores NULLs, 15 is what is kept. That is the process this solution is built on.
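
The fact underneath this is that aggregates such as MAX skip NULL inputs, which can be checked in isolation (a minimal standalone query, independent of the table above):

```sql
-- MAX ignores NULLs, so the single non-NULL value in a group survives.
select max(column1) as col_0
from values (15), (null), (null);
-- returns 15
```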


11 Comments

The cursor query should have a order by if you need those columns to be in order.
What if the indices were not always in that order. What if it was [[0,20],[30,50], [45, 100]]? How can I have it where it is col_0, col_30, col_45? This will not always be the same.
The above code solves this exact case. Did you try the "tricky" data?
Yes, thank you so much! Would you be able to give a brief explanation of the 'With extra "tricky data" and the mentioned ORDER BY' part?
The answer in the extra section is just the answer from the prior section. I only changed it by adding an ORDER BY to the cursor query, like I mentioned in my comment, so that the output columns are presented in a nicer order. The whole point of that section was to show you that this code works for any random combination of input, which is what you originally asked for. I am not sure which part you don't get, and thus what you would like explained.

Using [] to access array elements:

SELECT ID, RX[0][1] AS col0, RX[1][1] AS col1, RX[2][1] AS col2
FROM SENSOR_DATA;

1 Comment

My nested array size is not fixed. I could have 60,000 nested arrays, so writing it out the way you suggested is not possible.

Not exactly what you asked for, but this is close:

with sensor_data as (
  select column1 id, parse_json(column2) rx
  from values (1, '[[0, 15], [1, 50], [2, 34.2]]')
             ,(2, '[[0, 20], [1, 25]]')
             ,(3, '[[0, 100], [1, 42], [2, 76]]')
       as vals
),
flat as (
select id, val.value[1] arrvalue
  from sensor_data,
  lateral flatten(input => sensor_data.rx, outer => true) val
 )
select
     id
    ,listagg(arrvalue, ',') rx_list
from flat
group by id
order by id
  ;
ID  RX_LIST
1   15,50,34.2
2   20,25
3   100,42,76
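
If a real ARRAY is preferable to a delimited string, ARRAY_AGG is a near drop-in swap for LISTAGG. A sketch reusing the same flatten structure; the WITHIN GROUP clause is an assumption that you want values kept in key order:

```sql
-- Collect each id's values into an ARRAY instead of a comma-separated string.
with flat as (
  select d.id, r.value[0]::number as key, r.value[1] as val
  from sensor_data d, lateral flatten(input => d.rx) r
)
select id,
       array_agg(val) within group (order by key) as rx_array
from flat
group by id
order by id;
```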

2 Comments

that's not quite what I need
Understood, but it is not really feasible to dynamically generate upwards of 20,000 columns. Either a list or an array would be more manageable. If you truly must have thousands of columns, you will need a procedural approach to dynamically generate the result set.
