
How to unnest a variant(array) column in Snowflake into multiple columns

The table is named event and the column is named user. The column is of type VARIANT and holds an array like this:


    [
      {
        "key": "user_id",
        "value": {
          "set_timestamp_micros": 1621804433449213,
          "string_value": "auth0|6094a88b602505006f20fc0e"
        }
      },
      {
        "key": "env",
        "value": {
          "set_timestamp_micros": 1621804433445213,
          "string_value": "staging"
        }
      },
      {
        "key": "first_open_time",
        "value": {
          "int_value": 1620248400000,
          "set_timestamp_micros": 1620245124142213
        }
      }
    ]

My goal is to transpose the array into columns, like this:

user_id env
auth0|6094a88b602505006f20fc0e staging

I tried the FLATTEN function, but it did not work as I expected.

  • Snowflake is not BigQuery so I fixed the tags. Commented May 24, 2021 at 0:43

2 Answers


FLATTEN on your JSON gives you access to the three elements of the array, but you want to pick out two of them by key. If your data contains sets of these key/value objects that are all related via set_timestamp_micros, you could PIVOT after FLATTEN, or you could aggregate with MAX, like this:

-- data_table and json are placeholders for your table and VARIANT column
SELECT f.value:value:set_timestamp_micros::number as set_timestamp_micros
    ,max(iff(f.value:key = 'env', f.value:value:string_value::text, null)) as env
    ,max(iff(f.value:key = 'user_id', f.value:value:string_value::text, null)) as user_id
    ,max(iff(f.value:key = 'first_open_time', f.value:value:int_value::number, null)) as first_open_time
FROM data_table AS dt,
    TABLE(FLATTEN(input => dt.json)) f
GROUP BY set_timestamp_micros
ORDER BY set_timestamp_micros;
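
In the sample array from the question, each key carries a different set_timestamp_micros, so grouping on that value would return one row per key rather than one combined row. A minimal sketch, assuming one output row per source row is what is wanted, groups instead on the seq column that FLATTEN emits for each input record. The names event_sample and user_props are placeholders, not the asker's real table and column.

```sql
WITH event_sample AS (
    -- Placeholder CTE holding the sample array from the question.
    SELECT PARSE_JSON('[
      {"key": "user_id", "value": {"set_timestamp_micros": 1621804433449213, "string_value": "auth0|6094a88b602505006f20fc0e"}},
      {"key": "env", "value": {"set_timestamp_micros": 1621804433445213, "string_value": "staging"}},
      {"key": "first_open_time", "value": {"int_value": 1620248400000, "set_timestamp_micros": 1620245124142213}}
    ]') AS user_props
)
SELECT
    -- Each IFF keeps the value only for the matching key; MAX collapses the NULLs.
    MAX(IFF(f.value:key::string = 'user_id', f.value:value:string_value::string, NULL)) AS user_id,
    MAX(IFF(f.value:key::string = 'env', f.value:value:string_value::string, NULL)) AS env
FROM event_sample e,
     LATERAL FLATTEN(input => e.user_props) f
-- seq identifies the input record, so every source row yields one output row.
GROUP BY f.seq;
```

With the sample data this should return a single row holding the user_id and env strings. Against a real table, replace the CTE with the table itself and point input at its VARIANT column.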


FLATTEN only gives you access to the elements of the array. Since the JSON stores each key and its value as separate attributes, you'll need to pivot after you flatten:

WITH x AS (
    SELECT parse_json('    [
      {
        "key": "user_id",
        "value": {
          "set_timestamp_micros": 1621804433449213,
          "string_value": "auth0|6094a88b602505006f20fc0e"
        }
      },
      {
        "key": "env",
        "value": {
          "set_timestamp_micros": 1621804433445213,
          "string_value": "staging"
        }
      },
      {
        "key": "first_open_time",
        "value": {
          "int_value": 1620248400000,
          "set_timestamp_micros": 1620245124142213
        }
      }
    ]') as var
  ), z AS (
   SELECT y.value:key::string as key, y.value:value:string_value::string as value
   from x,
   lateral flatten(input=>var) y
  )
SELECT "'user_id'" as user_id, "'env'" as env
FROM z
PIVOT (MAX(value) FOR key IN ('user_id','env')) AS TMP;
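
The quoted names such as "'user_id'" appear because PIVOT names its output columns after the literal values in the IN list. As a sketch of one way to tidy this up when the table has more than one row, the query below carries FLATTEN's seq column through as a row identifier and renames the pivoted columns in the alias list. The names src, kv, p, props and row_id, and the second sample row, are made up for illustration.

```sql
WITH src AS (
    -- Two hypothetical source rows, each an array of key/value objects.
    SELECT PARSE_JSON(column1) AS props
    FROM VALUES
        ('[{"key": "user_id", "value": {"string_value": "auth0|6094a88b602505006f20fc0e"}},
           {"key": "env", "value": {"string_value": "staging"}}]'),
        ('[{"key": "user_id", "value": {"string_value": "auth0|example-user"}},
           {"key": "env", "value": {"string_value": "production"}}]')
), kv AS (
    -- One row per array element, tagged with the input record it came from.
    SELECT f.seq AS row_id,
           f.value:key::string AS key,
           f.value:value:string_value::string AS value
    FROM src,
         LATERAL FLATTEN(input => props) f
)
SELECT user_id, env
FROM kv
-- The alias list renames the output columns: row_id first, then one per IN value.
PIVOT (MAX(value) FOR key IN ('user_id', 'env')) AS p (row_id, user_id, env);
```

Because row_id is not part of the PIVOT clause, it acts as the implicit grouping column, so each source row produces its own output row.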
