
I have a wide flat table stored in Google BigQuery, in a format similar to the following:

log_date:integer,sessionid:integer,computer:string,ip:string,event_id:integer,amount:float

I'm trying to recreate this table in a hierarchical nested format with two nesting levels, as follows:

[
  {
    "name": "log_date",
    "type": "integer"
  },
  {
    "name": "session",
    "type": "record",
    "mode": "repeated",
    "fields": [
      {
        "name": "sessionid",
        "type": "integer"
      },
      {
        "name": "computer",
        "type": "string"
      },
      {
        "name": "ip",
        "type": "string"
      },
      {
        "name": "event",
        "type": "record",
        "mode": "repeated",
        "fields": [
          {
            "name": "event_id",
            "type": "integer"
          },
          {
            "name": "amount",
            "type": "float"
          }
        ]
      }
    ]
  }
]

What is the best way to generate the JSON data file from the BigQuery table? Is there a different, faster approach than:

1. downloading the table into an external CSV file,
2. building the JSON records and writing them into an external file, and
3. uploading the external JSON file into a new BigQuery table?
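For reference, step 2 of that manual pipeline might look roughly like the minimal Python sketch below. It assumes the downloaded CSV has a header row using the column names from the flat schema above, and that the whole table fits in memory; the function name `flat_csv_to_nested_ndjson` and the sample rows are hypothetical. It groups rows by `log_date`, then by `(sessionid, computer, ip)`, and emits one newline-delimited JSON record per `log_date`.

```python
import csv
import io
import json

def flat_csv_to_nested_ndjson(csv_text):
    """Hypothetical helper: turn flat CSV rows into one nested JSON line per log_date."""
    # log_date -> {(sessionid, computer, ip): [event structs]}; dicts keep insertion order.
    by_date = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        session_key = (int(row["sessionid"]), row["computer"], row["ip"])
        sessions = by_date.setdefault(int(row["log_date"]), {})
        sessions.setdefault(session_key, []).append(
            {"event_id": int(row["event_id"]), "amount": float(row["amount"])}
        )
    out = []
    for log_date, sessions in by_date.items():
        record = {
            "log_date": log_date,
            "session": [
                {"sessionid": sid, "computer": comp, "ip": ip, "event": events}
                for (sid, comp, ip), events in sessions.items()
            ],
        }
        out.append(json.dumps(record) + "\n")  # one JSON object per line
    return "".join(out)

# Made-up sample input with the assumed header row.
sample_csv = (
    "log_date,sessionid,computer,ip,event_id,amount\n"
    "1,10,a,1.2.3.4,100,1\n"
    "1,11,b,1.2.3.5,101,2\n"
    "1,11,b,1.2.3.5,102,3\n"
)
print(flat_csv_to_nested_ndjson(sample_csv), end="")
```

Because everything is held in memory and round-tripped through local files, this is exactly the slow path the question hopes to avoid; it is shown only to make the target record shape concrete.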

Is there a direct process that generates the JSON from existing tables?

Thank you, H


2 Answers


There isn't currently a way to automatically transform the data into a nested format. If you'd like to get the data out in JSON rather than CSV, you can use the extract command with the --destination_format flag set to NEWLINE_DELIMITED_JSON, e.g.:

bq extract \
    --destination_format=NEWLINE_DELIMITED_JSON \
    yourdataset.table \
    gs://your_bucket/result*.json 
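With NEWLINE_DELIMITED_JSON, each output file holds one JSON object per line, one line per table row, so a flat table stays flat. A minimal Python sketch of reading such a file follows; the two sample lines are made-up illustrations, not verbatim `bq` output (check the BigQuery export documentation for exactly how each column type is encoded), and `read_ndjson` is a hypothetical helper name.

```python
import json

def read_ndjson(text):
    """Hypothetical helper: parse newline-delimited JSON, one object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

# Made-up content of one exported file (one flat row per line).
sample = (
    '{"log_date":1,"sessionid":11,"computer":"b","ip":"1.2.3.5","event_id":101,"amount":2.0}\n'
    '{"log_date":1,"sessionid":11,"computer":"b","ip":"1.2.3.5","event_id":102,"amount":3.0}\n'
)
rows = read_ndjson(sample)
# Each row is still flat: sessionid/computer/ip are repeated on every event row.
print(rows[0]["sessionid"], rows[1]["sessionid"])  # prints: 11 11
```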

1 Comment

Thanks, but I guess extracting in JSON format will not produce the nested records I'm trying to build. Are there any other ways to generate a hierarchical table from a flat table in BigQuery?

This can be accomplished with ARRAY_AGG in standard SQL.

Note that if you want to nest in several layers, you need one common table expression per layer, because an ARRAY_AGG cannot directly contain another ARRAY_AGG.

WITH DATA AS (
 SELECT 1 AS log_date, 10 AS sessionid, 'a' AS computer, '1.2.3.4' AS ip, 100 AS event_id, 1 AS amount
 UNION ALL SELECT 1 AS log_date, 11 AS sessionid, 'b' AS computer, '1.2.3.5' AS ip, 101 AS event_id, 2 AS amount
 UNION ALL SELECT 1 AS log_date, 11 AS sessionid, 'b' AS computer, '1.2.3.5' AS ip, 102 AS event_id, 3 AS amount
 UNION ALL SELECT 2 AS log_date, 20 AS sessionid, 'a' AS computer, '1.2.3.4' AS ip, 200 AS event_id, 4 AS amount
 UNION ALL SELECT 2 AS log_date, 20 AS sessionid, 'a' AS computer, '1.2.3.4' AS ip, 201 AS event_id, 5 AS amount
 UNION ALL SELECT 2 AS log_date, 21 AS sessionid, 'c' AS computer, '1.2.3.6' AS ip, 202 AS event_id, 6 AS amount ),
inner_Aggregate AS (
  SELECT
    log_date,
    sessionid,
    computer,
    ip,
    ARRAY_AGG(STRUCT(event_id, amount)) AS event
  FROM
    DATA
  GROUP BY
    log_date,
    sessionid,
    computer,
    ip )
SELECT
  log_date,
  ARRAY_AGG(STRUCT(sessionid, computer, ip, event )) AS session
FROM
  inner_Aggregate
GROUP BY
  log_date
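To see what each stage of the query produces, here is a plain-Python mirror of the two stages (the `inner_Aggregate` CTE, then the outer SELECT) on the same six rows. It only illustrates the grouping semantics and is not something BigQuery runs; the function names are hypothetical. Two separate passes mirror why two layers of aggregation are needed. One difference: BigQuery's ARRAY_AGG does not guarantee element order unless you add an ORDER BY inside it, whereas this sketch simply keeps input order.

```python
from collections import defaultdict

# The six rows from the DATA CTE: (log_date, sessionid, computer, ip, event_id, amount)
DATA = [
    (1, 10, "a", "1.2.3.4", 100, 1),
    (1, 11, "b", "1.2.3.5", 101, 2),
    (1, 11, "b", "1.2.3.5", 102, 3),
    (2, 20, "a", "1.2.3.4", 200, 4),
    (2, 20, "a", "1.2.3.4", 201, 5),
    (2, 21, "c", "1.2.3.6", 202, 6),
]

def inner_aggregate(rows):
    # GROUP BY log_date, sessionid, computer, ip; ARRAY_AGG(STRUCT(event_id, amount)) AS event
    groups = defaultdict(list)
    for log_date, sessionid, computer, ip, event_id, amount in rows:
        groups[(log_date, sessionid, computer, ip)].append(
            {"event_id": event_id, "amount": amount}
        )
    return [
        {"log_date": d, "sessionid": s, "computer": c, "ip": ip, "event": events}
        for (d, s, c, ip), events in groups.items()
    ]

def outer_aggregate(inner_rows):
    # GROUP BY log_date; ARRAY_AGG(STRUCT(sessionid, computer, ip, event)) AS session
    groups = defaultdict(list)
    for r in inner_rows:
        groups[r["log_date"]].append(
            {k: r[k] for k in ("sessionid", "computer", "ip", "event")}
        )
    return [{"log_date": d, "session": sessions} for d, sessions in groups.items()]

result = outer_aggregate(inner_aggregate(DATA))
for row in result:
    # Print (sessionid, number of events) for each session under this log_date.
    print(row["log_date"], [(s["sessionid"], len(s["event"])) for s in row["session"]])
```

Running it prints `1 [(10, 1), (11, 2)]` and `2 [(20, 2), (21, 1)]`: one row per log_date, each holding a repeated session record that in turn holds a repeated event record, which is the shape of the schema in the question.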
