21

i'm trying to extract two key from every json in an arry of jsons(using sql legacy) currently i am using json extract function :

json_extract(json_column , '$[1].X') AS X,
json_extract(json_column , '$[1].Y') AS Y,

how can i make it run on every json at the 'json arry column', and not just [1] (for example)?

An example json:

[

{"blabla":000,"X":1,"blabla":000,"blabla":000,"blabla":000,,"Y":"2"},

{"blabla":000,"X":3,"blabla":000,"blabla":000,"blabla":000,,"Y":"4"},

]   

thanks in advance!

1
  • See the relevant feature request on the issue tracker. Commented Aug 31, 2018 at 17:43

2 Answers 2

47

Update 2020: JSON_EXTRACT_ARRAY()

Now BigQuery supports JSON_EXTRACT_ARRAY():

For example, to solve this particular question:

SELECT id
  , ARRAY(
      SELECT JSON_EXTRACT_SCALAR(x, '$.author.email') 
      FROM UNNEST(JSON_EXTRACT_ARRAY(payload, "$.commits"))x
  ) emails
FROM `githubarchive.day.20180830` 
WHERE type='PushEvent' 
AND id='8188163772'

enter image description here


Previous answer

Let's start with a similar problem - this is not a very convenient way to extract all emails from a json array:

SELECT id
  , [ JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[0].author.email')  
      , JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[1].author.email')  
      , JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[2].author.email')  
      , JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[3].author.email')
    ] emails
FROM `githubarchive.day.20180830` 
WHERE type='PushEvent' 
AND id='8188163772'

enter image description here

The best way we have right now to deal with this is to use some JavaScript in an UDF to split a json-array into a SQL array:

CREATE TEMP FUNCTION json2array(json STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  return JSON.parse(json).map(x=>JSON.stringify(x));
"""; 

SELECT * EXCEPT(array_commits),
  ARRAY(SELECT JSON_EXTRACT_SCALAR(x, '$.author.email') FROM UNNEST(array_commits) x) emails
FROM (
  SELECT id
    , json2array(JSON_EXTRACT(payload, '$.commits')) array_commits
  FROM `githubarchive.day.20180830` 
  WHERE type='PushEvent' 
  AND id='8188163772'
)

enter image description here

Sign up to request clarification or add additional context in comments.

1 Comment

is there a better newer answer to this question by any chance?
15

May 1st, 2020 Update

A new function, JSON_EXTRACT_ARRAY, has been just added to the list of JSON functions. This function allows you to extract the contents of a JSON document as a string array.

so in below you can replace use of CUSTOM_JSON_EXTRACT UDF with just in-built function JSON_EXTRACT_ARRAY as in below example

#standardSQL
SELECT 
  JSON_EXTRACT_SCALAR(json , '$.X') AS X,
  JSON_EXTRACT_SCALAR(json , '$.Y') AS Y
FROM t, UNNEST(JSON_EXTRACT_ARRAY(json_column , '$')) json   

==============

Below example for BigQuery Standard SQL and allows you to be close to standard way of working with JSONPath and no extra manipulation needed so you just simply use CUSTOM_JSON_EXTRACT(json, json_path) function

#standardSQL
CREATE TEMPORARY FUNCTION CUSTOM_JSON_EXTRACT(json STRING, json_path STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
        return jsonPath(JSON.parse(json), json_path);
"""
OPTIONS (
    library="gs://your_bucket/jsonpath-0.8.0.js"
);
WITH t AS (
SELECT '''
[
{"blabla1":1,"X":1,"blabla2":3,"blabla3":5,"blabla4":7,"Y":"2"},
{"blabla1":2,"X":3,"blabla2":4,"blabla3":6,"blabla4":8,"Y":"4"}
]   
''' AS json_column 
)
SELECT 
  CUSTOM_JSON_EXTRACT(json_column , '$[*].X') AS X,
  CUSTOM_JSON_EXTRACT(json_column , '$[*].Y') AS Y
FROM t   

result will be

Row X   Y    
1   1   2    
    3   4      

Note: to overcome current BigQuery's "limitation" for JsonPath, above solution uses custom function along with external library - jsonpath-0.8.0.js that can be downloaded from https://code.google.com/archive/p/jsonpath/downloads and uploaded to Google Cloud Storage - gs://your_bucket/jsonpath-0.8.0.js

Just re-read Felipe's answer - for his example above solution will look like below (just as FYI)

SELECT 
  id, 
  CUSTOM_JSON_EXTRACT(payload, '$.commits[*].author.email') emails
FROM `githubarchive.day.20180830` 
WHERE type='PushEvent' 
AND id='8188163772'

2 Comments

I erased .map(x=>JSON.stringify(x)), so that I don't get quotes in the result.
Documentation is also updated for json_extract_array: cloud.google.com/bigquery/docs/reference/standard-sql/…

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.