So I have a raw table with 2 columns:
id (INT64) | content (STRING)
------------|--------------------
1 | {"photos": [{"location": {"lat": 111, "lon": 222}, "ts": "2019-12-16", "uri": "aaa"}, {"location": {"lat": 333, "lon": 444}, "ts": "2019-12-17", "uri": "bbb"}]}
------------|--------------------
2 | ....
The first column is an integer-typed id; the second is a JSON-formatted string. An example JSON looks like this:
{
"photos": [
{
"location": {
"lat": 111,
"lon": 222
},
"ts": "2019-12-16",
"uri": "aaa"
},
{
"location": {
"lat": 333,
"lon": 444
},
"ts": "2019-12-17",
"uri": "bbb"
}
]
}
Question
How can I format the photos from the raw table into an array of structs/records, i.e. resulting in something like this?
id | photos.ts | photos.uri | photos.location.lat | photos.location.lon
-------|---------------|-------------|-----------------------|--------------------
1 | 2019-12-16 | aaa | 111 | 222
| 2019-12-17 | bbb | 333 | 444
-------|---------------|-------------|-----------------------|--------------------
2 | ... | ... | ... | ...
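In other words, I'd like the photos column typed roughly as (field names taken from my example JSON):

```sql
ARRAY<STRUCT<ts STRING, uri STRING, location STRUCT<lat INT64, lon INT64>>>
```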
Thoughts
JSON_EXTRACT(content, "$.photos") seems to be a good start, as it gives me the JSON array as a string; then I'd need some JS UDF to turn that result into BigQuery's STRUCT/RECORD type. I'm not sure exactly how to do that, though -- any help is appreciated!
I'm also not sure whether this "cleanup" into STRUCT/RECORD is really necessary or worth it. It seems I could just format photos into an array of STRING:
id (INT64) | photos (STRING)
------------|--------------------
1 | {"location": {"lat": 111, "lon": 222}, "ts": "2019-12-16", "uri": "aaa"}
| {"location": {"lat": 333, "lon": 444}, "ts": "2019-12-17", "uri": "bbb"}
------------|--------------------
2 | ....
and then use JSON_EXTRACT / JSON_EXTRACT_SCALAR in my analytical queries. How big a performance penalty should I expect?
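For the UDF route, here's an untested sketch of what I'm imagining (raw_table is a placeholder name, and I've used FLOAT64 for lat/lon because, as far as I know, JS UDFs can't return INT64 directly):

```sql
-- Sketch: a temporary JS UDF that parses the JSON string and returns
-- an array of structs whose field names match the JSON keys.
CREATE TEMP FUNCTION ParsePhotos(content STRING)
RETURNS ARRAY<STRUCT<ts STRING, uri STRING,
                     location STRUCT<lat FLOAT64, lon FLOAT64>>>
LANGUAGE js AS """
  // JS objects map onto STRUCTs by matching field names.
  return JSON.parse(content).photos;
""";

SELECT id, ParsePhotos(content) AS photos
FROM raw_table;
```

Is something like this the right shape, or is there a pure-SQL way to do it?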
Thanks!