Subsetting a table before extracting a JSON element

I need to subset a very large table on BigQuery. The column that I will be filtering (joining) on to achieve this subsetting is not a JSON array. However, I would like to include/extract a complementary column from a JSON array afterwards. No matter how I rearrange my query, it seems to process the full (i.e. non-subsetted) table whenever I include the extracted JSON element.

As an MWE, consider a query that I'm adapting/borrowing from @felipe-hoffa here:

SELECT id
FROM `githubarchive.day.20180830` 
WHERE type='PushEvent' 
AND id='8188163772'

This query processes 33.9 MB.

However, if I add an extracted column from the JSON array (which again, I'm not subsetting on):

SELECT id, JSON_EXTRACT_SCALAR(payload, '$.size') AS size
FROM `githubarchive.day.20180830` 
WHERE type='PushEvent' 
AND id='8188163772'

... then the processed figure jumps to 3.5 GB (i.e. it is effectively querying the whole table).

Any idea on how I could do this more efficiently and keep down per-query costs?

  • As soon as you touch that column payload, you pay for it, even though you use only a tiny piece of it! The only way is to consider partitioning / clustering ... Commented Apr 6, 2021 at 21:40
  • Thanks @MikhailBerlyant, that was my fear! Do you mind moving your comment to an answer and I'll mark it as the accepted solution? Commented Apr 6, 2021 at 22:23

1 Answer

As soon as you touch that column payload, you pay for it, even though you use only a tiny piece of it! The only way is to consider partitioning / clustering ...
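For illustration, clustering can be applied by materializing a copy of the table clustered on the filter columns; BigQuery can then prune storage blocks that don't match the WHERE clause, so fewer bytes of payload are billed. A minimal sketch, assuming a writable dataset named my_dataset (hypothetical) and the table/columns from the MWE:

```sql
-- Hypothetical: create a clustered copy of the day table so that filters on
-- type and id can prune blocks before the payload column is read.
CREATE TABLE `my_dataset.day_20180830_clustered`
CLUSTER BY type, id
AS
SELECT *
FROM `githubarchive.day.20180830`;

-- Queries that filter on the clustering columns should now scan (and bill)
-- only the matching blocks of payload, not the whole column:
SELECT id, JSON_EXTRACT_SCALAR(payload, '$.size') AS size
FROM `my_dataset.day_20180830_clustered`
WHERE type = 'PushEvent'
  AND id = '8188163772';
```

Note the one-time CREATE TABLE itself scans the full source table; the savings come on repeated subsetting queries afterwards.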
