Subsetting a table before extracting a JSON element

I need to subset a very large table on BigQuery. The column that I will be filtering (joining) on to achieve this subsetting is not a JSON array. However, I would like to include/extract a complementary column from a JSON array afterwards. No matter how I rearrange my query, it seems to process the full (i.e. non-subsetted) table whenever I include the extracted JSON element.

As an MWE, consider a query that I'm adapting/borrowing from @felipe-hoffa here:

SELECT id
FROM `githubarchive.day.20180830` 
WHERE type='PushEvent' 
AND id='8188163772'

This query processes 33.9 MB.

However, if I add an extracted column from the JSON array (which again, I'm not subsetting on):

SELECT id, JSON_EXTRACT_SCALAR(payload, '$.size') AS size
FROM `githubarchive.day.20180830` 
WHERE type='PushEvent' 
AND id='8188163772'

... then the processed figure jumps to 3.5 GB (i.e. it is effectively querying the whole table).

Any idea on how I could do this more efficiently and keep down per-query costs?

  • As soon as you touch that column payload, you pay for it, even though you use only a tiny piece of it! The only way is to consider partitioning / clustering ... Commented Apr 6, 2021 at 21:40
  • Thanks @MikhailBerlyant, that was my fear! Do you mind moving your comment to an answer and I'll mark it as the accepted solution? Commented Apr 6, 2021 at 22:23

1 Answer

As soon as you touch that column payload, you pay for it, even though you use only a tiny piece of it! The only way is to consider partitioning / clustering ...
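For illustration, clustering can be applied by materializing a copy of the table clustered on the filter columns; BigQuery can then prune storage blocks that don't match the WHERE clause, so fewer bytes of payload are billed. A minimal sketch, assuming a writable dataset named my_dataset (hypothetical) and the table/columns from the MWE:

```sql
-- Hypothetical: create a clustered copy of the day table so that filters on
-- type and id can prune blocks before the payload column is read.
CREATE TABLE `my_dataset.day_20180830_clustered`
CLUSTER BY type, id
AS
SELECT *
FROM `githubarchive.day.20180830`;

-- Queries that filter on the clustering columns should now scan (and bill)
-- only the matching blocks of payload, not the whole column:
SELECT id, JSON_EXTRACT_SCALAR(payload, '$.size') AS size
FROM `my_dataset.day_20180830_clustered`
WHERE type = 'PushEvent'
  AND id = '8188163772';
```

Note the one-time CREATE TABLE itself scans the full source table; the savings come on repeated subsetting queries afterwards.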
