BigQuery clustering not reducing query costs

Question

I'm having an issue with clustered tables in BigQuery (with date partitions). I have a table that is clustered by a column named entity_id. I expect to see a reduction in bytes read when a query filters on this clustered column, but according to the BigQuery web UI it's doing a full scan anyway.

For example:
SELECT * FROM project.usersDataset.users_cluster WHERE entity_id = '405849241' LIMIT 1000;
Result: "Query complete (0.570 sec elapsed, 862.94 MB processed)"
This is the full table size (862.94 MB).

This is the table configuration: (screenshot of the table configuration)

EDIT: I kept testing and found that sometimes some bytes are saved, but not many: (screenshot of the query in the BigQuery web UI). I was expecting a much larger reduction in bytes processed (the query returned 1 row and scanned 719 MB of the table's 862 MB), but nothing in the BigQuery documentation guarantees that.

Does anyone have an idea of what could be happening?
Thanks!

I asked a similar question here: stackoverflow.com/questions/53980953/… Can you also provide the same screenshot from your web UI to help get to the bottom of this? – Tamir Klein, Commented Feb 19, 2019 at 18:00
Clustering only works on partitioned tables, and it generally kicks in for datasets above 1 GB. – Pentium10, Commented Feb 19, 2019 at 18:08
Yes Tamir, it's quite similar to your problem. As I noted in my recent edit to the post, I kept testing and found that sometimes some reduction in bytes processed does happen (719 MB of the table's 862 MB, returning 1 row). I suppose I was expecting a bigger saving, but nothing in the BigQuery documentation guarantees one, and as Pentium10 points out, the small amount of data may not help either. Thanks both! – Marco Lotto, Commented Feb 19, 2019 at 19:02
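Pentium10's and Marco's observations fit a simple mental model of clustering, sketched here as a hypothetical toy rather than BigQuery's actual storage layout: rows are sorted by the clustering column into storage blocks, and a filter on that column lets the engine skip blocks whose key range cannot match. If a small table occupies only a handful of blocks, even perfectly sorted data leaves little to skip; in Marco's measurement roughly 1 - 719/862 ≈ 17% of the table was skipped. The key values, block counts and the helper names `make_sorted_blocks` and `scanned_mb` are invented for illustration.

```python
"""Toy model of block pruning on a clustered column (not BigQuery's real implementation)."""
from dataclasses import dataclass


@dataclass
class Block:
    min_key: int     # smallest clustering-key value stored in the block
    max_key: int     # largest clustering-key value stored in the block
    size_mb: float   # share of the table's bytes held by this block


def make_sorted_blocks(keys, num_blocks, table_mb):
    """Sort keys (as clustering would) and cut them into roughly equal blocks."""
    keys = sorted(keys)
    per_block = -(-len(keys) // num_blocks)  # ceiling division
    blocks = []
    for start in range(0, len(keys), per_block):
        chunk = keys[start:start + per_block]
        blocks.append(Block(chunk[0], chunk[-1], table_mb * len(chunk) / len(keys)))
    return blocks


def scanned_mb(blocks, key):
    """Bytes read for `WHERE key = ...`: every block whose range could hold the key."""
    return sum(b.size_mb for b in blocks if b.min_key <= key <= b.max_key)


# Hypothetical table: 1000 distinct keys, 862 MB in total, split into different block counts.
keys = list(range(1000))
for n in (1, 6, 100):
    blocks = make_sorted_blocks(keys, n, 862.0)
    print(n, "blocks ->", round(scanned_mb(blocks, 405), 2), "MB scanned")
```

With a single block the whole 862 MB is read whatever the filter; with 100 blocks only the one block whose range covers the key is read. In this model the saving depends on how many blocks the table spans, which is one plausible reading of why clustering benefits tend to appear on larger tables.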

Community · Accepted Answer · 2020-06-20 09:12:55Z


From the BigQuery documentation provided in this link:

Features under development

Support for clustering non-partitioned tables.

Please check that your table is both clustered and partitioned.

Note: according to the BigQuery documentation, clustering will also be used when there is no WHERE condition.

edited Jun 20, 2020 at 9:12 by CommunityBot


answered Feb 19, 2019 at 18:05 by Tamir Klein



3 Comments

Marco Lotto Over a year ago

I forgot to mention: it's a date-partitioned table clustered by the entity_id field. I recently edited the post to add the table specs. I have a doubt about your note; I thought you actually needed the clustered field in the query for this to work.

Tamir Klein Over a year ago

Check Felipe's great article about clustering: medium.com/google-cloud/… In his example the WHERE clause is on the partition column only, and you can see how clustering still saves cost. That was my intention; sorry if it wasn't clear. I hope this article solves your issue.

Tamir Klein Over a year ago

@MarcoLotto, I posted another question on a similar issue, which you can find at this link. Note that if you have a streaming buffer attached to your table, you might need to run a daily MERGE command to see cost improvements.
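Tamir's point about the streaming buffer can be illustrated with another hypothetical toy model (an assumption, not BigQuery's documented internals): if newly arrived rows sit in blocks that are not yet sorted by entity_id, each such block spans almost the whole key range and cannot be pruned, so a filter on the clustered column still reads them. Rewriting everything in sorted order (in this model, the effect one would hope for from a MERGE or re-clustering) restores narrow block ranges. The helpers `to_blocks` and `rows_scanned`, the row counts and the block size are all invented for illustration.

```python
"""Toy model: unsorted newly appended rows widen block ranges and reduce pruning."""
import random
from dataclasses import dataclass


@dataclass
class Block:
    min_key: int
    max_key: int
    rows: int


def to_blocks(keys, rows_per_block, sort_first):
    """Cut keys into fixed-size blocks, optionally sorting first (i.e. 'clustered')."""
    if sort_first:
        keys = sorted(keys)
    chunks = (keys[i:i + rows_per_block] for i in range(0, len(keys), rows_per_block))
    return [Block(min(chunk), max(chunk), len(chunk)) for chunk in chunks]


def rows_scanned(blocks, key):
    """Rows in every block whose [min, max] range could contain `key`."""
    return sum(b.rows for b in blocks if b.min_key <= key <= b.max_key)


rng = random.Random(0)
base = list(range(10_000))                                  # hypothetical existing rows
appended = [rng.randrange(10_000) for _ in range(2_000)]    # hypothetical new rows, arrival order

# Partially clustered: old rows sorted, new rows still in arrival order.
partial = to_blocks(base, 100, sort_first=True) + to_blocks(appended, 100, sort_first=False)
# Fully rewritten: all rows sorted together again.
full = to_blocks(base + appended, 100, sort_first=True)

print("partially clustered:", rows_scanned(partial, 4058), "rows scanned")
print("fully re-sorted:   ", rows_scanned(full, 4058), "rows scanned")
```

In this sketch the unsorted tail dominates the scan even though it is a small fraction of the data, which is one way to picture why cost improvements might only show up after the data is rewritten.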
