I'm reading data from Google BigQuery into a polars DataFrame. Passing a string query succeeds, but I'd prefer to use a SQLAlchemy statement. Using python-bigquery-sqlalchemy, provided by Google, and following their SDK instructions fails.

This Method Works

import polars as pl
from sqlalchemy import create_engine

project = "my-project-name"
schema = "bigquery-public-data"
dataset = "pypi"
table = "file_downloads"

full_path = f"{schema}.{dataset}.{table}"
engine = create_engine(f"bigquery://{project}")

query = f"SELECT * FROM {full_path} LIMIT 100"
df = pl.read_database(query=query, connection=engine)
df # prints output

This Method Fails

I followed the instructions linked above. Google appears to use an older version of SQLAlchemy, where it was not necessary to pass a MetaData object when instantiating a Table, so I've added that in.

from sqlalchemy import MetaData, select, Table

meta_data = MetaData()

sample_table = Table(
    "file_downloads",
    meta_data,
    schema=f"{schema}.{dataset}",
)

query = select(sample_table).limit(100)
df = pl.read_database(query, connection=engine)

Returns error:

DatabaseError: (google.cloud.bigquery.dbapi.exceptions.DatabaseError) 400 POST https://bigquery.googleapis.com/bigquery/v2/projects/my-project-name/queries?prettyPrint=false: Syntax error: SELECT list must not be empty at [2:1] [SQL: SELECT FROM bigquery-public-data.pypi.file_downloads]

I believe SQLAlchemy has failed to synchronise the Table object with the database: the generated SQL has an empty SELECT list, so the Table seems to know about no columns. Normally, when I use SQLAlchemy to query a database whose schema I'm not also defining with it, I'd perform:

conn = engine.connect()
meta_data.reflect(conn)
sample_table = meta_data.tables['file_downloads']

That is not realistic here: it ran for more than an hour before I killed it. I presume it was reflecting the entire public datasets catalogue, which I don't want.
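For reference, MetaData.reflect accepts an only argument that restricts reflection to named tables. The sketch below is a hedged illustration, not a tested BigQuery recipe: it uses an in-memory SQLite database as a stand-in so that it runs without credentials, and other_table is a hypothetical second table added only to show that it is skipped. Against BigQuery, a schema argument would presumably also be needed, which is an assumption not verified here.

```python
from sqlalchemy import Column, Integer, MetaData, Table, create_engine

# In-memory SQLite stands in for BigQuery (assumption, for runnability only).
engine = create_engine("sqlite://")

# Create two tables out-of-band, as if they already existed in the database.
setup = MetaData()
Table("file_downloads", setup, Column("id", Integer, primary_key=True))
Table("other_table", setup, Column("id", Integer, primary_key=True))  # hypothetical
setup.create_all(engine)

# Reflect only the table we care about instead of the whole catalogue.
meta_data = MetaData()
with engine.connect() as conn:
    meta_data.reflect(conn, only=["file_downloads"])

sample_table = meta_data.tables["file_downloads"]
reflected_names = sorted(meta_data.tables)  # names of the tables actually reflected
```

Here only file_downloads ends up in meta_data.tables; other_table is never inspected.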

My Versions

  • Python 3.13.1
  • polars == 1.24.0
  • sqlalchemy == 2.0.38
  • sqlalchemy-bigquery == 1.12.1

What's missing? I'm going around in circles.

Comments

  • Have you tried sample_table = Table('file_downloads', … , autoload_with=engine) ? Commented Mar 7 at 14:01
  • Thanks, that solved it. Looks like the autoload param has been deprecated in favour of this. Commented Mar 8 at 2:51

1 Answer

As @Gord Thompson mentioned in the comments, the error is resolved by passing autoload_with=engine when creating the Table, so that its columns are reflected from the database:

sample_table = Table(
    'file_downloads',
    meta_data,
    schema=f"{schema}.{dataset}",
    autoload_with=engine,
)
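To illustrate why this helps, here is a minimal sketch that contrasts a Table defined without columns against one reflected with autoload_with. It assumes an in-memory SQLite database as a stand-in for BigQuery so it can run without credentials, and the id and project columns are hypothetical, not the real file_downloads schema.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, select

# In-memory SQLite stands in for BigQuery (assumption, for runnability only).
engine = create_engine("sqlite://")

# Create a small table out-of-band, as if it already existed in the database.
# The column names here are hypothetical.
setup = MetaData()
Table(
    "file_downloads",
    setup,
    Column("id", Integer, primary_key=True),
    Column("project", String),
)
setup.create_all(engine)

# Without columns or reflection, the Table has nothing to put in the SELECT list.
empty_table = Table("file_downloads", MetaData())
empty_sql = str(select(empty_table))

# autoload_with reflects this one table's columns from the database.
reflected_table = Table("file_downloads", MetaData(), autoload_with=engine)
reflected_sql = str(select(reflected_table).limit(100))
column_names = [c.name for c in reflected_table.columns]
```

Printing empty_sql and reflected_sql side by side should show the difference: the first has no columns between SELECT and FROM, which is the shape of the statement in the error above, while the second lists the reflected columns.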

Posting the answer as community wiki for the benefit of the community that might encounter this use case in the future.

Feel free to edit this answer for additional information.
