I'm reading data from Google BigQuery into a polars dataframe. Using a string query succeeds, but I'd prefer to use a SQLAlchemy statement. Building one with python-bigquery-sqlalchemy, provided by Google, and following their SDK instructions fails.
This Method Works
import polars as pl
from sqlalchemy import create_engine
project = "my-project-name"
schema = "bigquery-public-data"
dataset = "pypi"
table = "file_downloads"
full_path = f"{schema}.{dataset}.{table}"
engine = create_engine(f"bigquery://{project}")
query = f"SELECT * FROM {full_path} LIMIT 100"
df = pl.read_database(query = query, connection=engine)
df # prints output
This Method Fails
I followed the instructions linked above. Google appear to use an older version of SQLAlchemy, where it is not necessary to pass the MetaData object when instantiating a Table, so I've added that in.
from sqlalchemy import MetaData, select, Table
meta_data = MetaData()
sample_table = Table(
    'file_downloads',
    meta_data,
    schema=f"{schema}.{dataset}",
)
query = select(sample_table).limit(100)
df = pl.read_database(query, connection = engine)
Returns error:
DatabaseError: (google.cloud.bigquery.dbapi.exceptions.DatabaseError) 400 POST https://bigquery.googleapis.com/bigquery/v2/projects/my-project-name/queries?prettyPrint=false: Syntax error: SELECT list must not be empty at [2:1] [SQL: SELECT FROM
bigquery-public-data.pypi.file_downloads]
I believe SQLAlchemy has failed to synchronise the table object with the database. Normally, when I use SQLAlchemy to query a database whose schema I'm not also defining with it, I'd perform:
conn = engine.connect()
meta_data.reflect(conn)
sample_table = meta_data.tables['file_downloads']
That is not realistic here: the reflection ran for more than an hour before I killed it. I presume it was synchronising the entire public datasets catalogue, which I don't want.
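The empty SELECT list can be reproduced without connecting to BigQuery, which suggests the cause is the column-less Table rather than the driver. The following is a minimal sketch, assuming the default SQLAlchemy compiler, a shortened schema name (`pypi`), and two column names (`project`, `country_code`) that I believe exist in the public table but have not verified here. Declaring columns by hand is one way to avoid reflection entirely:

```python
from sqlalchemy import Column, MetaData, String, Table, select

# A Table declared without any columns, as in the failing snippet above.
# The schema is shortened to "pypi" purely for illustration.
bare_table = Table("file_downloads", MetaData(), schema="pypi")
bare_sql = str(select(bare_table).limit(100))
print(bare_sql)  # nothing is rendered between SELECT and FROM

# The same table with hand-declared columns.
# Column names and types are assumptions, not taken from the real schema.
typed_table = Table(
    "file_downloads",
    MetaData(),
    Column("project", String),
    Column("country_code", String),
    schema="pypi",
)
typed_sql = str(select(typed_table).limit(100))
print(typed_sql)  # the declared columns now appear in the SELECT list
```

Only the columns declared on the Table end up in the generated statement, so this would select a subset of the table rather than the equivalent of `SELECT *`.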
My Versions
- Python 3.13.1
- polars == 1.24.0
- sqlalchemy == 2.0.38
- sqlalchemy-bigquery == 1.12.1
What's missing? I'm going around in circles.
sample_table = Table('file_downloads', … , autoload_with=engine)? The autoload param has been deprecated in favour of this.
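For reference, single-table reflection with `autoload_with` looks roughly like the sketch below. It runs against an in-memory SQLite database as a stand-in, so the table contents are made up and it says nothing about how long the same call takes on BigQuery. There, the `engine` and `schema` from the question would presumably be passed instead.

```python
from sqlalchemy import MetaData, Table, create_engine, select, text

# Stand-in database: in-memory SQLite with a made-up two-column table.
demo_engine = create_engine("sqlite://")
with demo_engine.begin() as conn:
    conn.execute(text("CREATE TABLE file_downloads (project TEXT, country_code TEXT)"))
    conn.execute(text("INSERT INTO file_downloads VALUES ('polars', 'GB')"))

# Reflect only this one table; its columns are loaded from the database.
reflected_table = Table("file_downloads", MetaData(), autoload_with=demo_engine)
column_names = [column.name for column in reflected_table.columns]

# With columns present, select() renders a non-empty SELECT list.
demo_query = select(reflected_table).limit(100)
with demo_engine.connect() as conn:
    rows = [tuple(row) for row in conn.execute(demo_query)]

print(column_names)
print(rows)
```

Because only the named table is inspected, this avoids the catalogue-wide scan that `meta_data.reflect(conn)` triggers.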