From the official documentation, we can see that it first loads the table into a Spark DataFrame and then performs the query with .sql():
words = spark.read.format('bigquery') \
    .option('table', 'bigquery-public-data:samples.shakespeare') \
    .load()
words.createOrReplaceTempView('words')

# Perform word count.
word_count = spark.sql(
    'SELECT word, SUM(word_count) AS word_count FROM words GROUP BY word')
word_count.show()
word_count.printSchema()
Can I do something similar, loading the table from the result of a query instead? Here is my code that loads a BigQuery query result into a Pandas DataFrame:
import google.auth
from google.cloud import bigquery

sql_query = 'Some Queries'

# Default credentials with the scopes needed for BigQuery
# (the Drive scope covers Sheets-backed external tables).
credentials, project = google.auth.default(scopes=[
    'https://www.googleapis.com/auth/drive',
    'https://www.googleapis.com/auth/bigquery',
])
client = bigquery.Client(credentials=credentials, project=project)
df = client.query(sql_query).to_dataframe()
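Ideally, I would hand the SQL to the connector and get a Spark DataFrame back directly, something like the sketch below. I'm not sure whether the connector actually accepts a 'query' option like this, so treat it as hypothetical:

# Hypothetical: pass the SQL to the connector and load the result directly.
# I don't know if the spark-bigquery connector supports this.
result_df = spark.read.format('bigquery') \
    .option('query', sql_query) \
    .load()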
I know that we can convert a Pandas DataFrame to a Spark DataFrame, but I am looking for a cleaner and faster way.
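For reference, this is the conversion I had in mind; a minimal sketch, assuming spark is an active SparkSession and df is the Pandas result from the snippet above. It works, but every row is pulled through the driver as Pandas first:

# Convert the Pandas DataFrame to a Spark DataFrame on the driver.
spark_df = spark.createDataFrame(df)
spark_df.createOrReplaceTempView('query_result')
spark.sql('SELECT COUNT(*) FROM query_result').show()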