I have a query in Google BigQuery that is working just fine in the BigQuery Web-UI (which I use for testing) but fails to return any results when run in Python / Pandas.
BigQuery Web-UI:
SELECT COUNT(foo) AS Foo FROM `dataset.table` WHERE
REGEXP_CONTAINS(LOWER(title),r'(\b(bar)\b)')
returns:
Row Foo
1 4087
in Python / Pandas:
import pandas as pd
project_id = 'some_id'
query = """SELECT COUNT(foo) AS Foo FROM `dataset.table` WHERE
REGEXP_CONTAINS(LOWER(title),r'(\b(bar)\b)')"""
df=pd.read_gbq(query, project_id, dialect='standard')
returns:
print(df)
Foo
0 0
This is by far not the most complex regex query I am using, and so far using the Web-UI to test queries worked like a charm.
Any idea what I am doing wrong here?
---EDIT---
I was able to recreate the issue using the public Shakespeare dataset, which I copied into my existing project as 'sample:shakespeare_copy' to avoid possible authentication issues.
GBQ Web-UI:
SELECT COUNT(word_count) AS Foo FROM `bigquery-public-
data.samples.shakespeare` WHERE
REGEXP_CONTAINS(LOWER(corpus),r'(\b(sonnets)\b)')
returns:
Row Foo
1 3677
Python / Pandas:
import pandas as pd
project_id = 'some_id'
query="""SELECT COUNT(word_count) AS Foo FROM sample.shakespeare_copy
WHERE REGEXP_CONTAINS(LOWER(corpus),r'(\b(sonnets)\b)')"""
df=pd.read_gbq(query,project_id,dialect='standard')
returns:
print(df)
Foo
0 0
When running a query without REGEXP_CONTAINS() everything works as intended.
query="""SELECT COUNT(word_count) AS Foo FROM sample.shakespeare_copy"""
df=pd.read_gbq(query,projectid,dialect='standard')
print(df)
Foo
0 164656
Many Thanks in advance!
Stephan