0

I have a query in Google BigQuery that is working just fine in the BigQuery Web-UI (which I use for testing) but fails to return any results when run in Python / Pandas.

BigQuery Web-UI:

SELECT COUNT(foo) AS Foo FROM `dataset.table` WHERE 
REGEXP_CONTAINS(LOWER(title),r'(\b(bar)\b)')

returns:

Row Foo  
1   4087

in Python / Pandas:

import pandas as pd
project_id = 'some_id'
query = """SELECT COUNT(foo) AS Foo FROM `dataset.table` WHERE 
REGEXP_CONTAINS(LOWER(title),r'(\b(bar)\b)')"""

df=pd.read_gbq(query, project_id, dialect='standard')

returns:

print(df)
    Foo
0   0

This is by far not the most complex regex query I am using, and so far using the Web-UI to test queries worked like a charm.

Any idea what I am doing wrong here?

---EDIT---

I was able to recreate the issue using the public Shakespeare dataset, which I copied into my existing project as 'sample:shakespeare_copy' to avoid possible authentication issues.

GBQ Web-UI:

SELECT COUNT(word_count) AS Foo FROM `bigquery-public- 
data.samples.shakespeare` WHERE 
REGEXP_CONTAINS(LOWER(corpus),r'(\b(sonnets)\b)')

returns:

Row Foo  
1   3677    

Python / Pandas:

import pandas as pd
project_id = 'some_id'
query="""SELECT COUNT(word_count) AS Foo FROM sample.shakespeare_copy 
WHERE REGEXP_CONTAINS(LOWER(corpus),r'(\b(sonnets)\b)')"""
df=pd.read_gbq(query,project_id,dialect='standard')

returns:

print(df)
     Foo
0    0

When running a query without REGEXP_CONTAINS() everything works as intended.

query="""SELECT COUNT(word_count) AS Foo FROM sample.shakespeare_copy"""
df=pd.read_gbq(query,projectid,dialect='standard')

print(df)
    Foo
0  164656

Many Thanks in advance!
Stephan

4
  • 2
    My first guess is that you are reading from the wrong project. Commented Jun 29, 2018 at 10:39
  • Thanks Gordon, I checked that and also ran some other queries, all them working as intended. I have a strong suspicion that the issue has to do with REGEX_CONTAINS(). I did follow the documentation, though, and all other regex queries work as intended, so I remain puzzled. Commented Jun 29, 2018 at 11:29
  • 1
    . . My second guess is the back quotes in the regular expression. Try using a simpler regular expression. Commented Jun 29, 2018 at 12:07
  • Thanks Gordon, the back quotes were indeed the issue: stackoverflow.com/questions/41070708/… Commented Jun 29, 2018 at 12:34

1 Answer 1

2

Solution

As per this entry it is necessary to use double backslashes when writing SQLs in Python. Lesson learned. Using r('\\b(bar)\\b') gets the desired result.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.