1

I'm trying to query a google bigquery table using the regex from this blog post. Here it is, slightly modified:

pd\.([^”,\.\(\,\`) \’:\[\]\/\\={}]*)

regex101 example of its usage

It does not, however, work in my google bigquery python client SQL query:

query_results = client.run_sync_query(
"""
SELECT
  REGEXP_EXTRACT(SPLIT(content, '\n'),
                 r'pd\.([^”,\.\(\,\`) \’:\[\]\/\\={}]*)')
FROM
  [fh-bigquery:github_extracts.contents_py]
LIMIT 10
""")

query_results.run()

data = query_results.fetch_data()
data

BadRequest: BadRequest: 400 Failed to parse regular expression "pd.([^”,.(\,`) \’:[]/\={}]*)": invalid escape sequence: \’

0

1 Answer 1

2

The problem here is that BigQuery uses re2 library for its regex operations.

If you try the same regex but using the golang flavor you will see the exact same error (golang also uses re2).

So maybe if you just remove the escaping of the ' character you'll already have it working for you (as I tested here it seemed to work properly).

Another issue that you might find is that the result of the SPLIT operation is an ARRAY. That means that BigQuery won't process your query saying that the signature of REGEXP_EXTRACT does not allow ARRAY<STRING> as input. You could use REGEXP_REPLACE instead:

"""
SELECT
  REGEXP_EXTRACT(REGEXP_REPLACE(content, r'.*(\\n)', ''),
                 r'pd\.([^”,\.\(\,\`) ’:\[\]\/\\={}]*)')
FROM
  [fh-bigquery:github_extracts.contents_py]
LIMIT 10
"""

The character "\n" is replaced by "" in this operation and the result is a STRING.

Sign up to request clarification or add additional context in comments.

1 Comment

Cheers @Will, I appreciate the solution and explanation. The REGEXP_EXTRACT appears to be working on the array, but I will experiment your method should I find that error.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.