3

I have a set of IDs (~200k) and I need to get all the rows in a BigQuery Table with those IDs. I tried to construct a list in python and pass it as a parameter to the SQL query using @ but I get TypeError: 'ArrayQueryParameter' object is not iterable error. Here is the code I tried (very similar to https://cloud.google.com/bigquery/querying-data#running_parameterized_queries):

id_list = ['id1', 'id2'] 
query = """
    SELECT id
    FROM `my-db`
    WHERE id in UNNEST(@ids)
"""
query_job = client.run_async_query(
    str(uuid.uuid4()),
    query,
    query_parameters=(
        bigquery.ArrayQueryParameter('ids', 'ARRAY<STRING>', id_list)
    )
)

4 Answers 4

6

Probably the issue here is that you are not passing a tuple to the function.

Try adding a comma before closing the parenthesis, like so:

id_list = ['id1', 'id2'] 
query = """
    SELECT id
    FROM `my-db`
    WHERE id in UNNEST(@ids)
"""
query_job = client.run_async_query(
    str(uuid.uuid4()),
    query,
    query_parameters=(
        bigquery.ArrayQueryParameter('ids', 'STRING', id_list),
    )
)

In Python if you do:

t = (1)

and then run:

type(t)

You will find the result to be int. But if you do:

t = (1,)

Then it results in a tuple.

Sign up to request clarification or add additional context in comments.

1 Comment

Worked like a charm :) I cannot believe that I spent two days on this! Thank you Will.
2

You need to use 'STRING' rather than 'ARRAY<STRING>' for the array element type, e.g.:

query_parameters=(
    bigquery.ArrayQueryParameter('ids', 'STRING', id_list)

The example from the querying data topic is:

def query_array_params(gender, states):
    client = bigquery.Client()
    query = """
        SELECT name, sum(number) as count
        FROM `bigquery-public-data.usa_names.usa_1910_2013`
        WHERE gender = @gender
        AND state IN UNNEST(@states)
        GROUP BY name
        ORDER BY count DESC
        LIMIT 10;
        """
    query_job = client.run_async_query(
        str(uuid.uuid4()),
        query,
        query_parameters=(
            bigquery.ScalarQueryParameter('gender', 'STRING', gender),
            bigquery.ArrayQueryParameter('states', 'STRING', states)))
    query_job.use_legacy_sql = False

    # Start the query and wait for the job to complete.
    query_job.begin()
    wait_for_job(query_job)
    print_results(query_job.results())

4 Comments

You are also setting query_job.use_legacy_sql = False, right?
Yes, I set it to False.
Ah, I think it might be because query_parameters is expected to be a tuple. What if you put an extra set of parentheses around bigquery.ArrayQueryParameter(...)?
I thought of that too. The single parenthesis around bigquery.ArrayQueryParameter(...) make it a tuple but it seems that tuple elements need to be iterable which are not, in this case, they are instances of ArrayQueryParameter. I tried that with double parenthesis too, no luck.
0

Above answers are a better solution but you may find a use for this too whe quickly drafting something in notebooks:

turn a list into a string of date values, comma-separated and in quotes. Then pass the string into the query like so:

id_list = ['id1', 'id2'] 
# format into a query valid string
id_string = '"'+'","'.join(id_list)+'"'

client = bigquery.Client()
query = f"""
    SELECT id
    FROM `my-db`
    WHERE id in {id_string}
"""
query_job=client.query(query) 
results = query_job.result()

Comments

0

If you want to use the simple query like client.query, not client.run_async_query as shown in the answers above. You can to pass an additional parameter QueryJobConfig. Simply add your arrays to query_parameters using bigquery.ArrayQueryParameter.

The following code worked for me:

    query = f"""
                SELECT distinct pipeline_commit_id, pipeline_id, name
                FROM `{self.project_id}.{self.dataset_id}.pipelines_{self.table_suffix}`,
                UNNEST(labels) AS label
                where label.value IN UNNEST(@labels)
           """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ArrayQueryParameter('labels', 'STRING', labels)
        ]
    )
    query_job = self.client.query(query, job_config=job_config)

Based on those examples: https://cloud.google.com/bigquery/docs/parameterized-queries

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.