
I am attempting to upload a file to a table using Google's BigQuery Python library (google-cloud-bigquery==1.3.0).

Following the documentation:

from google.cloud import bigquery

client = bigquery.Client()
dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.CSV
job_config.skip_leading_rows = 1
job_config.autodetect = True

with open(filename, 'rb') as source_file:
    job = client.load_table_from_file(
        source_file,
        table_ref,
        location='US',
        job_config=job_config)

job.result()

This successfully creates the table and loads the data; in this case, the file is a single column of strings.

However, the resulting column is named string_field_0.

Is there a way I can customize these column names?

2 Answers


Because you have just one column containing only strings, BigQuery can't work out what the header/column names are, even though you set job_config.skip_leading_rows = 1 in your code; i.e. it can't tell the difference between a header row and a data row. If you had a second column containing, say, an integer, then BigQuery would be able to preserve the column names, because the header row (all strings) would then look different from the data rows.

As a workaround, either turn off autodetect and specify the schema manually (it's just one column, after all), or load the data as-is and then rename the column afterwards with some SQL.
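For the rename-with-SQL route, one sketch: select the auto-named column with an alias and write the result back over the table. The dataset, table, and column names below are placeholders, not from the original question.

```python
# Sketch of the rename-with-SQL workaround; dataset/table/column names
# here are placeholders, not from the original question.
def rename_column_query(dataset, table, old_name, new_name):
    # Selecting the auto-named column under an alias and writing the
    # result back over the table effectively renames the column.
    return (
        f"SELECT {old_name} AS {new_name} "
        f"FROM `{dataset}.{table}`"
    )

query = rename_column_query("my_dataset", "my_table", "string_field_0", "name")
print(query)
# The query would then be run with the original table as the destination
# and WRITE_TRUNCATE disposition so the result replaces it, e.g.:
#
#   job_config = bigquery.QueryJobConfig()
#   job_config.destination = table_ref
#   job_config.write_disposition = 'WRITE_TRUNCATE'
#   client.query(query, job_config=job_config).result()
```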




You should be able to set a schema on the load job (and drop job_config.autodetect = True, since you're supplying the column names explicitly). e.g.

from google.cloud.bigquery import SchemaField

job_config.schema = [SchemaField('columnName', 'STRING', mode='NULLABLE')]

