Error while trying to append data to a BigQuery table using pandas data frame

Question

I have a pandas data frame with 6 columns. I tried appending it to an existing BigQuery table that has the same schema, using this code:

import os
from google.cloud import bigquery

# Login credentials
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="secret.json"

# Initialize big query
client = bigquery.Client()

# Table information
project = "xxxxxxxx"
dataset = "Vahan"
table = "rto_data"
table_id = '{}.{}.{}'.format(project, dataset, table)

# Setup for upload
job_config = bigquery.LoadJobConfig()

# Define the table schema
schema = [bigquery.SchemaField(name='State', field_type='STRING', mode='NULLABLE'),
          bigquery.SchemaField(name='RTO', field_type='STRING', mode='NULLABLE'),
          bigquery.SchemaField(name='Registration_Number', field_type='STRING', mode='NULLABLE'),
          bigquery.SchemaField(name='Maker', field_type='STRING', mode='NULLABLE'),
          bigquery.SchemaField(name='Date', field_type='DATE', mode='NULLABLE'),
          bigquery.SchemaField(name='Registrations', field_type='INTEGER', mode='NULLABLE')]

job_config.create_disposition = "CREATE_IF_NEEDED"


# Make the API request
load_result = client.load_table_from_dataframe(dataframe=df,
                                               destination=table_id, 
                                               job_config=job_config)

# Wait for the load job to finish
load_result.result()

# Fetch the updated table metadata
table = client.get_table(table_id)

# Output
print("Loaded {} rows and {} columns to {}".format(table.num_rows, len(table.schema), table_id))

I'm getting this error: BadRequest: 400 Provided Schema does not match Table advanced-analytics-123456:Vahan.rto_data. Cannot add fields (field: __index_level_0__)

When I loaded the data into a new table instead, the load job added an unexpected extra column called __index_level_0__.

How do I fix this so that I can append the data to my existing table? Your help would be greatly appreciated!

Just a note to others: CREATE_IF_NEEDED is the default create disposition, so there is no need to set job_config.create_disposition. The question also never sets job_config.write_disposition = 'WRITE_APPEND'. Load jobs default to WRITE_APPEND, but setting it makes the intent to append to an existing table explicit.
– weezilla, Commented Dec 29, 2022 at 22:55

Sergey Geron · Accepted Answer · 2021-06-01 06:13:21Z

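The error suggests that the dataframe's index is being uploaded as an extra column named __index_level_0__. This usually happens when the index is no longer the default 0..n-1 range, for example after filtering rows. Try resetting the index and dropping the old one before loading: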

df.reset_index(drop=True, inplace=True)
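
To see why this helps, here is a small pandas-only sketch with made-up sample data (the values are hypothetical; the column names are borrowed from the question). Filtering rows leaves a non-default index behind, which is the kind of index that, as far as I know, gets written out as __index_level_0__ when the frame is serialized for upload. Resetting it restores a plain RangeIndex without adding any column.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
raw = pd.DataFrame({
    "State": ["KA", "MH", "DL"],
    "Registrations": [10, 0, 7],
})

# Filtering keeps the original row labels, so the index is no longer 0..n-1.
df = raw[raw["Registrations"] > 0]
had_default_index = isinstance(df.index, pd.RangeIndex)

# Replace the index with a fresh RangeIndex and discard the old labels.
df = df.reset_index(drop=True)
has_default_index = isinstance(df.index, pd.RangeIndex)

print(had_default_index, has_default_index)
print(list(df.columns))
```

Passing drop=True matters here: without it, reset_index would move the old labels into a new column named index, which would again not match the table schema.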
