I have two tables, both holding GCP billing data, in two different regions. I want to insert one table into the other. Both tables are partitioned by day, and the larger one is still being written to by the GCP billing export, which is why I want to insert the data into the larger table.

I am attempting the following:

  1. Export the smaller table to Google Cloud Storage (GCS) so it can be imported into the other region.
  2. Import the table from GCS into BigQuery.
  3. Use BigQuery SQL to run INSERT INTO dataset.big_billing_table SELECT * FROM dataset.small_billing_table

However, the insert fails with a number of errors and won't simply go through (the schema contains repeated fields, among other things). An example of the dataset can be found here https://bigquery.cloud.google.com/table/data-analytics-pocs:public.gcp_billing_export_v1_EXAMPL_E0XD3A_DB33F1

Thanks :)


## Update ##

The issue turned out to be exporting and importing the data in Avro format and using schema auto-detect when importing the table back in (timestamps were being confused with integer types).

Solution

Export the small table to GCS in JSON format, use GCS to do the regional transfer of the files, then import the JSON files into a BigQuery table and DON'T use schema auto-detect (i.e. specify the schema manually). After that, INSERT INTO works without problems.
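One way to spot this kind of type confusion before running INSERT INTO is to compare the schema of the original table with the schema of the re-imported one. The sketch below assumes each schema is available as a list of field objects in the shape that `bq show --schema --format=prettyjson` prints (name, type, and a nested fields list for RECORD columns). The function name and the two trimmed sample schemas are made up for illustration.

```python
def schema_type_mismatches(expected, actual, prefix=""):
    """Return (path, expected_type, actual_type) for fields whose types differ.

    Both arguments are lists of field dicts with "name", "type" and, for
    RECORD fields, a nested "fields" list. A field that is missing from
    `actual` is reported with actual_type set to None.
    """
    actual_by_name = {field["name"]: field for field in actual}
    mismatches = []
    for field in expected:
        path = prefix + field["name"]
        other = actual_by_name.get(field["name"])
        if other is None:
            mismatches.append((path, field["type"], None))
        elif field["type"] != other["type"]:
            mismatches.append((path, field["type"], other["type"]))
        elif field.get("fields"):
            # Same type on both sides: descend into the nested record.
            mismatches.extend(
                schema_type_mismatches(
                    field["fields"], other.get("fields", []), path + "."
                )
            )
    return mismatches


# Hypothetical, heavily trimmed schemas: the original table and a
# re-import where auto-detect turned a timestamp into an integer.
SERVICE_FIELDS = [
    {"name": "id", "type": "STRING"},
    {"name": "description", "type": "STRING"},
]
original = [
    {"name": "usage_start_time", "type": "TIMESTAMP"},
    {"name": "service", "type": "RECORD", "fields": SERVICE_FIELDS},
    {"name": "cost", "type": "FLOAT"},
]
reimported = [
    {"name": "usage_start_time", "type": "INTEGER"},
    {"name": "service", "type": "RECORD", "fields": SERVICE_FIELDS},
    {"name": "cost", "type": "FLOAT"},
]

for path, want, got in schema_type_mismatches(original, reimported):
    print(f"{path}: expected {want}, found {got}")
```

An empty result means the top-level and nested types line up; it does not check field modes (NULLABLE, REPEATED), which would need a similar comparison.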

Comments

  • I believe you want to FULL JOIN the tables rather than insert one into the other, because both should have the same columns (that is one of the possible reasons you are getting the repeated fields in the schema error). Alternatively, you can append the smaller table to the bigger table with a joining condition (in case there are repeated rows). Then you would have both sets of data in one table with the same schema. Is that what you want? If it is, I would be glad to help you with the script. Commented Apr 15, 2020 at 8:23
  • Thanks Alex, I just added an edit. I would like the data to go into the larger table, as that's the table that is receiving the billing export data from GCP Billing. Commented Apr 15, 2020 at 8:52

1 Answer

I was able to reproduce your case with the example dataset you provided. To test each approach, I used dummy tables generated from the queries below:

Table 1: billing_bigquery

SELECT * FROM `data-analytics-pocs.public.gcp_billing_export_v1_EXAMPL_E0XD3A_DB33F1`
    WHERE service.description = 'BigQuery' LIMIT 1000

Table 2: billing_pubsub

SELECT * FROM `data-analytics-pocs.public.gcp_billing_export_v1_EXAMPL_E0XD3A_DB33F1`
    WHERE service.description = 'Cloud Pub/Sub' LIMIT 1000

I will propose two methods for performing this task. However, I must point out that the target and source tables must have the same column names, at least for the columns you are going to insert.

First, I used the INSERT INTO method. However, I would like to stress that, according to the documentation, if your table is partitioned (the documentation calls this out for ingestion-time partitioned tables) you must list the column names that will be used to insert the new rows. Therefore, using the dummy data shown above, it looks as follows:

INSERT INTO `billing_bigquery` (billing_account_id, service, sku, usage_start_time, usage_end_time, project, labels, system_labels, location, export_time, cost, currency, currency_conversion_rate, usage, credits)  # not inserted: invoice, cost_type
SELECT billing_account_id, service, sku, usage_start_time, usage_end_time, project, labels, system_labels, location, export_time, cost, currency, currency_conversion_rate, usage, credits
FROM `billing_pubsub`

Notice that for nested fields I only write the top-level field name, for instance service and not service.description, because the whole record is inserted along with its subfields. Furthermore, I did not list every column of the target table, but every column I did list for the target table must also appear in the SELECT from the source table.
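Since the same column list has to appear on both sides of the statement, it can help to generate the query from a single list so the two cannot drift apart. This is only a sketch: build_insert_select is a hypothetical helper, it interpolates the names as-is (so they should come from a trusted source), and it only produces the SQL text without running it.

```python
def build_insert_select(target, source, columns):
    """Build an INSERT ... SELECT that names the same top-level columns twice.

    `target` and `source` are table ids, `columns` is a list of top-level
    column names (nested records are listed by their top-level name only).
    """
    if not columns:
        raise ValueError("at least one column is required")
    column_list = ", ".join(columns)
    return (
        f"INSERT INTO `{target}` ({column_list})\n"
        f"SELECT {column_list}\n"
        f"FROM `{source}`"
    )


# Column list taken from the statement above; invoice and cost_type are left out.
columns = [
    "billing_account_id", "service", "sku", "usage_start_time",
    "usage_end_time", "project", "labels", "system_labels", "location",
    "export_time", "cost", "currency", "currency_conversion_rate",
    "usage", "credits",
]

print(build_insert_select("billing_bigquery", "billing_pubsub", columns))
```

The printed statement can be pasted into the BigQuery Console or passed to whichever client you use to run queries.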

The second method is to use the Query settings to append small_billing_table to big_billing_table. In the BigQuery Console, click More >> Query settings. In the settings window, go to Destination table, check Set a destination table for query results, and fill in Project name, Dataset name and Table name with the destination table's details. Then, under Destination table write preference, check Append to table, which according to the documentation:

Append to table — Appends the query results to an existing table

Then you run the following query:

SELECT * FROM <project.dataset.source_table>

After it runs, the source table's data should be appended to the target table.
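The same append can also be done from code instead of the Console. The sketch below assumes the google-cloud-bigquery Python package is installed and default credentials are configured; the two function names and the placeholder ids in the usage comment are made up for illustration. It sets a destination table and the WRITE_APPEND write disposition on the query job, which corresponds to the Append to table preference.

```python
def append_job_settings(project, dataset, table):
    """Return query-job settings that mirror the 'Append to table' preference."""
    return {
        "destination": f"{project}.{dataset}.{table}",
        "write_disposition": "WRITE_APPEND",
    }


def append_query_results(sql, project, dataset, table):
    """Run `sql` and append its result rows to the given destination table."""
    # Imported here so append_job_settings works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        **append_job_settings(project, dataset, table)
    )
    # result() blocks until the query job has finished.
    client.query(sql, job_config=job_config).result()


# Example call (placeholder ids, not run here):
# append_query_results(
#     "SELECT * FROM `my-project.my_dataset.small_billing_table`",
#     "my-project", "my_dataset", "big_billing_table",
# )
```

As with the Console route, the query result's schema has to be compatible with the destination table for the append to succeed.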
