14

I currently have an app written in Apps Script that imports some CSV files from Cloud Storage into BigQuery. While this is pretty simple, I am forced to specify the schema for the destination table.

What I am looking for is a way to read the CSV file and create the schema based on the column names in the first row. It is okay if all the column types end up as strings. I feel like this is a pretty common scenario. Does anyone have any guidance on this?

Much thanks, Nick

2
  • It's been more than three years since this question was asked. Is there any direct BigQuery API method available now to set the schema from an external source or load the CSV without a schema? Commented Aug 18, 2017 at 1:05
  • OMG, when I think that BigQuery allows Auto-Detect, and can't even infer or ask if column names should be the first line... As of August 2023, what a basic feature missing... Commented Aug 10, 2023 at 10:03

4 Answers

5
+50

One option (not a particularly pleasant one, but an option) would be to make a raw HTTP request from Apps Script to GCS to read the first row of the data, split it on commas, and generate a schema from that. GCS doesn't have Apps Script integration, so you need to build the requests by hand. Apps Script does have some utilities to let you do this (as well as OAuth), but my guess is that it is going to be a decent amount of work to get right.
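As a rough sketch of the "split it on commas and generate a schema" step: the function below takes a header line that has already been fetched and returns an all-STRING schema in the `{fields: [...]}` shape BigQuery expects. The function name `schemaFromHeader`, the `col_N` fallback names, and the name clean-up rules are assumptions made for illustration, and it assumes a simple header with no commas inside quoted names.

```javascript
// Turn a CSV header line into a BigQuery schema where every column is a STRING.
// Assumption: the header is plain comma-separated text with no quoted commas.
function schemaFromHeader(headerLine) {
  var names = headerLine.replace(/\r?\n$/, '').split(',');
  var fields = names.map(function (raw, i) {
    // Strip surrounding whitespace and one pair of optional double quotes.
    var name = raw.trim().replace(/^"(.*)"$/, '$1');
    // Replace anything that is not a letter, digit, or underscore.
    name = name.replace(/[^A-Za-z0-9_]/g, '_');
    // Avoid empty names and names that start with a digit (hypothetical col_N fallback).
    if (name === '' || /^[0-9]/.test(name)) {
      name = 'col_' + i + (name === '' ? '' : '_' + name);
    }
    return { name: name, type: 'STRING', mode: 'NULLABLE' };
  });
  return { fields: fields };
}
```

In Apps Script, the header line itself could come from `UrlFetchApp.fetch` against the object's media URL with an `Authorization: Bearer` header built from `ScriptApp.getOAuthToken()`; that part is left out here because it depends on your bucket and scopes. Duplicate column names are not handled in this sketch.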

There are also a couple of things you could try from the BigQuery side. You could import the data into a temporary table as a single field (set the field delimiter to a character that doesn't appear in the data, like '\r'). You can then read the header row via tabledata.list() (i.e. the first row of the temporary table). After that, run a query that splits the single field into columns with a regular expression, setting allow_large_results and a destination table.

One other option would be to use a dummy schema with more columns than you'll ever have, then use the allow_jagged_rows option to allow rows that are missing data at the end of the row. You can then read the first row (similar to the previous option) with tabledata.list() and figure out how many columns are actually present. Then you could generate a query that rewrites the table with the correct column names. The advantage of this approach is that you don't need regular expressions or parsing; it lets BigQuery do all of the CSV parsing.
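A minimal sketch of the dummy-schema approach, under some assumptions: the placeholder columns are named `c0`, `c1`, ... (invented here), the load configuration follows the REST `configuration.load` shape where the option is spelled `allowJaggedRows`, and `headerValues` is the first row as you read it back from the table, with `null` for columns the file never had.

```javascript
// Dummy schema: `width` STRING columns named c0, c1, ... (hypothetical names).
function wideSchema(width) {
  var fields = [];
  for (var i = 0; i < width; i++) {
    fields.push({ name: 'c' + i, type: 'STRING', mode: 'NULLABLE' });
  }
  return { fields: fields };
}

// Load-job configuration that tolerates rows shorter than the schema.
function jaggedLoadConfig(sourceUri, destinationTable, width) {
  return {
    load: {
      sourceUris: [sourceUri],
      destinationTable: destinationTable, // { projectId, datasetId, tableId }
      sourceFormat: 'CSV',
      schema: wideSchema(width),
      allowJaggedRows: true // missing trailing columns are loaded as NULL
    }
  };
}

// Keep only the columns that had a header value and rename them.
function buildRenameQuery(table, headerValues) {
  var selects = [];
  headerValues.forEach(function (name, i) {
    if (name !== null && name !== '') selects.push('c' + i + ' AS ' + name);
  });
  return 'SELECT ' + selects.join(', ') + ' FROM `' + table + '`';
}
```

Note that the rewritten table would still contain the header row as data, so the generated query needs an extra filter for it, and the header values are assumed to be valid column names already.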

There is a downside to both of the latter two approaches, however: the BigQuery load mechanism does not guarantee to preserve the ordering of your data. In practice, the first row should always be the first row in the table, but that isn't guaranteed to always be true.

Sorry there isn't a better solution. We've had a feature request on the table for a long time to auto-infer schemas; I'll take this as another vote for it.


2 Comments

And what if I want to load the whole text file as one big string into a single row with one big string column?
Is this answer still relevant in 2019?
2

For the record, schema inference is now available: https://cloud.google.com/bigquery/federated-data-sources#auto-detect
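For anyone driving this from Apps Script, a load configuration using schema detection could look roughly like this. It is a sketch, assuming the REST `configuration.load` shape with its `autodetect` and `skipLeadingRows` fields; the function name is made up for illustration.

```javascript
// Load-job configuration that asks BigQuery to detect the schema itself.
function autodetectLoadConfig(sourceUri, destinationTable) {
  return {
    load: {
      sourceUris: [sourceUri],
      destinationTable: destinationTable, // { projectId, datasetId, tableId }
      sourceFormat: 'CSV',
      autodetect: true,   // no explicit schema supplied
      skipLeadingRows: 1  // treat the first line as a header row
    }
  };
}
```

With the BigQuery advanced service enabled in Apps Script, this would typically be submitted as `BigQuery.Jobs.insert({ configuration: autodetectLoadConfig(uri, table) }, projectId)`, where `uri`, `table`, and `projectId` are your own values.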


2

I was facing the same issue when all my columns were of the String datatype. When I added one more column (any random column) with an integer datatype, it worked. I used the "Auto-detect Schema" option and, under Advanced Options, set "Header rows to skip" to 1.


1

Building off of William Vambenepe's answer, BigQuery can guess at the schema now. The documentation page moved to: https://cloud.google.com/bigquery/docs/schema-detect

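Note that your import can still fail, because detection only looks at a sample from the start of the data (the first 100 rows when this was written; the documentation has since described scanning up to 500). This can be problematic if you have a rare "NA" or "Other" in a column of seeming integers.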
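To make the failure mode concrete, here is a toy illustration. It is not BigQuery's actual detection logic, just a stand-in that guesses a column's type from the first `sampleSize` values and then lists the later values that would not fit that guess.

```javascript
// Toy inference (NOT BigQuery's algorithm): INTEGER if every sampled value
// looks like an integer, otherwise STRING.
function inferType(values, sampleSize) {
  var sample = values.slice(0, sampleSize);
  var allInts = sample.every(function (v) { return /^-?\d+$/.test(v); });
  return allInts ? 'INTEGER' : 'STRING';
}

// Values that would be rejected if the column were loaded with the guessed type.
function badValues(values, type) {
  if (type !== 'INTEGER') return [];
  return values.filter(function (v) { return !/^-?\d+$/.test(v); });
}
```

A column with a long run of integers followed by a single "NA" is guessed as INTEGER from the sample, and the "NA" only shows up as an error once the full file is loaded. Falling back to an explicit all-STRING schema sidesteps this.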

When this feature first came out, you could go back and change the offending field type in the web UI by hand, because the guesses would auto-populate the schema when you reloaded the failed import. It doesn't seem to do this anymore; hopefully it will return in a future update.
