2

From the example given by Google, I have managed to load CSV files into BigQuery(BQ) table following the guide(link and code below)
Now I want to add several files into BQ, and want to add a new column filename which contains the filename.

Is there a way to add column with default data?

https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv

// Import the Google Cloud client libraries
const {BigQuery} = require('@google-cloud/bigquery');
const {Storage} = require('@google-cloud/storage');

// Instantiate clients
const bigquery = new BigQuery();
const storage = new Storage();

/**
 * This sample loads the CSV file at
 * https://storage.googleapis.com/cloud-samples-data/bigquery/us-states/us-states.csv
 *
 * TODO(developer): Replace the following lines with the path to your file.
 */
const bucketName = 'cloud-samples-data';
const filename = 'bigquery/us-states/us-states.csv';

async function loadCSVFromGCS() {
  // Imports a GCS file into a table with manually defined schema.

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const datasetId = 'my_dataset';
  // const tableId = 'my_table';

  // Configure the load job. For full list of options, see:
  // https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad
  const metadata = {
    sourceFormat: 'CSV',
    skipLeadingRows: 1,
    schema: {
      fields: [
        {name: 'name', type: 'STRING'},
        {name: 'post_abbr', type: 'STRING'},
//      {name: 'filemame', type: 'STRING', value=filename} // I WANT TO ADD COLUMN WITH FILE NAME HERE
      ],
    },
    location: 'US',
  };

  // Load data from a Google Cloud Storage file into the table
  const [job] = await bigquery
    .dataset(datasetId)
    .table(tableId)
    .load(storage.bucket(bucketName).file(filename), metadata);

  // load() waits for the job to finish
  console.log(`Job ${job.id} completed.`);

  // Check the job's status for errors
  const errors = job.status.errors;
  if (errors && errors.length > 0) {
    throw errors;
  }
}

3 Answers 3

2

According to BigQuery’s documentation [1], there is no option to set a default value for columns. The closest option without any post-processing, would be to use a NULL value for nullable columns.

However, a possible postprocessing workaround for this would be to create a View of the raw table and add a script that maps the NULL value to any default value. Here’s some information about scripting in BigQuery [2].

In case it is possible to add a pre-processing code, adding the value to the source file would be easy to achieve using any scripting language.

I think that static and function-based values will be a good feature for BigQuery’s future scope.

[1] - https://cloud.google.com/bigquery/docs

[2] - https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting

Sign up to request clarification or add additional context in comments.

Comments

1

I would say you have a few choices.

  1. Add a column to the CSV before uploading, e.g. with awk or preprocessing in JS.
  2. Add the individual CSV files to separate tables. You can easily query across many tables as one in BigQuery. This way you can easily see what data comes from which file, and you can access table meta data for the file-name
  3. Post process the data, by adding the column after the data is loaded with normal sql/api calls.
  4. See also this possible duplicate How to add new column with metadata value to csv when loading it to bigquery

1 Comment

Thank you. I'm gonna go with option 2.
0

You have multiple options:

  1. you could rebuild your CSV with the filename as a column data
  2. you can load data into a temporary table, then moving to final table with a second step specifying the missing file name column
  3. convert the example to be an external table where _FILE_NAME is a pseduocolumn, and later you can query and move to a final table. See more about this here.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.