
I am trying to write a Cloud Function that loads GCS files matching a wildcard into BigQuery. All the samples I can find, and the API reference, show the load method being given a specific filename via the bigquery.dataset().table().load() method.

What method can I use to trigger a file load using wildcards? I know I can specify the source URI in the job metadata, but I can't see a load method example to go along with it. Any help appreciated.

Here is my code.

exports.importReadyToIngestFiles = (event, context) => {
  const pubsubMessage = event.data;
  console.log(Buffer.from(pubsubMessage, 'base64').toString());
  const bucketName = Buffer.from(pubsubMessage, 'base64').toString();

  const {BigQuery} = require('@google-cloud/bigquery');
  const {Storage} = require('@google-cloud/storage');

//specify projectID and bigquery datasetID below
  const projectId = "test-3";
  const datasetId = "Data";
  const filename = "p_*";

  const gcsFile = `${bucketName}/p_*`;

  const tableId = "numeric";

  const bigquery = new BigQuery({
    projectId: projectId,
  });

  const storage = new Storage({
    projectId: projectId,
  });

  let job;

  // Loads data from a Google Cloud Storage file into the table
  bigquery
    .dataset(datasetId)
    .table(tableId)
    .load(storage.bucket(bucketName).file(filename))
    .then(results => {
      job = results[0];
      console.log(`Job ${job.id} started.`);

      // Wait for the job to finish
      return job;
    })
    .then(metadata => {
      // Check the job's status for errors
      const errors = metadata.status.errors;
      if (errors && errors.length > 0) {
        throw errors;
      }
    })
    .then(() => {
      console.log(`Job ${job.id} completed.`);
    })
    .catch(err => {
      console.error('ERROR:', err);
    });

  // callback();

};

UPDATE: Aah, I figured out by looking at the error in the BigQuery job history that my file is pipe-delimited, so there was a column mismatch. Now, where can I find how to specify the delimiter? Is it just me, or is the Node.js BigQuery API documentation a bit unclear?
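For reference, this is the option I am assuming is needed, based on the fieldDelimiter field in the JobConfigurationLoad reference (a minimal, untested sketch; the '|' value is just what my files appear to use):

  // assumed load options for a pipe-delimited file, intended to be passed
  // as the metadata (second) argument of the load call
  const loadMetadata = {
    sourceFormat: 'CSV',   // delimited files are still loaded as CSV, just with a different separator
    fieldDelimiter: '|',   // the separator my files use
  };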

  • Have you tried replacing the filename/string argument with the wildcard operator, e.g. "gs://my_bucket/*.csv"? Commented Jan 21, 2020 at 10:27
  • Yes. That did not pick up the files. I did not get an error either. Commented Jan 21, 2020 at 13:09
  • It should work. Are you sure the problem isn't something else? Have a look at the log for the actual BigQuery load job that is triggered and see what it says. Commented Jan 21, 2020 at 23:07
  • @AIKDO Did you solve it? Commented Feb 5, 2020 at 10:36

1 Answer


As you mentioned in the initial question, there is an option to supply the sourceUris[] field inside the load job's metadata parameters to leverage wildcard matching, and this is confirmed in the JobConfigurationLoad documentation:

sourceUris[]

The fully-qualified URIs that point to your data in Google Cloud. For Google Cloud Storage URIs: Each URI can contain one '*' wildcard character and it must come after the 'bucket' name.

Specify the metadata as the job configuration when creating the load job (e.g. with bigquery.createJob()), something like this:

const metadata = {
    sourceUris: ['gs://my_bucket/*'],
    ...
  };

bigquery
    .createJob({configuration: {load: metadata}})

You can also add the fieldDelimiter field to the metadata in order to adjust the column separator for the input data file.
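For instance, keeping the table-level .load() call from the question and passing the delimiter through the metadata argument could look roughly like this (a separate sketch that reuses the bucketName, datasetId and tableId variables from the question; the '|' delimiter is assumed from the update):

const metadata = {
  sourceFormat: 'CSV',    // pipe-delimited data is loaded as CSV with a custom delimiter
  fieldDelimiter: '|',    // assumed separator, per the question's update
};

bigquery
  .dataset(datasetId)
  .table(tableId)
  .load(storage.bucket(bucketName).file('p_*'), metadata)
  .then(results => {
    const job = results[0];
    console.log(`Job ${job.id} started.`);
  })
  .catch(err => {
    console.error('ERROR:', err);
  });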

You can find the source code for the createLoadJob method here.
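If you prefer to call that lower-level method directly, a rough sketch (same assumptions and variables as above) would be:

bigquery
  .dataset(datasetId)
  .table(tableId)
  .createLoadJob(storage.bucket(bucketName).file('p_*'), {fieldDelimiter: '|'})
  .then(data => {
    const job = data[0];   // the created load Job; its id can be looked up in the BigQuery job history
    console.log(`Load job ${job.id} created.`);
  });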
