2

I'm using Python 2.7 and the Google Cloud Client Library for Python (v0.27.0) to insert data into a BigQuery table (using table.insert_data()).

One of the fields in my table has type 'DATE'.

In my Python script I've formatted the date-data as 'YYYY-MM-DD', but unfortunately the Google Cloud library returns an 'Invalid date:' error for that field.

I've tried formatting the date-field in many ways (i.e. 'YYYYMMDD', timestamp etc.), but no luck so far...

Unfortunately the API docs (https://googlecloudplatform.github.io/google-cloud-python/latest/) don't mention anything about the required date format/type/object in Python.

This is my code:

from google.cloud import bigquery
import pandas as pd
import json
from pprint import pprint
from collections import OrderedDict

# Using a pandas dataframe 'df' as input

# Converting date field to YYYY-MM-DD format

df['DATE_VALUE_LOCAL'] = df['DATE_VALUE_LOCAL'].apply(lambda x: x.strftime('%Y-%m-%d'))

# Converting pandas dataframe to json

json_data = df.to_json(orient='records',date_format='iso')

# Instantiates a client
bigquery_client = bigquery.Client(project="xxx")

# The name for the new dataset
dataset_name = 'dataset_name'
table_name = 'table_name'

def stream_data(dataset_name, table_name, json_data):
    dataset = bigquery_client.dataset(dataset_name)
    table = dataset.table(table_name)

    data = json.loads(json_data, object_pairs_hook=OrderedDict)

    # Reload the table to get the schema.
    table.reload()

    errors = table.insert_data(data)

    if not errors:
        print('Loaded 1 row into {}:{}'.format(dataset_name, table_name))
    else:
        print('Errors:')
        pprint(errors)

stream_data(dataset_name, table_name, json_data)

What is the required Python date format/type/object to insert my dates into a BigQuery DATE field?

4
  • This is weird. As it is it should have worked...what version of the python client are you using? Could you share your script as well? Commented Oct 31, 2017 at 18:26
  • I'm using v0.28.0 of the Python client. I've updated my question with a code sample. Thnx! Commented Nov 1, 2017 at 13:33
  • You're probably using v0.27.0 as in 0.28 the method insert_data no longer exists Commented Nov 1, 2017 at 13:43
  • Ah, there's been an update to the client 21 hours ago.. You're absolutely right, I was using 0.27.0. I'll rewrite my code to make it work on 0.28.0 and check if the issue will be fixed Commented Nov 1, 2017 at 14:16

1 Answer 1

2

I just simulated your code here and everything worked fine. Here's what I've simulated:

import pandas as pd
import json
import os
from collections import OrderedDict
from google.cloud.bigquery import Client

d = {'ed': ['3', '5'],
     'date': ['2017-10-11', '2017-11-12']}
json_data = df.to_json(orient='records', date_formate='iso')
json_data = json.loads(json_data, object_pairs_hook=OrderedDict)

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/key.json'
bc = Client()
ds = bc.dataset('dataset name')
table = ds.table('table I just created')
table = bc.get_table(table)
bc.create_rows(table, json_data)

It's using version 0.28.0 but still it's the same methods from previous versions.

You probably have some mistake going on in some step that maybe is converting date to some other unidentifiable format for BQ. Try using this script as reference to see where the mistake might be happening in your code.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.