
To reduce the number of calls to the Sheets API and avoid the dreaded 'error 429' message, I want to use the Sheets API 'batchGet' function. I have placed all of my relevant information into one Google spreadsheet (spreadsheet_id) containing multiple worksheets (ranges). The next step is to convert the batchGet response into a pandas DataFrame.

Here is my code. If anyone can provide guidance on the next steps to get this into a pandas DataFrame, that would be great.

from googleapiclient import discovery  # needed because the code below calls discovery.build(...)
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from oauth2client.service_account import ServiceAccountCredentials
import pandas as pd


SCOPES = [ 'https://www.googleapis.com/auth/drive', 'https://www.googleapis.com/auth/spreadsheets']

credentials = ServiceAccountCredentials.from_json_keyfile_name('creds.json', SCOPES)

service = discovery.build('sheets', 'v4', credentials=credentials)

# The ID of the spreadsheet to retrieve data from.
spreadsheet_id = 'my_spreadsheet_id'  # TODO: Update placeholder value.

# The A1 notation of the values to retrieve.
# TODO: Update placeholder values.
ranges = [
    '2016_IGA!A2:BD', '2017_IGA!A2:BD', '2018_IGA!A2:BD', '2019_IGA!A2:BD', '2020_IGA!A2:BD',
    '2016_Coles!A2:BD', '2017_Coles!A2:BD', '2018_Coles!A2:BD', '2019_Coles!A2:BD', '2020_Coles!A2:BD',
    '2016_WW!A2:BD', '2017_WW!A2:BD', '2018_WW!A2:BD', '2019_WW!A2:BD', '2020_WW!A2:BD',
    '2018_Aldi!A2:BD', '2019_Aldi!A2:BD', '2020_Aldi!A2:BD',
]

value_render_option = 'FORMATTED_VALUE'  

request = service.spreadsheets().values().batchGet(spreadsheetId=spreadsheet_id, ranges=ranges, valueRenderOption=value_render_option)
response = request.execute()
  • In order to correctly understand your question, can you provide a sample Spreadsheet and sample output you want? Of course, please remove your personal information from them. Commented Jul 15, 2019 at 23:19

2 Answers


Building on @juan Morais' comments, with a few adaptations of my own, here is the final solution.

from googleapiclient import discovery
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from oauth2client.service_account import ServiceAccountCredentials
import pandas as pd
from pandas import json_normalize  # pandas >= 1.0; the pandas.io.json import path is deprecated


SCOPES = [ 'https://www.googleapis.com/auth/drive', 'https://www.googleapis.com/auth/spreadsheets']

credentials = ServiceAccountCredentials.from_json_keyfile_name('creds.json', SCOPES)

service = discovery.build('sheets', 'v4', credentials=credentials)

# The ID of the spreadsheet to retrieve data from.
spreadsheet_id = 'my_spreadsheet_id'

# The A1 notation of the values to retrieve.
ranges = [
    '2016_IGA!A2:BE', '2017_IGA!A2:BE', '2018_IGA!A2:BE', '2019_IGA!A2:BE', '2020_IGA!A2:BE',
    '2016_Coles!A2:BE', '2017_Coles!A2:BE', '2018_Coles!A2:BE', '2019_Coles!A2:BE', '2020_Coles!A2:BE',
    '2016_WW!A2:BE', '2017_WW!A2:BE', '2018_WW!A2:BE', '2019_WW!A2:BE', '2020_WW!A2:BE',
    '2018_Aldi!A2:BE', '2019_Aldi!A2:BE', '2020_Aldi!A2:BE',
]

value_render_option = 'FORMATTED_VALUE'  

request = service.spreadsheets().values().batchGet(
    spreadsheetId=spreadsheet_id,
    ranges=ranges,
    valueRenderOption=value_render_option,
    majorDimension='ROWS',
)
response = request.execute()

sheet_values = response.get('valueRanges', [])

df = json_normalize(sheet_values, sep = ",",record_path='values')
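If you also want to know which worksheet each row came from, one option is to build a DataFrame per entry in 'valueRanges' and concatenate them. This is only a minimal sketch: the `response` dict below is a hand-written stand-in shaped like a batchGet response (no API call is made), and the helper name `value_ranges_to_df` and the `source_range` column are my own inventions, not part of the Sheets API. It assumes the API leaves out the 'values' key for empty ranges and drops trailing empty cells, so rows can be ragged.

```python
import pandas as pd


def value_ranges_to_df(value_ranges):
    """Stack the rows of every value range, tagging each row with its range."""
    frames = []
    for value_range in value_ranges:
        # 'values' is missing when the requested range holds no data.
        rows = value_range.get("values", [])
        if not rows:
            continue
        # Shorter rows are padded with missing values by the constructor.
        frame = pd.DataFrame(rows)
        frame.insert(0, "source_range", value_range["range"])
        frames.append(frame)
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Hand-written stand-in for request.execute(); the cell contents are made up.
response = {
    "spreadsheetId": "my_spreadsheet_id",
    "valueRanges": [
        {
            "range": "'2016_IGA'!A2:BE",
            "majorDimension": "ROWS",
            "values": [["apples", "1.50", "10"], ["pears", "2.00"]],
        },
        {"range": "'2017_IGA'!A2:BE", "majorDimension": "ROWS"},
        {
            "range": "'2016_Coles'!A2:BE",
            "majorDimension": "ROWS",
            "values": [["apples", "1.60", "12"]],
        },
    ],
}

df = value_ranges_to_df(response.get("valueRanges", []))
print(df)
```

With 'FORMATTED_VALUE' every cell arrives as a string, so numeric columns would still need converting (for example with pd.to_numeric) before any arithmetic.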

You have to get the values from the response, and then create a DataFrame from the resulting list.

sheet_values = response.get('values', [])

# Optional: Perform any data cleaning/wrangling operations (Date/currency conversion)

# Create a dataframe with the extracted values
df_sheet = pd.DataFrame(sheet_values, columns=['A', 'B', 'C'])  # assumes: import pandas as pd
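One thing to watch: the top-level 'values' key belongs to the response of a single-range get call. A batchGet response nests each range's rows under 'valueRanges' instead, so looking up 'values' at the top level gives an empty list. The sketch below uses two hand-written dicts as stand-ins for the two response shapes (no API call is made, and the cell contents are invented) to show the difference.

```python
import pandas as pd

# Stand-in for a single-range values().get(...) response.
single_get_response = {
    "range": "'2016_IGA'!A2:C",
    "majorDimension": "ROWS",
    "values": [["a1", "b1", "c1"], ["a2", "b2", "c2"]],
}

# Stand-in for a values().batchGet(...) response: rows sit one level deeper.
batch_get_response = {
    "spreadsheetId": "my_spreadsheet_id",
    "valueRanges": [
        {"range": "'2016_IGA'!A2:C", "majorDimension": "ROWS",
         "values": [["a1", "b1", "c1"]]},
        {"range": "'2017_IGA'!A2:C", "majorDimension": "ROWS",
         "values": [["a2", "b2", "c2"]]},
    ],
}

rows_single = single_get_response.get("values", [])

# There is no top-level 'values' key here, so this falls back to [].
rows_missing = batch_get_response.get("values", [])

# Flatten the rows of every range into one list of rows.
rows_batch = [
    row
    for value_range in batch_get_response.get("valueRanges", [])
    for row in value_range.get("values", [])
]

df_sheet = pd.DataFrame(rows_batch, columns=["A", "B", "C"])
print(df_sheet)
```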

4 Comments

Thanks Juan, but when I do this, my df_sheet comes back empty. In my spreadsheet, the 'values' are derived from formulas. Would this have an impact?
I believe I also ran into this issue when I had to implement GA on my side. Because I had to do data cleaning, I ended up appending the values to a different list, which parsed completely fine. I used the code provided by the official Google API docs. Let me know if this helps!
OK, I have solved the problem. I actually needed sheet_values = response.get('valueRanges', []) and then had to normalise this JSON response with df = json_normalize(sheet_values, sep=",", record_path='values'). Works perfectly.
Ah, interesting! I'll be sure to write this down should I find this problem in the future. Glad I was able to help!
