0

I used to have a working code. It reads a google sheet into pandas and then does stuff with the data.

It worked very well, until recently.

The google sheet has around 450 lines, but it seems the google api rejects the request now. Can anybody help with what is wrong here?

import pandas as pd

google_id = r"my_google_id"
front_boilerplate = r"https://docs.google.com/spreadsheets/d/"
back_boilerplate = r"/export?format=xlsx"

sheet_name  = front_boilerplate + google_id + back_boilerplate
df = pd.read_excel(sheet_name)

I get "Bad Request" as result. Which is strange, because last time I executed the code (a month ago) it worked perfectly well. Was there any change in pandas or the google API which leads to the rejection of this code?

For anybody wanting to see the stack trace:

Traceback (most recent call last):

  File "card_printer.py", line 1040, in <module>
    cards = read_cards(google_sheet)

  File "card_printer.py", line 90, in read_cards
    in_pd = read_excel(file, engine='openpyxl')

  File "C:\Users\wagen\Anaconda3\lib\site-packages\pandas\util\_decorators.py", line 296, in wrapper
    return func(*args, **kwargs)

  File "C:\Users\wagen\Anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 304, in read_excel
    io = ExcelFile(io, engine=engine)

  File "C:\Users\wagen\Anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 867, in __init__
    self._reader = self._engines[engine](self._io)

  File "C:\Users\wagen\Anaconda3\lib\site-packages\pandas\io\excel\_openpyxl.py", line 480, in __init__
    super().__init__(filepath_or_buffer)

  File "C:\Users\wagen\Anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 342, in __init__
    filepath_or_buffer = BytesIO(urlopen(filepath_or_buffer).read())

  File "C:\Users\wagen\Anaconda3\lib\site-packages\pandas\io\common.py", line 137, in urlopen
    return urllib.request.urlopen(*args, **kwargs)

  File "C:\Users\wagen\Anaconda3\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)

  File "C:\Users\wagen\Anaconda3\lib\urllib\request.py", line 531, in open
    response = meth(req, response)

  File "C:\Users\wagen\Anaconda3\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)

  File "C:\Users\wagen\Anaconda3\lib\urllib\request.py", line 563, in error
    result = self._call_chain(*args)

  File "C:\Users\wagen\Anaconda3\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)

  File "C:\Users\wagen\Anaconda3\lib\urllib\request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)

  File "C:\Users\wagen\Anaconda3\lib\urllib\request.py", line 531, in open
    response = meth(req, response)

  File "C:\Users\wagen\Anaconda3\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)

  File "C:\Users\wagen\Anaconda3\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)

  File "C:\Users\wagen\Anaconda3\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)

  File "C:\Users\wagen\Anaconda3\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

HTTPError: Bad Request
9
  • 1
    Does this answer your question? wget/curl large file from google drive Commented Mar 7, 2021 at 6:56
  • The above like has multiple ways to down. Check the one using Python. Have you tried downloading another file? Commented Mar 7, 2021 at 6:57
  • 1
    In your case, if you are the owner of the Spreadsheet, when you change the shared condition to "Restricted" and click "Done" button, and then, when you shared publicly as "Anyone with the link" and test it again, what result will you obtain? If this was not the direct solution of your issue, when you copy the Spreadsheet and publicly share the copied Spreadsheet and test it again, what result will you obtain? If these were not the solution of your issue, I apologize. Commented Mar 7, 2021 at 7:24
  • 1
    You have to read a google sheet using gspread to authenticate and then make a DataFrame from that sheet. Making a DataFrame from a google sheets' url is trying to skip authentication and shouldn't work. Commented Mar 7, 2021 at 16:50
  • 1
    @Anderas Maybe since you're using jupyter it works because you're signed into your google account in browser. If not, I would assume this is some sort of bug. I'll add an answer on how I normally read google sheets for context. Commented Mar 7, 2021 at 18:08

2 Answers 2

2

You can always use the API

jupyter notebook

from google import drive
mydrive = drive.drive()
mydrive.sheetdf(mydrive.findItem("Blood pressure",mime="application/vnd.google-apps.spreadsheet"),  range="Sheet1!A1:D" )

drive.py

from apiclient import discovery
from oauth2client import client, tools
from oauth2client.file import Storage
from pathlib import Path
import pandas as pd

# install pre-requisites
# pip3 install httplib2
# pip3 install google-api-python-client
# pip3 install oauth2client

# setup / download client secret file here
# https://console.developers.google.com/apis/credentials

SCOPES = ['https://www.googleapis.com/auth/spreadsheets', 'https://www.googleapis.com/auth/drive']

class drive(object):
    driveservice = None
    sheetservice = None

    def __init__(self, appname="googleclient") -> None:
        self.__connectToService(appname=appname)

    def __connectToService(self, appname=""):
        credentials = self.__get_credentials(appname=appname)
        self.driveservice = discovery.build('drive', 'v3', credentials=credentials)
        self.sheetservice = discovery.build('sheets', 'v4', credentials=credentials)

    def __get_credentials(self, appname=""):
        credential_dir = Path.home().joinpath('.credentials')
        if not credential_dir.exists(): credential_dir.mkdir()

        store = Storage(credential_dir.joinpath(f'{appname}.json'))
        credentials = store.get()
        if not credentials or credentials.invalid:
            cs = [p for p in Path.cwd().glob("**/client*.json")][0]
            flow = client.flow_from_clientsecrets(cs, SCOPES)
            flow.user_agent = appname
            # NB this fails with Safari, set Chrome as default for this to work
            credentials = tools.run_flow(flow, store, None)
        return credentials

    def findItem(self, name="", mime="application/vnd.google-apps.folder") -> str:
        name = name.replace("'", "\\'") if "'" in name else name
        id = []
        page_token = None
        while True:
            response = self.driveservice.files().list(
                q=f"name='{name}' and trashed=false and mimeType='{mime}'",
                spaces='drive',
                fields='nextPageToken, files(id, name, mimeType)',
                pageToken=page_token).execute()
            for file in response.get('files', []):
                id.append(file["id"])
            page_token = response.get('nextPageToken', None)
            if page_token is None:
                break

        if id == []:  # not found
            id.append("")

        return id[0]

    def sheetdf(self, id, range="Sheet1!A1:Z"):
        df = pd.DataFrame(self.sheetservice.spreadsheets().values().get(
            spreadsheetId=id, range=range
        ).execute()["values"])
        df.columns = df.iloc[0]
        df.drop(0, inplace=True)
        return df


if __name__ == '__main__':
    mydrive = drive()

    df = mydrive.sheetdf(mydrive.findItem(
        "Blood pressure",mime="application/vnd.google-apps.spreadsheet"),  range="Sheet1!A1:D" )
    print(df.tail(5))


Reference:

Sign up to request clarification or add additional context in comments.

2 Comments

Awesome Answer, thanks! I will use that way next, it's a much cleaner way than my own short one. Does this pave the way for editing a sheet from python, too?
yes it does... sheetservice.spreadsheets() has bulk update methods too
1
import gspread
import pandas as pd


gc = gspread.service_account(filename='credentials.json')
book = gc.open_by_url(url)

You now have an authenticated connection to the google sheet you are trying to get a df from.

Each specific sheet in the spreadsheet can be accessed through the worksheet function.

sheet = book.worksheet('name of the sheet')

You can create a list of dictionaries from the data with a sheet by using this function

records = sheet.get_all_records()

And now a dataframe

df = pd.DataFrame(records)

There's a few steps to getting yourself a credentials file capable of authenticating. This video goes through them pretty clearly.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.