0

My Goal is to read a .csv file from google drive and load it to a dataframe.

I tried some answers here but the thing is, the file is not public and needs authentication.

I looked up on goggle drive API but I was stuck there and I don't know how to move forward. I did manage to open google sheet and load it to a dataframe but that is different, this is a sample for google sheet that works.

service = build('sheets', 'v4', credentials=creds)
sheet = service.spreadsheets()
sheets_file = sheet.values().get(
                     spreadsheetId=sheet_id,
                     range=sheet_range
                     ).execute()
    
header = sheets_file.get('values', [])[0]   # Assumes first line is header!
values = sheets_file.get('values', [])[1:]  # Everything else is data.
  
if not values:
    print('No data found.')
else:
    all_data = []
    for col_id, col_name in enumerate(header):
        column_data = []
        for row in values:
            column_data.append(row[col_id])
        ds = pd.Series(data=column_data, name=col_name)
        all_data.append(ds)
        df = pd.concat(all_data, axis=1)
        print(df.head())

I saw some google colab methods too but I cant use that as I am restricted to using python only, any Idea on how to approach this?

5
  • 1
    If it requires authentication to get to the file, then you may have to use Selenium to open the webpage, login, and download. Commented Sep 17, 2020 at 23:36
  • I prefer to use google API as I have to change the flow if I use selenium Commented Sep 17, 2020 at 23:53
  • Can I ask you about your question? 1. In your script, can you use service of service = build('sheets', 'v4', credentials=creds) for sheet.values().get(###)? I would like to ask whether you have already done the authorization of the scopes. 2. About doing other format like csv/.xlsx, which do you want to convert from Google Spreadsheet to? Commented Sep 18, 2020 at 0:08
  • yes I already have completed authentication, I did not include the boilerplate code. And no I dont want to convert it to google sheet. I only want to get the csv file and convert it to a dataframe Commented Sep 18, 2020 at 0:22
  • Thank you for replying. From your replying, I proposed 3 sample scripts as an answer. Could you please confirm it? If those were not the direction you expect, I apologize. Commented Sep 18, 2020 at 1:25

1 Answer 1

3

I believe your goal and situation as follows.

  • You want to download the CSV data from the CSV file on Google Drive.
  • You can get values from Google Spreadsheet using googleapis for python.

Pattern 1:

In this pattern, the CSV data is downloaded with googleapis. The downloaded CSV data is saved as a file. And the value is retrieved by the method of "Files: get" in Drive API v3.

Sample script:

file_id = "###"  # Please set the file ID of the CSV file.

service = build('drive', 'v3', credentials=creds)
request = service.files().get_media(fileId=file_id)
fh = io.FileIO("sample.csv", mode='wb')
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print("Download %d%%." % int(status.progress() * 100))
  • In this case, the CSV data can be converted to the dataframe with df = pd.read_csv("sample.csv").

Pattern 2:

In this pattern, as a simple method, the access token is used from creds. The downloaded CSV data is not saved as a file. And the value is retrieved by the method of "Files: get" in Drive API v3.

Sample script:

file_id = "###"  # Please set the file ID of the CSV file.

access_token = creds.token
url = "https://www.googleapis.com/drive/v3/files/" + file_id + "?alt=media"
res = requests.get(url, headers={"Authorization": "Bearer " + access_token})
print(res.text)
  • In this case, the CSV data can be directly converted to the dataframe with df = pd.read_csv(io.StringIO(res.text)).

Note:

  • In the following scripts, please include the scope of https://www.googleapis.com/auth/drive.readonly and/or https://www.googleapis.com/auth/drive. When you modified the scopes, please reauthorize the scopes. By this, the modified scopes are included in the access token. Please be careful this.

Reference:

Sign up to request clarification or add additional context in comments.

9 Comments

uhmm no, i actually want to open csv file on the drive. Sorry bout that, Will update the question for clarity
@Led Thank you for replying. In your situation, a CSV file is on Google Drive, and you want to retrieve the CSV data from the CSV file. Is my understanding correct?
yep thats the goal. From google drive, Load the csv contents into a dataframe.
@Led Thank you for replying. I could correctly understand about your goal. I apologize for my poor English skill. From your script, I had thought that you wanted to retrieve the CSV data from Google Spreadsheet. From your replying, I would like to update my answer. Could you please wait for it?
sure thing, i don't have any choice :D
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.