0

I would like to create a function that reads files from a shared Google Drive folder and concatenates them into one df. I would prefer to do it without using any authenticators, if possible.

I used this code, which I found here:

url = 'https://drive.google.com/drive/folders/1_Hwl0tm6rfechhTF-YGVsOA0cbzu4jUd'
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]
df = pd.read_csv(path)

I want to read all the files in the folder using glob and concatenate them into one df, but I get the error HTTPError: HTTP Error 404: Not Found. Any help would be appreciated.

2 Answers

1

You cannot download a folder directly. In the Drive API, folders are treated as files; the only difference is their MIME type, application/vnd.google-apps.folder.

As the Drive API documentation says:

A container you can use to organize other types of files on Drive. Folders are files that only contain metadata, and have the MIME type application/vnd.google-apps.folder.

Note: A single file stored on My Drive can be contained in multiple folders. A single file stored on a shared drive can only have one parent folder.
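
This also accounts for the 404 in the question: on a folder URL, url.split('/')[-2] yields the literal string "folders" rather than an ID, and as far as I can tell even the real folder ID would not help, because the download endpoint expects a file. A minimal sketch of extracting the folder ID so it can be passed to a listing call; the helper name folder_id_from_url is hypothetical, and it assumes URLs of the /drive/folders/<id> form shown in the question:

```python
from urllib.parse import urlparse


def folder_id_from_url(url):
    """Return the path segment that follows 'folders' in a Drive folder URL."""
    # urlparse drops any query string such as ?usp=sharing from the path
    parts = urlparse(url).path.strip("/").split("/")
    if "folders" not in parts:
        raise ValueError("not a Drive folder URL: " + url)
    index = parts.index("folders")
    if index + 1 >= len(parts):
        raise ValueError("folder ID missing in URL: " + url)
    return parts[index + 1]


url = "https://drive.google.com/drive/folders/1_Hwl0tm6rfechhTF-YGVsOA0cbzu4jUd"
print(url.split("/")[-2])       # the question's expression: prints "folders"
print(folder_id_from_url(url))  # prints the folder ID
```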

As a workaround, you can list all the files contained in a folder and download them one by one. I based the following example on this:

do.py
import io

from googleapiclient.http import MediaIoBaseDownload


def list_and_download():
    # drive_service() and FOLDER_ID are assumed to be defined elsewhere:
    # an authorized Drive API client and the ID of the shared folder.
    service = drive_service()
    folder_id = FOLDER_ID
    # List all files within the folder (shared drives included)
    results = service.files().list(
        q="'{}' in parents".format(folder_id),
        includeItemsFromAllDrives=True,
        supportsAllDrives=True,
    ).execute()
    items = results.get("files", [])
    print(items)
    for item in items:
        # Skip anything that is not a CSV instead of leaving the function
        if item["mimeType"] != "text/csv":
            continue
        # Download the files one by one using MediaIoBaseDownload
        request = service.files().get_media(fileId=item["id"])
        fh = io.BytesIO()
        downloader = MediaIoBaseDownload(fh, request)
        done = False
        while not done:
            status, done = downloader.next_chunk()
            print("Download {}%.".format(int(status.progress() * 100)))
        print("Download Complete!")
        # getvalue() returns the whole buffer regardless of the stream position
        with open(item["name"], "wb") as f:
            f.write(fh.getvalue())

    # Do whatever you want with the csv
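
Since the goal in the question is a single DataFrame, the write-to-disk step could be swapped for collecting each download in memory and concatenating at the end. A minimal sketch, assuming each file has already been fetched as raw bytes (for example, the contents of fh in the loop above) and that all the CSVs share the same columns; concat_csv_bytes is a hypothetical helper name:

```python
import io

import pandas as pd


def concat_csv_bytes(payloads):
    """Parse each CSV payload (bytes) and stack the rows into one DataFrame."""
    frames = [pd.read_csv(io.BytesIO(data)) for data in payloads]
    if not frames:
        return pd.DataFrame()
    # ignore_index=True renumbers the rows instead of repeating each file's index
    return pd.concat(frames, ignore_index=True)


# Stand-in data for two downloaded files
payloads = [b"a,b\n1,2\n", b"a,b\n3,4\n5,6\n"]
df = concat_csv_bytes(payloads)
print(df.shape)  # (3, 2)
```

Keeping the parsing separate from the download makes it easy to try on local bytes before touching the Drive API.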

1

You should use the Google Drive API to list the files in your shared folder: https://developers.google.com/drive/api/v2/reference/children/list

Example usage of the API to list files: https://i.ibb.co/pyx8mKG/drive-list.png

After that, once you have the children list from the JSON response, you can read each file and concatenate the dataframes:



import pandas as pd

response = {
 "kind": "drive#childList",
 "etag": "\"9NuiSicPg_3yRScMQO3pipPxwvs\"",
 "selfLink": "https://www.googleapis.com/drive/v2/files/1IkO_nB83mUfKLopEtYsNT7RbMOIcWDAK/children",
 "items": [
  {
   "kind": "drive#childReference",
   "id": "1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1",
   "selfLink": "https://www.googleapis.com/drive/v2/files/1IkO_nB83mUfKLopEtYsNT7RbMOIcWDAK/children/1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1",
   "childLink": "https://www.googleapis.com/drive/v2/files/1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1"
  },
  {
   "kind": "drive#childReference",
   "id": "14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO",
   "selfLink": "https://www.googleapis.com/drive/v2/files/1IkO_nB83mUfKLopEtYsNT7RbMOIcWDAK/children/14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO",
   "childLink": "https://www.googleapis.com/drive/v2/files/14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO"
  }
 ]
}

item_arr = []
for item in response["items"]:
    print(item["id"])
    download_url = 'https://drive.google.com/uc?id=' + item["id"]
    item_arr.append(pd.read_csv(download_url))
df = pd.concat(item_arr, axis=0)
print(df.head())
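
The response above is hard-coded; in practice it would come from the children list endpoint that appears in its selfLink. A rough sketch of that request using only the standard library, assuming the folder is shared publicly and that you have created an API key (passed as the key query parameter; it is not a user login, but it is still a credential). The helper names and the FOLDER_ID and API_KEY placeholders are hypothetical, and paging through large folders is not handled:

```python
import json
import urllib.parse
import urllib.request


def children_list_url(folder_id, api_key):
    """Build the Drive v2 children list URL for a folder."""
    base = "https://www.googleapis.com/drive/v2/files/{}/children".format(
        urllib.parse.quote(folder_id, safe="")
    )
    return base + "?" + urllib.parse.urlencode({"key": api_key})


def child_ids(response):
    """Extract the file IDs from a drive#childList response dict."""
    return [item["id"] for item in response.get("items", [])]


def fetch_child_ids(folder_id, api_key):
    """Call the endpoint and return the child IDs (needs network access)."""
    with urllib.request.urlopen(children_list_url(folder_id, api_key)) as resp:
        return child_ids(json.load(resp))


# Offline check with a trimmed-down response of the same shape
sample = {"kind": "drive#childList", "items": [{"id": "abc"}, {"id": "def"}]}
print(child_ids(sample))  # ['abc', 'def']
print(children_list_url("FOLDER_ID", "API_KEY"))
```

The IDs returned by fetch_child_ids can then be fed to the same loop that builds download_url for pd.read_csv.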
