I am using the Google Cloud Storage package for Python to copy a bunch of files from one bucket to another. Basic code is:
from google.cloud.storage.client import Client

def copy_bucket_content(client: Client, source_bucket_name, destination_bucket_name, source_dir):
    source_bucket = client.get_bucket(source_bucket_name)
    destination_bucket = client.get_bucket(destination_bucket_name)
    # Only copy blobs whose names fall under the given "directory" prefix
    blobs_to_copy = [blob for blob in source_bucket.list_blobs() if blob.name.startswith(source_dir)]
    for blob in blobs_to_copy:
        print("copying {blob}".format(blob=blob.name))
        source_bucket.copy_blob(blob, destination_bucket, blob.name)
When I pass a source_dir that has many blobs in it, the script fails at runtime with:
File "/Users/jamiet/.virtualenvs/hive-data-copy-biEl4iRK/lib/python3.6/site-packages/google/cloud/_http.py", line 293, in api_request
raise exceptions.from_http_response(response)
google.api_core.exceptions.InternalServerError: 500 POST https://www.googleapis.com/storage/v1/b/path/to/blob/copyTo/b/path/to/blob: Backend Error
This invariably occurs after transferring between 50 and 80 blobs (it doesn't fail at the same point each time).
I assume I'm hitting some sort of API request limit. Is that likely to be the case?
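If it turns out to be transient backend flakiness rather than a hard limit, one workaround I've considered is retrying each copy with exponential backoff. This is just a sketch with a hypothetical helper I'd write myself (copy_with_retry is not part of the library), catching the InternalServerError shown in the traceback:

import time
from google.api_core import exceptions

def copy_with_retry(source_bucket, blob, destination_bucket, max_attempts=5):
    # Hypothetical helper: retry a single copy when the backend returns a 500
    for attempt in range(max_attempts):
        try:
            return source_bucket.copy_blob(blob, destination_bucket, blob.name)
        except exceptions.InternalServerError:
            if attempt == max_attempts - 1:
                raise
            # Back off 1s, 2s, 4s, 8s ... before trying again
            time.sleep(2 ** attempt)

That feels like papering over the problem, though, rather than fixing it.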
If so, how do I get around it? I suppose lifting the restriction would be one way, but it would be better to issue just one call to the REST API rather than looping over all the blobs and copying them one at a time. I searched around the GCS Python package but didn't find anything that might help.
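The closest thing I spotted is the client's batch() context manager, which (as I understand it, and I may be wrong) queues the individual calls and sends them as one multipart request instead of one HTTP request per blob. Something like the sketch below is what I imagined, though I don't know whether copy operations can be batched this way, whether it would avoid the 500s, or how large a single batch is allowed to be (I suspect I'd still need to chunk blobs_to_copy):

from google.cloud.storage.client import Client

def copy_bucket_content_batched(client: Client, source_bucket_name, destination_bucket_name, source_dir):
    source_bucket = client.get_bucket(source_bucket_name)
    destination_bucket = client.get_bucket(destination_bucket_name)
    blobs_to_copy = [blob for blob in source_bucket.list_blobs() if blob.name.startswith(source_dir)]
    # Defer the copy calls so the client can send them as a single batched request
    with client.batch():
        for blob in blobs_to_copy:
            source_bucket.copy_blob(blob, destination_bucket, blob.name)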
I assume there's a better way of accomplishing this, but I don't know what it is. Can anyone help?