
I am using the Google Cloud Storage package for Python to copy a bunch of files from one bucket to another. Basic code is:

from google.cloud.storage.client import Client

def copy_bucket_content(client: Client, source_bucket_name, destination_bucket_name, source_dir):
    source_bucket = client.get_bucket(source_bucket_name)
    destination_bucket = client.get_bucket(destination_bucket_name)
    blobs_to_copy = [blob for blob in source_bucket.list_blobs() if blob.name.startswith(source_dir)]
    for blob in blobs_to_copy:
        print("copying {blob}".format(blob=blob.name))
        source_bucket.copy_blob(blob, destination_bucket, blob.name)

When I pass a source_dir that has many blobs in it, the script fails at runtime with:

File "/Users/jamiet/.virtualenvs/hive-data-copy-biEl4iRK/lib/python3.6/site-packages/google/cloud/_http.py", line 293, in api_request
raise exceptions.from_http_response(response)
google.api_core.exceptions.InternalServerError: 500 POST https://www.googleapis.com/storage/v1/b/path/to/blob/copyTo/b/path/to/blob: Backend Error

This invariably occurs after transferring between 50 and 80 blobs (it doesn't fail at the same point each time).

I am assuming that I'm hitting some sort of API request limit. Would that be the case?

If so, how do I get around this? I suppose lifting the restriction would be one way, but it would be better to issue just one call to the REST API rather than looping over all the blobs and copying them one at a time. I searched around the GCS Python package but didn't find anything that might help.

I assume there's a better way of accomplishing this, but I don't know what it is. Can anyone help?

1 Answer

There's no quota restriction for this scenario. Error 500 indicates a server-side issue. You could use an exponential backoff strategy, as described in the Handling errors documentation, as well as follow the best practices for uploading data.
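For example, a minimal backoff wrapper around copy_blob might look like the sketch below (the copy_blob_with_backoff name and the delay parameters are my own, and I'm assuming only 5xx errors should be retried):

import random
import time

from google.api_core.exceptions import ServerError

def copy_blob_with_backoff(source_bucket, blob, destination_bucket, max_attempts=5, base_delay=1.0):
    """Copy one blob, retrying 5xx errors with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return source_bucket.copy_blob(blob, destination_bucket, blob.name)
        except ServerError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            # Wait base_delay * 2^attempt seconds, plus up to 1 s of jitter.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))

The loop in the question would then call copy_blob_with_backoff(source_bucket, blob, destination_bucket) in place of the bare copy_blob call.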


Comments

Thanks, will give that a go. Any suggestions as to how this can be achieved with a single API call rather than multiple?
I used retrying (pip install retrying), which supports exponential backoff. Worked great. Thanks @F10
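For reference, a sketch of that approach with the retrying package (the wait/stop parameters here are illustrative, not necessarily what the commenter used):

from retrying import retry
from google.api_core.exceptions import ServerError

def _is_server_error(exc):
    # Only retry 5xx responses; client errors (4xx) won't succeed on retry.
    return isinstance(exc, ServerError)

# retrying waits 2^n * wait_exponential_multiplier ms between attempts, capped
# at wait_exponential_max, and gives up after stop_max_attempt_number tries.
@retry(retry_on_exception=_is_server_error,
       wait_exponential_multiplier=1000,
       wait_exponential_max=10000,
       stop_max_attempt_number=5)
def copy_blob_retrying(source_bucket, blob, destination_bucket):
    return source_bucket.copy_blob(blob, destination_bucket, blob.name)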
