I have a Python app. In this context I want to retrieve the blob references from an Azure Storage container that match a certain prefix and then delete all the blobs in one go. I tried the following:

from azure.core.paging import ItemPaged
from azure.storage.blob import BlobProperties, ContainerClient

container_client: ContainerClient = ContainerClient.from_connection_string(conn_str=storage_account_connection_string, container_name=container_name)

blob_list: ItemPaged[BlobProperties] = container_client.list_blobs(name_starts_with=prefix)

container_client.delete_blobs(*blob_list, delete_snapshots="include")

This works fine as long as there are blobs that match the prefix. But if that is not the case I get an exception when trying to execute delete_blobs:

tuple index out of range

I don't want to use try/except, and I also don't want to iterate over the results first. I would like an indicator that tells me whether there are any blobs at all, without having to make extra calls.

How can I do that?

Thanks

EDIT: Based on what has been suggested by @Gaurav the following approach works:

import logging
from typing import List

from azure.core.paging import ItemPaged
from azure.storage.blob import BlobProperties, ContainerClient

log = logging.getLogger(__name__)

blob_paged: ItemPaged[BlobProperties] = container_client.list_blobs(name_starts_with=prefix)
blob_list: List[BlobProperties] = list(blob_paged)  # materializes every matching blob
number_of_blobs: int = len(blob_list)

if number_of_blobs > 0:
    container_client.delete_blobs(*blob_list, delete_snapshots="include")
    log.debug(f"Deleted '{number_of_blobs}' blobs and snapshots...")
else:
    log.debug("No blobs to be deleted...")

Three things you should be aware of:

  • Calling list() consumes the iterator and loads all matching blobs into memory.
  • Once it has been consumed, blob_paged can no longer be passed to delete_blobs; pass blob_list instead.
  • When blob_list is passed to delete_blobs, a warning like Failed to parse headers... is logged (Bug?). The blobs still get deleted.
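If all that is needed is an "is there anything at all?" indicator, one option is to pull only the first item from the iterator and then stitch it back on. The following is a minimal sketch, not part of the SDK: peek and delete_if_any are hypothetical helper names, and container_client is assumed to be a ContainerClient like the one above. Note that unpacking with * still turns the remaining items into a tuple, so this avoids the explicit list() and len() but not the memory cost of the call itself.

```python
from itertools import chain
from typing import Any, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def peek(items: Iterable[T]) -> Tuple[bool, Iterator[T]]:
    """Return (has_items, iterator), consuming at most one element to check."""
    iterator = iter(items)
    sentinel = object()
    first = next(iterator, sentinel)
    if first is sentinel:
        # Nothing matched; hand back an empty iterator.
        return False, iter(())
    # Put the first element back in front of the remaining ones.
    return True, chain([first], iterator)


def delete_if_any(container_client: Any, prefix: str) -> bool:
    """Hypothetical helper: call delete_blobs only when the prefix matches something."""
    has_blobs, blobs = peek(container_client.list_blobs(name_starts_with=prefix))
    if has_blobs:
        container_client.delete_blobs(*blobs, delete_snapshots="include")
    return has_blobs
```

Calling delete_if_any(container_client, prefix) returns False without touching delete_blobs when nothing matches the prefix.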
1 Answer

The delete_blobs method uses the Blob Batch operation to delete multiple blobs in a single request. According to the documentation, a batch can contain at most 256 subrequests and its payload can be at most 4 MB (Ref: https://learn.microsoft.com/en-us/rest/api/storageservices/blob-batch#remarks).

I believe you're getting this error because you're either passing more than 256 blobs to delete_blobs or the payload is larger than 4 MB.

UPDATE

You will also get the error if blob_list contains no items. You can use the following code to count the items (Ref: Getting number of elements in an iterator in Python). Note that list() consumes the iterator, so keep the resulting list if you still need to pass the blobs to delete_blobs:

number_of_blobs = len(list(blob_list))
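Both conditions mentioned above (an empty batch and more than 256 items) can be sidestepped by submitting the blobs in chunks. This is only a sketch under assumptions: batched and delete_blobs_by_prefix are hypothetical helper names, container_client is assumed to be a ContainerClient, and the 256 limit is taken from the Blob Batch documentation linked above. The 4 MB payload limit is not checked here.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List

# Maximum number of subrequests per Blob Batch request, per the linked docs.
MAX_BATCH_SIZE = 256


def batched(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive lists of at most `size` items; yields nothing if empty."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def delete_blobs_by_prefix(container_client: Any, prefix: str) -> int:
    """Hypothetical helper: submit matching blobs for deletion in batches.

    Returns the number of blobs submitted; 0 means nothing matched and
    delete_blobs was never called.
    """
    submitted = 0
    blobs = container_client.list_blobs(name_starts_with=prefix)
    for chunk in batched(blobs, MAX_BATCH_SIZE):
        container_client.delete_blobs(*chunk, delete_snapshots="include")
        submitted += len(chunk)
    return submitted
```

Because an empty listing produces no chunks, the loop body never runs in that case, and at most 256 blobs are held in memory per call.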
8 Comments

Thanks for your response. This exception occurs (reproducible) when there are no blobs that match the prefix.
Oh, I forgot to mention that the service will also return an error if the batch size is zero.
Yes, but the question is how to determine that the batch size is zero, i.e. that there are no blobs to be retrieved. Is there no property that indicates that? Thanks
Can't you simply check the length or count of blob_list variable to determine that?
Oh! That’s surprising. Let me try it out. I’ll revert shortly.