We have a project accessing Azure blob storage with code that can be simplified like:

private static BlobContainerClient _chunk_storage_client;

foreach (Chunk chunk in targetFile.Chunks)
{
    BlobClient chunkClient = _chunk_storage_client.GetBlobClient(chunk.Name);
    BlobProperties props = chunkClient.GetProperties();

    if (props.ContentHash.Length > 0)
    {
        chunk.Md5 = props.ContentHash.ToHex();
    }
}

The code works in most cases, but some targetFile instances have 100K chunks. For those, the loop makes 100K calls to storage, which results in a timeout.

Is there a way to pass all chunk.Name values as a collection and get the properties of all those blobs in one call?

  • Replace the 100K individual GetProperties() calls with a GetBlobsAsync() listing pass and map the results to your chunk names; each listed BlobItem already carries its properties, including ContentHash. Commented Jun 16 at 1:37

1 Answer

Azure blob storage to get multiple properties from multiple blobs by one call

Azure Blob Storage does not support a direct operation to get properties for multiple, individually specified blobs in a single API call, either through the REST API or through the Azure SDKs.

As a workaround, you can fetch the blob properties in parallel using Parallel.ForEachAsync, with a limit on how many requests run at once. This is much faster than fetching them one by one:

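using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Azure;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

namespace BlobPropertiesFetcher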
{
    public class Chunk
    {
        public string Name { get; set; }
        public string Md5 { get; set; }
    }
    internal class Program
    {
        private static async Task Main(string[] args)
        {
            string connectionString = "<YourConnectionString>";
            string containerName = "<your ContainerName>";
            BlobContainerClient containerClient = new BlobContainerClient(connectionString, containerName);
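            // Sample chunk names; replace with the chunks of your target file.
            List<Chunk> targetChunks = Enumerable.Range(1, 100000).Select(i => new Chunk { Name = $"chunk-{i}" }).ToList();
            // Cap the number of concurrent requests to avoid throttling.
            var options = new ParallelOptions { MaxDegreeOfParallelism = 20 };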
            await Parallel.ForEachAsync(targetChunks, options, async (chunk, token) =>
            {
                try
                {
                    BlobClient blobClient = containerClient.GetBlobClient(chunk.Name);
                    // Pass the token so in-flight requests can be cancelled.
                    BlobProperties props = await blobClient.GetPropertiesAsync(cancellationToken: token);
                    if (props.ContentHash != null && props.ContentHash.Length > 0)
                    {
                        chunk.Md5 = BitConverter.ToString(props.ContentHash).Replace("-", "").ToLowerInvariant();
                    }
                }
                catch (RequestFailedException ex)
                {
                    Console.WriteLine($"Error fetching {chunk.Name}: {ex.Message}");
                }
            });

            foreach (var chunk in targetChunks.Take(5))
            {
                Console.WriteLine($"{chunk.Name} => MD5: {chunk.Md5}");
            }
            Console.WriteLine("Done fetching properties.");
        }
    }
}

Output:

chunk1.txt => MD5: 91e632f380ff76bf81edb307409a1066
chunk2.txt => MD5: 8b1a995c4c4611296a827abf8c47804d7
chunk3.txt => MD5: 91e632f380ff76bf81edb307409a1066
chunk4.txt => MD5: 91e632f380ff76bf81edb307409a1066
chunk5.txt => MD5: 35a9aac0c8df0c2ba38f4b1ba26fc8b77
Done fetching properties.
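
If the chunks of one file share a common name prefix, listing the container may need far fewer requests than fetching each blob's properties individually. The List Blobs operation returns results in pages of up to 5,000 blobs, and each listed item already includes its properties, so 100K chunks would take roughly 20 requests. The following is a minimal sketch under that assumption, not a tested drop-in: ChunkHashLister, ListHashesAsync and ToHex are hypothetical names, and the prefix you pass must match your own naming scheme.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

// Hypothetical helper, not part of the Azure SDK.
public static class ChunkHashLister
{
    // Lower-case hex encoding of a hash (Convert.ToHexString needs .NET 5 or later).
    public static string ToHex(byte[] bytes) =>
        Convert.ToHexString(bytes).ToLowerInvariant();

    // Lists every blob whose name starts with `prefix` and returns a map of
    // blob name -> MD5 hex for the blobs that have a stored content hash.
    // Assumption: all chunks of interest share this prefix.
    public static async Task<Dictionary<string, string>> ListHashesAsync(
        BlobContainerClient container, string prefix)
    {
        var hashes = new Dictionary<string, string>(StringComparer.Ordinal);

        // The SDK requests further pages transparently while enumerating.
        await foreach (BlobItem blob in container.GetBlobsAsync(prefix: prefix))
        {
            byte[] hash = blob.Properties.ContentHash;
            if (hash != null && hash.Length > 0)
            {
                hashes[blob.Name] = ToHex(hash);
            }
        }

        return hashes;
    }
}
```

After the listing completes, look up each chunk.Name in the returned dictionary with TryGetValue and assign chunk.Md5 when an entry exists. This only pays off when the prefix does not match many more blobs than you need; otherwise the bounded parallel approach above is likely the better fit.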


References:

List Blobs

BlobContainerClient.GetBlobsAsync Method
