Can I compress data from Azure Blob to gzip as I download it? I would like to avoid having all data in memory if possible.
I tried two different approaches, the compress_chunk and compress_blob functions below. I am not sure whether the entire blob ends up in memory before compression, or whether I can compress it as it is read in somehow.
import gzip
import io

def compress_chunk(data):
    """Gzip-compress a seekable stream, reading it in 4 MiB chunks."""
    data.seek(0)
    compressed_body = io.BytesIO()
    # gzip.open accepts a file object and writes compressed bytes into it.
    with gzip.open(compressed_body, mode='wb') as compressor:
        while True:
            chunk = data.read(1024 * 1024 * 4)
            if not chunk:
                break
            compressor.write(chunk)
    compressed_body.seek(0)
    return compressed_body
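Note that compress_chunk above reads the input in chunks but still accumulates the entire compressed result in the BytesIO buffer. If the goal is to never hold more than one chunk in memory, one option (a minimal sketch using only the standard library; the function name is my own) is to compress incrementally with zlib.compressobj, where wbits=31 makes zlib emit a gzip header and trailer:

```python
import zlib

def gzip_stream(chunks, level=6):
    """Yield gzip-compressed pieces as input chunks arrive.

    wbits=31 tells zlib to frame the output as gzip, so the
    concatenation of all yielded pieces is a valid .gz stream.
    """
    compressor = zlib.compressobj(level, zlib.DEFLATED, 31)
    for chunk in chunks:
        piece = compressor.compress(chunk)
        if piece:  # compressobj may buffer and return nothing yet
            yield piece
    yield compressor.flush()
```

Each yielded piece can be written straight to a file or an upload stream, so only one input chunk plus the compressor's internal buffer is in memory at any time.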
def compress_blob(data):
    # gzip.compress needs the whole payload in memory at once.
    return gzip.compress(data.getvalue())
def process_download(container_name, blob):
    with io.BytesIO() as input_io:
        blob_service.get_blob_to_stream(container_name=container_name,
                                        blob_name=blob.name,
                                        stream=input_io)
        return compress_chunk(data=input_io)
Edit: smart_open helped me solve my problem. Definitely worth a shot for anyone who sees this. I ended up using smart_open with gzip to compress the blobs.
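For anyone landing here, a rough sketch of that smart_open approach (the container and blob names and the connection string are placeholders, and this assumes smart_open with its Azure extras and azure-storage-blob are installed):

```python
from azure.storage.blob import BlobServiceClient
from smart_open import open as so_open

# Placeholder connection details -- substitute your own.
client = BlobServiceClient.from_connection_string("<your-connection-string>")
params = {"client": client}

# smart_open infers gzip compression from the ".gz" extension, so the data
# is compressed on the fly as it is copied, one chunk at a time.
with so_open("azure://mycontainer/source.csv", "rb", transport_params=params) as fin, \
     so_open("azure://mycontainer/dest.csv.gz", "wb", transport_params=params) as fout:
    for chunk in iter(lambda: fin.read(4 * 1024 * 1024), b""):
        fout.write(chunk)
```

Because both sides are streamed, the whole blob never needs to fit in memory.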