I have some files in an S3 bucket and I'm trying to read them in the fastest possible way. Each file is gzip-compressed and contains a single newline-delimited JSON file with one object per line, like this:
{"id":"test1", "created":"2020-01-01", "lastUpdated":"2020-01-01T00:00:00.000Z"}
{"id":"test2", "created":"2020-01-01", "lastUpdated":"2020-01-01T00:00:00.000Z"}
What I want to do is load the JSON file, read every single object, and process it. After some research, this is the only code that worked for me:
import json
import gzip
import boto3
from io import BytesIO

s3 = boto3.resource('s3')
bucket = s3.Bucket("my-bucket")

for obj in bucket.objects.filter(Prefix='my-prefix').all():
    buffer = BytesIO(obj.get()['Body'].read())
    gzipfile = gzip.GzipFile(fileobj=buffer)
    for line in gzipfile:
        json_object = json.loads(line)
        # some stuff with the json_object
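For reference, I also sketched a streaming variant that wraps the S3 body directly, so the whole compressed object is never buffered in memory first. This is a minimal, unbenchmarked sketch using the same my-bucket and my-prefix placeholders as above:

import gzip
import json
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket("my-bucket")  # placeholder name, as above

for obj in bucket.objects.filter(Prefix='my-prefix'):
    # gzip.GzipFile only needs a .read() method, so it can wrap the
    # botocore StreamingBody and decompress on the fly instead of
    # reading the entire object into a BytesIO first.
    with gzip.GzipFile(fileobj=obj.get()['Body']) as gzipfile:
        for line in gzipfile:
            json_object = json.loads(line)
            # some stuff with the json_object

I haven't measured whether this is actually faster, only that it avoids holding the full file in memory.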
Does anyone know a better way to read the JSON objects?
Thanks for the help!