I am receiving a zip file in an S3 bucket, and a put event on that bucket triggers an AWS Lambda function. The Lambda is supposed to unzip the file and upload the files inside it to another S3 bucket.
However, those files can be a mix of ANSI (Windows-1252) and UTF-8 text files.
I have to convert all of them to UTF-8. Any idea how I can do that?
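For context, the handler is wired up roughly like this (a simplified sketch: the handler name, bucket/key handling and error handling are trimmed down, and the real code does more):

import io
import boto3
from zipfile import ZipFile

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Fetch the uploaded zip from the source bucket named in the put event
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    key = record['object']['key']
    zip_bytes = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
    # Open the archive in memory and hand it to the unzip/convert/upload routine
    with ZipFile(io.BytesIO(zip_bytes)) as zip_file:
        unzip_to_temp(zip_file)

This is the function that is supposed to detect the encoding and convert to UTF-8: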
def get_utf_encoded_file(
    file,
    file_name: str
):
    is_ansi = False
    try:
        file.read().decode('utf-8')
    except:
        try:
            file.read().decode('cp1252')  # <-- I tried to print here, it gives an empty string
            is_ansi = True
        except Exception as e:
            log.error(f"Unable to parse file {file_name}")
            raise Exception(f"Unable to parse file {file_name}")
    if is_ansi:
        byte_stream = None
        temp_file_name = "/tmp/" + str(uuid.uuid4()) + ".txt"
        with codecs.open(temp_file_name, "w", encoding='UTF-8') as temp_file:
            temp_file.write(file.read().decode('cp1252'))
        with open(temp_file_name, "rb") as temp_file:
            byte_stream = temp_file.read()  # <-- I tried to print here, it gives an empty byte array
            print(byte_stream)
        os.remove(temp_file_name)
        return byte_stream
    else:
        return file
The function that's calling it:
def unzip_to_temp(
    zip: ZipFile
):
    for file_name in zip.namelist():
        file_data = get_utf_encoded_file(file_name, zip.open(file_name))
        upload_to_s3(file_data)
But the ANSI files end up as empty files in S3.
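In case it matters, by "ANSI" I mean Windows-1252 content. A byte like 0xE9 ('é') decodes fine with cp1252 but fails a UTF-8 decode, which is what the first try/except above relies on to tell the two apart:

data = 'café'.encode('cp1252')   # b'caf\xe9' -- typical "ANSI" (Windows-1252) bytes
print(data.decode('cp1252'))     # prints: café
try:
    data.decode('utf-8')
except UnicodeDecodeError as e:
    print(e)                     # UTF-8 decoding fails here, which is how an "ANSI" file is detected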