I am trying to write a CSV file to my S3 bucket from inside a Lambda function. Everything works, except that special characters are not preserved; basically, I need my file to be UTF-8 encoded. I do not want to use pandas or unicodecsv, as those are not built into Lambda's environment.

Below is my current Lambda function:

import boto3
import csv
import io

def lambda_handler(event, context):
    s3 = boto3.resource('s3')
    bucket = s3.Bucket("my-bucket-name-goes-here")
    fn = "sample_csv_lambda.csv"
    write_csv(fn, bucket)

def write_csv(target_filename, bucket):
    buff = io.StringIO()
    writer = csv.writer(buff, dialect="excel", delimiter=",")
    writer.writerow([f"header{i}" for i in range(1, 6)])
    writer.writerow([1, 2, 3, 4, 5])
    writer.writerow(["u", "b", "w", "d", "ş"])
    writer.writerow(["n", "p", "m", "q", "ğ"])
    buff2 = io.BytesIO(buff.getvalue().encode(encoding="UTF-8"))
    print(buff2.getvalue().decode("utf-8"))
    bucket.upload_fileobj(buff2, target_filename)

The print call on the second-to-last line outputs the special characters as intended; however, once I download and open the CSV file, the characters in it are still not UTF-8.

PS: I like the current formulation of my code as I do not need to temporarily save the file in a "/tmp" folder as suggested by some other questions/answers. I also do not need to package and upload pandas/unicodecsv to my Lambda environment; too complicated for a beginner like me. Please keep this in mind when you answer.

  • Python 3 strings are already Unicode. What does "are still not UTF-8" mean? Is the text mangled? Did you expect non-English characters to somehow change? This page is UTF-8, the code you posted is UTF-8, and "ğ" is a Unicode string with a single character. Commented Jan 13, 2021 at 18:22
  • The CSV file that is created and downloaded has 5 columns and 5 rows. The values of ş and ğ are appearing as ÅŸ and ÄŸ. I believe the resulting CSV is not UTF-8. Commented Jan 13, 2021 at 18:26
  • No, that's exactly what UTF-8 looks like when you open it as if it were Latin1. What OS are you using? What locale settings? Which program do you use to read the file? Commented Jan 13, 2021 at 18:26
  • Put the relevant information in the question, not in comments. I bet you double-clicked on the file instead of importing it, too. When you do that, Excel imports the file's contents using defaults. The default for non-UTF-16 text is to use the user's locale. Saving CSVs as "utf-8-sig" would help, but here you used utf-8, which doesn't emit the BOM that would tell Excel this is a UTF-8 file. Commented Jan 13, 2021 at 18:32
  • This has nothing to do with Lambda or even Python, except for using utf-8-sig instead of utf-8. If you want to create Excel files, you can use a library like openpyxl to create real Excel files. What you do now is force Excel to import a text file using defaults. If you used the Data > Import menu, you'd be able to specify the encoding. Right now Excel has to guess. Commented Jan 13, 2021 at 18:32

1 Answer

Short Answer

The file is already UTF8, without a BOM. To emit a BOM, use utf-8-sig instead of utf-8 when encoding.
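
As a minimal sketch of that change, assuming the same io/csv approach as in the question, only the encoding name differs. The helper name build_csv_bytes is hypothetical, and the upload call is left as a comment because it needs a real bucket.

```python
import csv
import io


def build_csv_bytes(rows, encoding="utf-8-sig"):
    """Serialize rows to CSV and return the encoded bytes (hypothetical helper)."""
    # newline="" disables newline translation, so the csv module's own
    # line endings pass through untouched.
    buff = io.StringIO(newline="")
    writer = csv.writer(buff, dialect="excel")
    writer.writerows(rows)
    # "utf-8-sig" prepends the 3-byte BOM (EF BB BF) that Excel looks for.
    return buff.getvalue().encode(encoding)


rows = [
    [f"header{i}" for i in range(1, 6)],
    [1, 2, 3, 4, 5],
    ["u", "b", "w", "d", "ş"],
    ["n", "p", "m", "q", "ğ"],
]
data = build_csv_bytes(rows)

# In the Lambda from the question, the upload would then be:
# bucket.upload_fileobj(io.BytesIO(data), "sample_csv_lambda.csv")
```

Everything else in the question's function can stay as it is; no "/tmp" file or extra package is needed.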

Long Answer

From the comments, it looks like you're trying to open a CSV file in Excel by double-clicking on the file. When you do that, Excel imports the file contents using default settings. If a BOM is present, Excel loads the file using the encoding indicated by the BOM. Without it, Excel has no reliable way to tell which encoding was used, so it falls back to the user's locale settings to import the data.
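
The ÅŸ and ÄŸ reported in the comments can be reproduced in plain Python. This sketch assumes the locale's legacy code page is Windows-1252 (cp1252); that is an assumption about the asker's machine, not something stated in the question.

```python
import codecs

text = "şğ"
raw = text.encode("utf-8")  # bytes as uploaded by the question's code, no BOM

# Decoding UTF-8 bytes with a legacy single-byte code page yields mojibake.
# cp1252 is an assumed locale code page, chosen for illustration.
misread = raw.decode("cp1252")

# With a BOM, a reader can recognize UTF-8 from the first three bytes.
with_bom = text.encode("utf-8-sig")
has_bom = with_bom.startswith(codecs.BOM_UTF8)
roundtrip = with_bom.decode("utf-8-sig")  # the BOM is stripped on decode

print(misread, has_bom, roundtrip)
```

The bytes of the characters are identical in both files; only the three-byte prefix changes how the reader interprets them.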

If you used the Data menu to import the data, Excel would show you a preview of the file and allow you to modify settings like the encoding, delimiters etc.

If you want to use that file with Excel, it would be a good idea to use a library like openpyxl to create a real xlsx file. An xlsx file is a ZIP package containing well-defined XML files. It is typically a lot smaller than the equivalent CSV file and has no localization issues with numbers and dates.
