Exporting a list as a CSV file in Python gives UnicodeEncodeError

Question

I want to get a csv file from my list. This is my list:

temp = ['سلام' , 'چطوری' ]

The list members are in Persian. I tried to write the CSV file with this code:

import csv    
with open("output.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(temp)

but the terminal gives me this error: UnicodeEncodeError: 'ascii' codec can't encode character u'\u06a9' in position 0: ordinal not in range(128)

How can I solve it and get my csv file?

P.S. When I print temp, I see these strings:

[u'\u06a9\u0627\u062e \u0645\u0648\u0632\u0647 \u06af\u0644\u0633\u062a\u0627\u0646 | Golestan Palace', u'\u062a\u0647\u0631\u0627\u0646', u'\u062a\u0647\u0631\u0627\]

But when I print temp[0], I get this:

کاخ موزه گلستان | Golestan Palace

Why does Python sometimes encode my data and sometimes not?

It tries to open as ASCII but you have UTF-16 (graphemica.com/%DA%A9). Specify the appropriate encoding when you open the file and try again.
– rbaleksandar, Commented Jun 14, 2015 at 15:57

Serge Ballesta · Accepted Answer · 2015-06-14 18:04:59Z

2

In another answer, you said you were using Python 2.7. Here is an extract from the Python Standard Library reference manual for the csv module:

The csv module doesn’t directly support reading and writing Unicode, but it is 8-bit-clean save for some problems with ASCII NUL characters. So you can write functions or classes that handle the encoding and decoding for you as long as you avoid encodings like UTF-16 that use NULs. UTF-8 is recommended.

The same paragraph gives an example of a class that can be used to deal with Unicode data:

import csv
import codecs
import cStringIO  # Python 2 only

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

But you could also try simpler code:

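import csv
with open("output.csv", "wb") as f:
    writer = csv.writer(f)
    # Wrap each string in its own list so it is written as a single field
    # on its own row; a bare string passed as a row is split into characters.
    writer.writerows([[u.encode('utf-8')] for u in temp])

if temp is a list of unicode strings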

or:

import csv
with open("output.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows([[u.encode('utf-8') for u in row] for row in temp])

if temp is a list of lists of unicode strings
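Which variant applies depends on the shape of temp, because writerows treats every item as a row and every row as a sequence of fields. A string is itself a sequence, so a bare string passed as a row is written one character per field. The following is only a minimal sketch of that behaviour, and it assumes Python 3 (where csv accepts text directly) and an in-memory buffer instead of a file. The helper name rows_as_text is made up for illustration.

```python
import csv
import io

temp = [u"سلام", u"چطوری"]  # sample data taken from the question


def rows_as_text(write):
    """Hypothetical helper: run write(writer) on an in-memory buffer
    and return the resulting CSV lines."""
    buf = io.StringIO(newline="")
    write(csv.writer(buf))
    return buf.getvalue().splitlines()


# writerows() on a flat list: each string is taken as a row of characters.
split_chars = rows_as_text(lambda w: w.writerows(temp))

# writerow() on the same list: one row, one field per string.
one_row = rows_as_text(lambda w: w.writerow(temp))

# writerows() on one-element lists: one string per row.
one_per_row = rows_as_text(lambda w: w.writerows([s] for s in temp))

for label, lines in [("writerows(flat list)", split_chars),
                     ("writerow(flat list)", one_row),
                     ("writerows(list of lists)", one_per_row)]:
    print(label, lines)
```

Only the first call splits the words into single characters, which matches the comma-separated letters described in the comments.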


6 Comments

Mehdi Over a year ago

When I used your first simple code, I got this error: writer.writerows([u.encode('utf-8') for u in temp]) AttributeError: 'list' object has no attribute 'encode'. When I tried your second simple code, the CSV file was created, but every character in the temp list was separated. For example, Golestan or گلستان became , ,گ,ل,س,ت,ا,ن, ,|, ,G,o,l,e,s,t,a,n, ,

Serge Ballesta Over a year ago

@Mehdi: you should say exactly what temp is if you want me to test the code against actual values.

Mehdi Over a year ago

The temp list is more than 600 characters long. I created temp by adding data that I parsed from an HTML page. The next comment is what I get when I print temp. If there is a better way to show you what temp is, tell me and I will use it.

Mehdi Over a year ago

[u'\u06a9\u0627\u062e \u0645\u0648\u0632\u0647 \u06af\u0644\u0633\u062a\u0627\u0646 | Golestan Palace', u'\u062a\u0647\u0631\u0627\u0646', u'\u062a\u0647\u0631\u0627\u0646', u'\u0645\u06cc\u062f\u0627\u0646 \u06f1\u06f5 \u062e\u0631\u062f\u0627\u062f', .....

Mehdi Over a year ago

But when I print temp[0] I see کاخ موزه گلستان: | Golestan Palace, temp[1] = تهران, temp[5] = صفحه اصلی مکان‌ها گردشگری میراث فرهنگی کاخ موزه گلستان, and so on.

Michael C. O'Connor · Accepted Answer · 2015-06-14 16:56:06Z

0

The csv library in Python 2 cannot handle Unicode data. This is fixed in Python 3, but the fix will not be backported. However, there is a third-party library that works as a drop-in replacement and fixes the problem.

Try using UnicodeCSV instead.
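For comparison, here is a rough sketch of what the Python 3 version can look like, assuming the list from the question should be written one string per row. The temporary directory is only a placeholder so that the example can run anywhere, and the read-back step is there purely to check the result.

```python
import csv
import os
import tempfile

temp = ["سلام", "چطوری"]  # sample data taken from the question

# Placeholder output location (an assumption); any writable path works.
path = os.path.join(tempfile.mkdtemp(), "output.csv")

# Python 3: open in text mode, name the encoding explicitly, and pass
# newline="" so the csv module controls the line endings itself.
with open(path, "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerows([s] for s in temp)  # one string per row

# Read the file back to confirm the text survived the round trip.
with open(path, encoding="utf-8", newline="") as f:
    rows = list(csv.reader(f))

print(rows)
```

No manual encode() calls are needed here, because the file object does the encoding when it is opened with encoding="utf-8".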
