I want to write a list to a CSV file. This is my list:

temp = ['سلام', 'چطوری']

The list items are Persian strings. I tried to create the CSV file with this code:

import csv    
with open("output.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(temp)

but I get this error in the terminal: UnicodeEncodeError: 'ascii' codec can't encode character u'\u06a9' in position 0: ordinal not in range(128)

How can I solve it and get my csv file?

P.S. When I print temp, I actually see these strings:

[u'\u06a9\u0627\u062e \u0645\u0648\u0632\u0647 \u06af\u0644\u0633\u062a\u0627\u0646 | Golestan Palace', u'\u062a\u0647\u0631\u0627\u0646', u'\u062a\u0647\u0631\u0627\]

But when I print temp[0] I get this:

کاخ موزه گلستان | Golestan Palace

Why does Python sometimes encode my data and sometimes not?

  • @AvinashRaj I tried that code but I got the same error again. Commented Jun 14, 2015 at 15:35
  • It tries to encode the text as ASCII, but your data contains non-ASCII characters such as U+06A9 (graphemica.com/%DA%A9). Specify an appropriate encoding, such as UTF-8, when you write the file and try again. Commented Jun 14, 2015 at 15:57

2 Answers

In another answer, you said you were using Python 2.7. Here is an extract from the csv module section of the Python Standard Library reference:

The csv module doesn’t directly support reading and writing Unicode, but it is 8-bit-clean save for some problems with ASCII NUL characters. So you can write functions or classes that handle the encoding and decoding for you as long as you avoid encodings like UTF-16 that use NULs. UTF-8 is recommended.

The same section gives an example of a class that can be used to write Unicode data:

# Python 2 only: cStringIO does not exist in Python 3
import codecs
import csv
import cStringIO

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

But you could also try simpler code:

import csv
with open("output.csv", "wb") as f:
    writer = csv.writer(f)
    # writerow (singular): the whole list becomes one CSV row
    writer.writerow([u.encode('utf-8') for u in temp])

if temp is a flat list of unicode strings

or:

import csv    
with open("output.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows([[u.encode('utf-8') for u in row] for row in temp])

if temp is a list of lists of unicode strings (one inner list per row)
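The choice between the two snippets depends on the shape of temp, because writerow takes one row while writerows takes an iterable of rows. The following is a minimal sketch of that difference, written for Python 3 (an assumption, so that it runs without manual encoding) and using an in-memory buffer instead of a file. The helper name to_csv is hypothetical and only exists for this illustration.

```python
import csv
import io

temp = ['سلام', 'چطوری']  # flat list of strings, as in the question


def to_csv(write):
    """Run write(writer) against an in-memory buffer and return the CSV text."""
    buf = io.StringIO(newline='')
    write(csv.writer(buf))
    return buf.getvalue()


# writerow: the whole list is ONE row, one string per field.
one_row = to_csv(lambda w: w.writerow(temp))

# writerows: each element is treated as a row, so a plain string
# is iterated character by character and split into fields.
split_rows = to_csv(lambda w: w.writerows(temp))

# One string per line: wrap each string in its own single-item row.
one_per_line = to_csv(lambda w: w.writerows([s] for s in temp))

print(one_row)
print(split_rows)
print(one_per_line)
```

If the output has every character separated by commas, the data was most likely a flat list passed to writerows.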

6 Comments

When I used your first simple code I got this error: writer.writerows([u.encode('utf-8') for u in temp]) AttributeError: 'list' object has no attribute 'encode'. But when I tried your second simple code, the CSV file was created, but every character in the temp list was separated: for example Golestan or گلستان became , ,گ,ل,س,ت,ا,ن, ,|, ,G,o,l,e,s,t,a,n, ,
@Mehdi: you should say exactly what temp contains if you want me to test the code against actual values.
The temp list is more than 600 characters long. I created temp by adding data that I parsed from an HTML page. The next comment is what I get when I print temp. If there is a better way to show you what temp contains, tell me and I will use it.
[u'\u06a9\u0627\u062e \u0645\u0648\u0632\u0647 \u06af\u0644\u0633\u062a\u0627\u0646 | Golestan Palace', u'\u062a\u0647\u0631\u0627\u0646', u'\u062a\u0647\u0631\u0627\u0646', u'\u0645\u06cc\u062f\u0627\u0646 \u06f1\u06f5 \u062e\u0631\u062f\u0627\u062f', .....
But when I print temp[0] I see کاخ موزه گلستان: | Golestan Palace, temp[1] = تهران, temp[5] = صفحه اصلی مکان‌ها گردشگری میراث فرهنگی کاخ موزه گلستان, and so on.
The csv library in Python 2 cannot handle Unicode data directly. This is fixed in Python 3, but the fix will not be backported. However, there is a third-party library that works as a drop-in replacement and fixes the problem.

Try using UnicodeCSV instead.
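For comparison, here is a minimal sketch of what the Python 3 version can look like, assuming temp is a flat list of strings as in the question. The file is opened in text mode with an explicit encoding, so no manual encode calls are needed. The temporary directory is only an assumption to keep the example self-contained; any writable path would do.

```python
import csv
import os
import tempfile

temp = ['سلام', 'چطوری']

# Hypothetical output location, chosen so the example does not touch the working directory.
path = os.path.join(tempfile.mkdtemp(), "output.csv")

# Text mode with an explicit encoding; newline="" lets the csv module control line endings.
with open(path, "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(temp)

# Read the file back to check that the text survived the round trip.
with open(path, encoding="utf-8", newline="") as f:
    rows = list(csv.reader(f))

print(rows)
```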
