1

I would like to export data from a CSV file that contains Unicode strings.

Previously I tried a Python script, but it works only for ASCII data and fails on the Unicode content:

#! /usr/bin/env python
import csv

# Register the dialect used to parse the input file
csv.register_dialect('custom', delimiter=',',
                     doublequote=True,
                     escapechar=None,
                     quotechar='"',
                     quoting=csv.QUOTE_MINIMAL, skipinitialspace=False)

with open('input.csv') as ifile:
    data = csv.reader(ifile, dialect='custom')
    for record in data:
        # Wrap every field in a numbered tag
        for i, field in enumerate(record):
            print(" <field%s>" % i + field + "</field%s>" % i)

Traceback (most recent call last):
  for record in data:
_csv.Error: line contains NULL byte

4
  • post your code and error message / traceback Commented May 15, 2013 at 5:58
  • There's no such thing as a Unicode file. Files are always in some encoding: this one is probably utf-8. Commented May 15, 2013 at 7:22
  • If the question is about Python 3, add the python-3.x tag to the question. Commented May 15, 2013 at 8:14
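
One rough way to act on the comment that files are always in some encoding is to look at the first bytes of the file. The sketch below is only an illustration: guess_encoding_from_bom is a hypothetical helper, not part of the csv module, and it assumes the file starts with a byte order mark (many UTF-8 files do not). It also shows why UTF-16 data read as plain bytes is a plausible source of the "line contains NULL byte" error, since ASCII characters encoded as UTF-16 contain zero bytes.

```python
import codecs

# Byte order marks and the codec name to try for each.
# The UTF-32 marks come first because BOM_UTF32_LE starts with the
# same two bytes as BOM_UTF16_LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding_from_bom(raw, default="utf-8"):
    """Hypothetical helper: guess a codec name from the leading bytes.

    Falls back to `default` (an assumption) when no byte order mark is found.
    """
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return default


# In-memory stand-in for the first bytes of a UTF-16 CSV file
sample = "id,name\r\n".encode("utf-16")
print(guess_encoding_from_bom(sample))
print(b"\x00" in sample)  # UTF-16 encoded ASCII text contains zero bytes
```

For a real file, the first few bytes could be read with open('input.csv', 'rb').read(4) and passed to the helper. A guess like this is a starting point, not proof of the encoding.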

3 Answers 3

2

Use the unicodecsv library instead:

https://github.com/jdunck/python-unicodecsv

import unicodecsv as csv

# unicodecsv decodes bytes itself, so open the file in binary mode
with open('input.csv', 'rb') as ifile:
    rows = [row for row in csv.reader(ifile, encoding='utf-8')]

print(rows)

5 Comments

I have never used Python. I downloaded the package you specified. Can you please tell me how to use it in my Python script, or what changes I need to make for it to work with a Unicode CSV file?
There is a problem in the file named __init__.py at line 49: except TypeError, e: Should I replace it with except e: or not? After this I am getting an error for import unicodecsv as csv: File "C:\Program Files (x86)\Python30\unicodecsv\__init__.py", line 3, in <module> from itertools import izip ImportError: cannot import name izip
I removed the import statement (from itertools import izip) from the file named __init__.py, as Python 3 no longer needs it. Now I am getting the following error: record = list(csv.reader(ifile, encoding='utf-8')) TypeError: iter() returned non-iterator of type 'UnicodeReader'
Can anybody help me use the unicodecsv library? I am getting the following error: Traceback (most recent call last): File line 6, in <module> rows = list(csv.reader(ifile, encoding='utf-8')) TypeError: iter() returned non-iterator of type 'UnicodeReader'
Modified. I thought you could just call list() on it, but it looks like you need to iterate over it manually.
1

You can wrap csv.reader in a class that handles the decoding for you. The following is taken from the examples in the Python 2 csv documentation and works for me:

#! /usr/bin/env python
import csv, codecs

class UTF8Recoder:
    """
    Iterator that reads an encoded stream and reencodes the input to UTF-8
    """
    def __init__(self, f, encoding):
        self.reader = codecs.getreader(encoding)(f)

    def __iter__(self):
        return self

    def next(self):
        return self.reader.next().encode("utf-8")

class UnicodeReader:
    """
    A CSV reader which will iterate over lines in the CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        f = UTF8Recoder(f, encoding)
        self.reader = csv.reader(f, dialect=dialect, **kwds)

    def next(self):
        row = self.reader.next()
        return [unicode(s, "utf-8") for s in row]

    def __iter__(self):
        return self




csv.register_dialect('custom', delimiter=',',
                     doublequote=True,
                     escapechar=None,
                     quotechar='"',
                     quoting=csv.QUOTE_MINIMAL, skipinitialspace=False)

# Python 2: the csv module expects files opened in binary mode
with open('input.csv', 'rb') as ifile:
    data = UnicodeReader(ifile, dialect='custom')
    for record in data:
        # Each field is now a unicode object
        for i, field in enumerate(record):
            print(" <field%s>" % i + field + "</field%s>" % i)

There is also a UnicodeWriter class there if you need that functionality.

3 Comments

Please tell me how I can use it in my code, which you can see above, or send me complete sample code if possible, as I am a beginner in Python.
Is there anybody who can help me solve my problem of exporting data from a CSV file that contains strings in different languages?
I've added a full code segment above. This works fine for me in Python 2.7.4 for reading files containing UTF-8.
0

It seems you are using Python 3. Follow the very first code example in the docs:

#!/usr/bin/env python3
import csv

with open('input.csv', newline='', encoding=encoding) as csvfile:
    reader = csv.reader(csvfile, dialect="custom")
    for row in reader:
        print(", ".join(row))

where the "custom" dialect is the one defined in the code in your question and encoding is the character encoding of your file, such as "utf-16". If you omit the encoding argument, the encoding returned by locale.getpreferredencoding(False) is used.
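
Put together with the dialect from the question, a complete Python 3 version might look like the sketch below. It is a minimal illustration, not a definitive implementation: fields_as_tags is a hypothetical helper name, the sample file is written to a temporary directory only so that the example runs on its own, and "utf-16" is an assumed encoding that you would replace with the real encoding of your file.

```python
import csv
import os
import tempfile

# Same dialect as in the question
csv.register_dialect('custom', delimiter=',',
                     doublequote=True,
                     escapechar=None,
                     quotechar='"',
                     quoting=csv.QUOTE_MINIMAL, skipinitialspace=False)


def fields_as_tags(path, encoding):
    """Hypothetical helper: return one '<fieldN>value</fieldN>' string per field."""
    lines = []
    # newline='' lets the csv module handle line endings itself
    with open(path, newline='', encoding=encoding) as csvfile:
        for record in csv.reader(csvfile, dialect='custom'):
            for i, field in enumerate(record):
                lines.append("<field%d>%s</field%d>" % (i, field, i))
    return lines


# Write a small UTF-16 sample file, then read it back
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "input.csv")
    with open(sample_path, "w", newline="", encoding="utf-16") as f:
        f.write('café,"日本語, text"\r\n')
    for line in fields_as_tags(sample_path, "utf-16"):
        print(line)
```

The decoding happens in open(), so csv.reader only ever sees str objects and no wrapper class is needed in Python 3.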

4 Comments

ImportError: cannot import name escape
html.escape() is available since Python 3.2. On earlier versions you could use cgi.escape(field, quote=True). Note: it is unrelated to your Unicode issue.
I am getting an error because of the line from html import escape. How can I resolve it?
I've removed html.escape (to deal with one issue at a time)
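
For reference, the escaping discussed in these comments could be applied to each field along these lines. This is a minimal sketch assuming Python 3.2 or later; field_tag is a hypothetical helper name and the tag format simply mirrors the one in the question.

```python
from html import escape  # available since Python 3.2


def field_tag(index, value):
    """Hypothetical helper: wrap an escaped field value in a numbered tag."""
    return "<field%d>%s</field%d>" % (index, escape(value, quote=True), index)


# Characters such as &, < and > would otherwise break the markup
print(field_tag(0, 'Tom & Jerry <tj@example.com>'))
```

Escaping is a separate step from decoding the file: it only matters once the fields are already being read correctly as text.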
